| Spec | Value |
|---|---|
| Parameters | 32B |
| Context Length | 131,072 tokens |
| Modality | Text |
| Architecture | Dense |
| License | Apache 2.0 |
| Release Date | 29 Apr 2025 |
| Knowledge Cutoff | Aug 2024 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 5120 |
| Number of Layers | 64 |
| Attention Heads | 64 |
| Key-Value Heads | 8 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Position Embedding | RoPE |
Qwen3-32B is a dense large language model developed by Alibaba and the flagship dense variant of the Qwen3 series. Designed as a unified framework for both general-purpose interaction and complex problem-solving, the model introduces a hybrid reasoning mechanism: it can move between a 'thinking mode', which emits an explicit chain of thought for mathematical and logical tasks, and a 'non-thinking mode' optimized for high-throughput, responsive dialogue. The mode is selected per request, through the chat template's enable_thinking flag or the /think and /no_think soft switches placed in prompts, letting users match the model's computational depth to the requirements of a given query.
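The switch is exposed directly through the Hugging Face chat template. The following is a minimal sketch, assuming the published Qwen/Qwen3-32B checkpoint and a recent transformers release; the arithmetic prompt is purely illustrative.

```python
# Minimal sketch: toggling Qwen3's thinking / non-thinking modes via the
# Hugging Face chat template (assumes transformers >= 4.51 and the
# published Qwen/Qwen3-32B checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve step by step: 12 * 17 - 9"}]

# enable_thinking=True makes the template elicit a chain-of-thought block
# before the final answer; set it to False for low-latency dialogue.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```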
Technically, the model is built on a 64-layer transformer with 32.8 billion parameters. It uses Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads, shrinking the key-value cache at inference time while preserving representational capacity. Relative to earlier Qwen generations, this iteration adds QK-Norm and removes the QKV bias, improving training stability. For sequence modeling, the architecture employs Rotary Positional Embeddings (RoPE) with a base frequency of 1,000,000, supporting a native context length of 32,768 tokens that can be extended to 131,072 tokens with YaRN scaling. Internal activations use the SwiGLU function, and normalization follows a pre-norm RMSNorm configuration.
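To make the positional scheme concrete, here is a short sketch of the RoPE frequency table implied by a 1,000,000 base, along with the YaRN rope_scaling entry the Qwen3 model card suggests for a 4x extension to 131,072 tokens. The head dimension of 128 is an assumption for illustration, not a value taken from this page.

```python
# Sketch of the RoPE frequency table implied by a base of 1,000,000.
# head_dim = 128 is assumed for illustration; it is not stated on this page.
import torch

base = 1_000_000.0
head_dim = 128          # assumed per-head dimension
native_ctx = 32_768     # native context length

# Standard RoPE: one rotation rate per pair of head dimensions.
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

positions = torch.arange(native_ctx, dtype=torch.float32)
angles = torch.outer(positions, inv_freq)          # (native_ctx, head_dim // 2)
cos_table, sin_table = angles.cos(), angles.sin()  # rotation tables applied to Q and K

# YaRN entry for a 4x context extension (to 131,072 tokens), following the
# rope_scaling schema used in the Qwen3 model card:
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32_768,
}
```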
Qwen3-32B is engineered for diverse operational environments, supporting over 100 languages and dialects. Its post-training pipeline follows a four-stage process, including a long chain-of-thought cold start, reasoning-based reinforcement learning, thinking-mode fusion, and general reinforcement learning, which together prepare the model for sophisticated agentic tasks and tool integration. The model is particularly effective in scenarios requiring multi-turn dialogue, complex instruction following, and autonomous tool use, providing a versatile foundation for developers building integrated AI systems across global contexts.
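As an illustration of the tool-use workflow, this hedged sketch runs one tool round-trip over an OpenAI-compatible API (for example, a local vLLM server hosting the model). The endpoint URL and the get_utc_time tool are hypothetical, introduced only for this example.

```python
# Hedged sketch: one tool-call round trip against an OpenAI-compatible
# endpoint serving Qwen3-32B. The base_url and get_utc_time tool are
# hypothetical, for illustration only.
import datetime
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_utc_time",
        "description": "Return the current UTC time as an ISO-8601 string.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

messages = [{"role": "user", "content": "What time is it in UTC right now?"}]
first = client.chat.completions.create(
    model="Qwen/Qwen3-32B", messages=messages, tools=tools
)
msg = first.choices[0].message

if msg.tool_calls:  # the model asked us to run the tool
    call = msg.tool_calls[0]
    result = datetime.datetime.now(datetime.timezone.utc).isoformat()
    messages.append(msg.model_dump(exclude_none=True))
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps({"utc_time": result}),
    })
    final = client.chat.completions.create(
        model="Qwen/Qwen3-32B", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)
else:
    print(msg.content)
```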
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Coding | Aider Coding | 0.40 | #7 |
| Reasoning | LiveBench Reasoning | 0.48 | #26 |
| Data Analysis | LiveBench Data Analysis | 0.68 | #27 |
| Web Development | WebDev Arena | 1347 (Elo) | #29 |
| Mathematics | LiveBench Mathematics | 0.67 | #31 |
| Coding | LiveBench Coding | 0.66 | #36 |
| Agentic Coding | LiveBench Agentic | 0.03 | #41 |
Overall Rank: #75
Coding Rank: #65