Parameters
32B
Context Length
131,072
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
29 Apr 2025
Knowledge Cutoff
Aug 2024
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
-
Number of Layers
64
Attention Heads
64
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
VRAM requirements for different quantization methods and context sizes
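In place of the interactive calculator, a rough back-of-envelope estimate is straightforward: weight memory is the parameter count times the bits per weight, and the KV cache grows with the number of layers, key-value heads, head dimension, and context length. The Python sketch below plugs in Qwen3-32B's published shape (64 layers, 8 KV heads, and an assumed head dimension of 128) and ignores activation buffers and runtime overhead, so treat the figures as approximate lower bounds rather than exact requirements.

```python
# Back-of-envelope VRAM estimate for Qwen3-32B (illustrative only; real usage
# adds activation buffers, fragmentation, and framework overhead).

def estimate_vram_gib(
    params_b: float = 32.8,      # parameter count in billions
    bits_per_weight: float = 16, # e.g. 16 (FP16), 8 (Q8), 4 (Q4)
    context: int = 32_768,       # tokens held in the KV cache
    n_layers: int = 64,
    n_kv_heads: int = 8,
    head_dim: int = 128,         # assumed from the published config
    kv_bytes: int = 2,           # FP16 KV cache entries
) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8
    # K and V per layer: context * n_kv_heads * head_dim elements each
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes
    return (weights + kv_cache) / 1024**3

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = estimate_vram_gib(bits_per_weight=bits)
        print(f"{bits}-bit weights, 32K context: ~{gib:.1f} GiB")
```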
Qwen3-32B is a dense large language model developed by Alibaba as part of the Qwen3 series. It is a causal language model optimized for text generation and engineered to address a broad range of natural language processing tasks. A core innovation in its design is the dual-mode reasoning approach, which enables dynamic switching between a "thinking mode" and a "non-thinking mode." The thinking mode is engaged for complex tasks such as logical reasoning, mathematical problem-solving, and code generation, where the model works through a problem step by step. The non-thinking mode handles efficient, general-purpose dialogue, prioritizing responsiveness in routine interactions. This adaptable design lets Qwen3-32B match its computational strategy to the input's complexity, balancing accuracy on demanding tasks with efficiency in everyday use.
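The mode switch is controlled at the prompt level rather than through separate model variants. The sketch below assumes the Hugging Face transformers interface documented for Qwen3, where apply_chat_template accepts an enable_thinking flag; with it set to True the model emits a <think>…</think> trace before its final answer, and with it set to False it replies directly.

```python
# Minimal sketch of switching Qwen3-32B between thinking and non-thinking mode
# via the chat template (assumes the transformers interface documented for Qwen3).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]

# Thinking mode: the template lets the model emit a <think>...</think> block first.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast, non-reasoning replies
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```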
Architecturally, Qwen3-32B is built upon a dense transformer framework, incorporating 32.8 billion parameters and 64 layers. It employs Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads, a configuration designed to enhance inference efficiency while sustaining high performance. The model incorporates Rotary Positional Embeddings (RoPE) to effectively manage sequence positions and utilizes SwiGLU as its activation function. Normalization within the model is performed using RMSNorm, specifically with a pre-normalization scheme, which contributes to stable training and performance.
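To make the 64-query-head / 8-key-value-head split concrete, the toy PyTorch sketch below shows the core grouped-query trick: each key-value head is broadcast across a group of eight query heads, so the KV cache stores 8 heads instead of 64. It is deliberately simplified (no RoPE, no causal mask, no QK normalization) and is not Qwen3's actual implementation.

```python
import torch

# Toy grouped-query attention with Qwen3-32B-like shapes: 64 query heads
# share 8 key/value heads (8 query heads per KV group).
batch, seq, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 64, 8
group = n_q_heads // n_kv_heads  # 8 query heads per KV head

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Broadcast each KV head to its group of query heads.
k = k.repeat_interleave(group, dim=1)  # (batch, 64, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = attn @ v  # (batch, 64, seq, head_dim)
print(out.shape)
```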
Qwen3-32B supports over 100 languages and dialects, extending its applicability across diverse global communication contexts. The model is suited to applications that require robust reasoning, instruction following, and agentic behavior, and it integrates well with external tools for agent-based tasks. It natively supports a context length of 32,768 tokens, which can be extended to 131,072 tokens by applying YaRN (Yet another RoPE extensioN) scaling, facilitating the processing of long-form content.
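Enabling the extension is a serving-time configuration rather than a different checkpoint. The sketch below assumes vLLM's offline LLM interface with a rope_scaling override, following the factor-4.0 YaRN setting described for Qwen3 (32,768 × 4 = 131,072); other runtimes accept an equivalent entry in the model config, and static YaRN scaling is generally only worth enabling when prompts actually approach the extended length.

```python
# Sketch: enabling YaRN scaling to extend Qwen3-32B from its native 32,768-token
# window to 131,072 tokens (assumes vLLM's offline LLM interface accepts a
# rope_scaling override; other runtimes take an equivalent config entry).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B",
    max_model_len=131_072,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                              # 32,768 * 4 = 131,072
        "original_max_position_embeddings": 32_768,
    },
)
outputs = llm.generate(["Summarize the following report: ..."], SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```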
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
Rankings are relative to other local LLMs.
Rank
#8
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Reasoning | LiveBench Reasoning | 0.83 | 🥉 3 |
| Data Analysis | LiveBench Data Analysis | 0.68 | 🥉 3 |
| Mathematics | LiveBench Mathematics | 0.80 | ⭐ 4 |
| StackUnseen | ProLLM Stack Unseen | 0.46 | 5 |
| Coding | LiveBench Coding | 0.64 | 7 |
| Graduate-Level QA | GPQA | 0.65 | 8 |
| Agentic Coding | LiveBench Agentic | 0.10 | 10 |
Overall Rank
#8
Coding Rank
#17