| Attribute | Value |
|---|---|
| Total Parameters | 30.5B |
| Active Parameters | 3.3B |
| Context Length | 131,072 tokens |
| Modality | Text |
| Architecture | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Release Date | 29 Apr 2025 |
| Knowledge Cutoff | Mar 2025 |
| Number of Experts | 128 |
| Active Experts per Token | 8 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 2048 |
| Number of Layers | 48 |
| Attention Heads | 32 |
| Key-Value Heads | 4 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Position Embedding | RoPE |
The Qwen3-30B-A3B model is a Mixture-of-Experts (MoE) language model developed by Alibaba, engineered to deliver high-performance inference with reduced computational costs. It features a total of 30.5 billion parameters, but employs a sparse activation strategy where only approximately 3.3 billion parameters are engaged per token. This design allows the model to maintain the broad knowledge and capabilities of a larger system while operating with the latency and resource profile of a significantly smaller dense architecture. It serves as a middle-tier solution within the Qwen3 family, balancing sophistication with operational efficiency.
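To make the sparse-activation idea concrete, the sketch below shows top-8-of-128 expert routing in PyTorch. Only the 128-expert / 8-active shape comes from the specification above; the layer widths, gating details, and expert MLP structure are simplified illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: 128 experts with 8 active per token matches the spec above;
# the hidden/FFN widths are deliberately tiny so the sketch runs anywhere.
num_experts, top_k, hidden, ffn = 128, 8, 64, 128

router = torch.nn.Linear(hidden, num_experts, bias=False)        # gating network
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(hidden, ffn), torch.nn.SiLU(), torch.nn.Linear(ffn, hidden)
    )
    for _ in range(num_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, hidden). Each token only runs through 8 of the 128 experts."""
    gate_logits = router(x)                                       # (tokens, 128)
    weights, idx = torch.topk(gate_logits, top_k, dim=-1)         # choose 8 experts per token
    weights = F.softmax(weights, dim=-1)                          # renormalize over the chosen 8
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])                        # unchosen experts never execute
    return out

print(moe_forward(torch.randn(4, hidden)).shape)                  # torch.Size([4, 64])
```

Because only the selected experts execute, per-token compute scales with the 8 active experts rather than all 128, which is the source of the dense-small latency profile described above.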
Technically, the model is structured with 48 transformer layers and utilizes Grouped Query Attention (GQA) with 32 query heads and 4 key-value heads to optimize memory bandwidth and inference speed. The MoE component consists of 128 experts, with 8 experts selected via a routing mechanism for each token. A notable architectural innovation is the hybrid system that supports both a reasoning-heavy thinking mode for complex mathematical and logic tasks and a non-thinking mode for streamlined, general-purpose conversation. This flexibility is supported by training on a massive 36 trillion token corpus spanning 119 languages, incorporating advanced techniques such as Rotary Position Embedding (RoPE) and SwiGLU activation.
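A minimal sketch of toggling the hybrid thinking mode with Hugging Face transformers is shown below. The `enable_thinking` flag and the `Qwen/Qwen3-30B-A3B` model ID follow Qwen's published usage notes, but treat them as assumptions to verify against the current model card.

```python
# Sketch: switching Qwen3 between thinking and non-thinking modes via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: 12 * (7 + 5)"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set to False for the streamlined non-thinking mode
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```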
Designed for versatile deployment, Qwen3-30B-A3B excels in instruction following, code generation, and complex agentic workflows where it can integrate with external tools. The model supports a native context window of 32,768 tokens, which can be extended to 131,072 tokens using the YaRN (Yet another RoPE extensioN) scaling method; later revisions of the model have pushed this limit to roughly 256,000 tokens. Its robust multilingual foundation and optimized expert routing make it suitable for downstream applications ranging from technical reasoning to creative content generation in professional environments.
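The sketch below illustrates one way to apply the YaRN extension through a `rope_scaling` override in transformers, assuming the factor-4 settings described in Qwen's documentation; the exact keys and values are assumptions to check against the model's shipped config.json before use.

```python
# Sketch: extending the 32,768-token native window to ~131,072 tokens with YaRN scaling.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen3-30B-A3B"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                                 # 32,768 * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```

Static YaRN scaling applies to all inputs regardless of length, so it is generally worth enabling only when prompts actually approach the extended window.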
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| General Knowledge | MMLU | 0.88 | 7 |
| Web Development | WebDev Arena | 1383 | 23 |
| Mathematics | LiveBench Mathematics | 0.65 | 33 |
| Data Analysis | LiveBench Data Analysis | 0.67 | 34 |
| Reasoning | LiveBench Reasoning | 0.37 | 38 |
| Agentic Coding | LiveBench Agentic | 0.02 | 42 |
| Coding | LiveBench Coding | 0.49 | 43 |
Overall Rank: #94
Coding Rank: #97