Active Parameters: 3.3B
Context Length: 131,072 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 29 Apr 2025
Knowledge Cutoff: Mar 2025
Total Expert Parameters: 30.5B
Number of Experts: 128
Active Experts per Token: 8
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 48
Attention Heads: 32
Key-Value Heads: 4
Activation Function: -
Normalization: RMSNorm
Position Embedding: RoPE
VRAM Requirements for Different Quantization Methods and Context Sizes
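A rough sense of these requirements can be derived from the specifications above. The Python sketch below estimates memory for quantized weights plus the KV cache at a few common precisions and context sizes; the head dimension of 128 is an assumption not listed on this page, and real deployments add activation buffers and runtime overhead on top of these figures.

```python
# Back-of-the-envelope VRAM estimate for Qwen3-30B-A3B: weights plus KV cache.
# Illustrative only; activation buffers and framework overhead are ignored.

TOTAL_PARAMS = 30.5e9   # all experts stay resident in memory, not just the active ones
NUM_LAYERS = 48         # transformer layers
NUM_KV_HEADS = 4        # grouped-query attention key/value heads
HEAD_DIM = 128          # assumed head dimension (not listed in the table above)

def vram_gb(bits_per_weight: int, context_tokens: int, kv_bytes: int = 2) -> float:
    """Estimated memory in GB for quantized weights plus an FP16 KV cache."""
    weight_bytes = TOTAL_PARAMS * bits_per_weight / 8
    # KV cache: 2 (key + value) * layers * kv_heads * head_dim * bytes_per_value * tokens
    kv_cache_bytes = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * kv_bytes * context_tokens
    return (weight_bytes + kv_cache_bytes) / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    for ctx in (32_768, 131_072):
        print(f"{name:5s} @ {ctx:>7,} tokens: ~{vram_gb(bits, ctx):.1f} GB")
```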
The Qwen3-30B-A3B model, developed by Alibaba, is a Mixture-of-Experts (MoE) language model within the Qwen3 series. Its architecture is optimized for efficient inference across a range of natural language processing tasks. The model totals 30.5 billion parameters, with an active set of approximately 3.3 billion parameters engaged per token during inference, a design choice aimed at achieving performance comparable to larger dense models while significantly reducing computational overhead.
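For orientation, the checkpoint can be loaded through Hugging Face Transformers in the usual way. The snippet below is a minimal sketch assuming a recent transformers release with Qwen3 MoE support and enough GPU memory for the chosen precision.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # Hugging Face Hub checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the precision stored in the checkpoint
    device_map="auto",    # shard weights across available GPUs
)
```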
Architecturally, Qwen3-30B-A3B is structured with 48 layers and employs a Grouped Query Attention (GQA) mechanism, featuring 32 query heads and 4 key/value heads. The MoE configuration includes 128 experts, with 8 experts activated per token, and does not incorporate shared experts. A notable feature is its hybrid reasoning system, allowing for seamless transitions between a 'thinking mode' for complex logical reasoning, mathematics, and coding tasks, and a 'non-thinking mode' for general-purpose dialogue. This design enables the model to adapt its computational strategy to the demands of the task, ensuring optimal resource utilization. The model is built upon a pre-training corpus of 36 trillion tokens, encompassing 119 languages, thereby expanding its multilingual proficiency.
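The mode switch is typically exposed through the chat template. Reusing the tokenizer and model from the loading sketch above, the following sketch toggles thinking mode, assuming the checkpoint's chat template accepts an enable_thinking flag as documented for Qwen3.

```python
messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]

# Thinking mode: the model first emits a <think>...</think> reasoning block.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set to False for fast, general-purpose dialogue
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```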
Qwen3-30B-A3B is engineered to process text inputs and is designed to enhance reasoning, instruction-following, and agent capabilities. Its native context window supports up to 32,768 tokens, which can be extended to 131,072 tokens by applying the YaRN (Yet another RoPE extensioN) method for handling longer sequences. The model uses Rotary Position Embedding (RoPE) and integrates refinements such as a global-batch load-balancing loss for MoE training and QK layer normalization, which contribute to improved training stability and performance. It is designed to be fine-tunable for specific use cases.
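Context extension beyond the native 32,768 tokens is usually enabled by overriding the RoPE scaling configuration at load time. The sketch below assumes a transformers version that forwards config overrides through from_pretrained and accepts the "rope_type": "yarn" key; verify the exact key names against the Qwen3 documentation for your installed version.

```python
from transformers import AutoModelForCausalLM

# Enable YaRN scaling so the model can attend over up to 131,072 tokens.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                               # 32,768 x 4 = 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
)
```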
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
Rankings apply to local LLMs.
Rank: #10
| Benchmark | Score | Rank |
|---|---|---|
| Data Analysis (LiveBench) | 0.67 | 5 |
| Mathematics (LiveBench) | 0.77 | 7 |
| Graduate-Level QA (GPQA) | 0.66 | 7 |
| Reasoning (LiveBench) | 0.71 | 8 |
| Agentic Coding (LiveBench Agentic) | 0.12 | 9 |
| General Knowledge (MMLU) | 0.66 | 14 |
| Coding (LiveBench) | 0.47 | 19 |