
Qwen3-30B-A3B

Parameters: 30.5B total (≈3.3B active per token)
Context Length: 131,072 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 29 Apr 2025
Training Data Cutoff: Mar 2025

Technical Specifications

Active Parameters (per token): 3.0B
Number of Experts: 128
Active Experts (per token): 8

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -

Layers: 48
Attention Heads (Query): 32
Key/Value Heads: 4

Activation Function: SwiGLU

Normalization: RMSNorm

Position Embedding: RoPE (Rotary Position Embedding)

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen3-30B-A3B

The Qwen3-30B-A3B model, developed by Alibaba, is a Mixture-of-Experts (MoE) language model within the Qwen3 series, designed for efficient inference across a range of natural language processing tasks. It encompasses 30.5 billion parameters in total, with an active subset of approximately 3.3 billion parameters engaged per token during inference. This architectural strategy aims to achieve performance levels comparable to larger dense models while significantly reducing the computational overhead required for each processing step. This model is part of a dual architecture strategy by Qwen 3, which includes both dense and sparse (MoE) designs, providing flexibility for various computational resources and use-case complexities.
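The sparse activation pattern can be illustrated with a toy router: for each token, a gating network scores all 128 experts and only the top 8 are evaluated, so per-token compute tracks the roughly 3.3 billion active parameters rather than the 30.5 billion total. The sketch below is illustrative only (plain NumPy, with a made-up hidden size), not Qwen3's actual routing code.

```python
import numpy as np

def topk_moe_route(token_hidden, gate_weights, k=8):
    """Toy top-k MoE router: score every expert, keep only the best k."""
    logits = gate_weights @ token_hidden          # one routing score per expert
    topk = np.argsort(logits)[-k:]                # indices of the k highest-scoring experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                          # softmax over the selected experts only
    return topk, probs

# Illustrative dimensions only: 128 experts, hypothetical hidden size of 1024.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(1024)
gate = rng.standard_normal((128, 1024))
experts, weights = topk_moe_route(hidden, gate)
print(experts, weights.round(3))                  # 8 expert ids and their mixing weights
```

Only the 8 selected experts' feed-forward blocks run for that token; their outputs are combined using the mixing weights, which is what keeps per-token cost close to a ~3B dense model.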

Architecturally, Qwen3-30B-A3B is structured with 48 layers and employs a Grouped Query Attention (GQA) mechanism, featuring 32 query heads and 4 key/value heads. The MoE configuration integrates 128 experts, with 8 experts activated per token, and does not incorporate shared experts. A distinctive attribute is its hybrid reasoning system, which enables dynamic transitions between a 'thinking mode' for complex logical reasoning, mathematics, and coding tasks, and a 'non-thinking mode' for general-purpose dialogue. This design allows the model to adapt its computational approach based on task requirements, thereby optimizing resource utilization. The model's foundation rests on a pre-training corpus of 36 trillion tokens, covering 119 languages, which contributes to its extensive multilingual proficiency.
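In practice, the thinking/non-thinking switch is exposed through the chat template when the model is served with Hugging Face Transformers. The snippet below is a minimal sketch that assumes the public Qwen/Qwen3-30B-A3B checkpoint and the enable_thinking flag described in Qwen3's documentation; check the current model card before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]

# enable_thinking=True requests the step-by-step 'thinking mode';
# set it to False for fast, general-purpose dialogue.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```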

Qwen3-30B-A3B processes text inputs and is designed to strengthen reasoning, instruction-following, and agent capabilities. Its native context window supports up to 32,768 tokens, which can be extended to 131,072 tokens with the YaRN (Yet another RoPE extensioN) method for handling longer sequences. The model uses Rotary Position Embedding (RoPE) and incorporates architectural refinements such as a global-batch load-balancing loss for the MoE layers and QK layer normalization, which improve training stability and overall performance. The model can also be fine-tuned for specific downstream applications.
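Qwen's model cards describe enabling the 131,072-token window by adding a YaRN rope_scaling entry to the model configuration (a factor of 4.0 over the native 32,768 tokens). The snippet below sketches one way to apply that override through Transformers; the keys mirror the published Qwen3 guidance, but treat the exact names and the override mechanism as assumptions and verify against the current model card.

```python
from transformers import AutoModelForCausalLM

# YaRN scaling: 32,768 native tokens * factor 4.0 ≈ 131,072-token window.
# This block can also be written directly into the checkpoint's config.json.
yarn_rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",            # assumed repo id
    torch_dtype="auto",
    device_map="auto",
    rope_scaling=yarn_rope_scaling,  # config override forwarded by from_pretrained
)
```

Static YaRN scaling applies to all requests, so Qwen recommends enabling it only when long-context handling is actually needed.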

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #16

Benchmark (Category)                 Score   Rank
-                                    0.67    7
GPQA (Graduate-Level QA)             0.66    7
-                                    0.80    9
-                                    0.46    14
MMLU (General Knowledge)             0.66    14
LiveBench Agentic (Agentic Coding)   0.02    18
-                                    0.49    20

Rankings

Overall Rank: #16
Coding Rank: #28

GPU Requirements

VRAM requirements vary with the quantization method selected for the model weights and with the context size (from 1K up to 128K tokens).
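As a rough, back-of-the-envelope substitute for the interactive calculator, weight memory can be approximated as parameter count times bytes per parameter, plus an allowance for the KV cache and runtime overhead. The sketch below applies that rule of thumb with assumed byte costs per quantization level and an assumed per-token KV-cache cost; it is not the site's calculator, and actual usage depends on the inference engine.

```python
# Rough VRAM estimate: weights + KV cache + overhead. Illustrative only;
# the bytes-per-parameter figures, KV cost, and overhead are assumptions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(total_params_b=30.5, quant="int4",
                     context_tokens=32_768, kv_bytes_per_token_mb=0.1,
                     overhead_gb=2.0):
    """Back-of-the-envelope VRAM estimate in GB for a given quantization and context."""
    weights_gb = total_params_b * BYTES_PER_PARAM[quant]          # 1B params * 1 byte ≈ 1 GB
    kv_cache_gb = context_tokens * kv_bytes_per_token_mb / 1024   # assumed per-token KV cost
    return weights_gb + kv_cache_gb + overhead_gb

for quant in ("fp16", "int8", "int4"):
    print(f"{quant}: ~{estimate_vram_gb(quant=quant):.0f} GB at 32K context")
```

Under these assumptions the full 30.5B weights need roughly 60 GB in fp16 but only around 15-20 GB at 4-bit quantization, which is why quantization choice dominates the GPU recommendation.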
