
Qwen3 Coder 480B A35B

Total Parameters: 480B
Context Length: 262,144 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 22 Jul 2025
Training Data Cutoff: Dec 2024

Technical Specifications

Active Parameters: 35.0B
Number of Experts: 160
Active Experts: 8
Attention Structure: Grouped Query Attention (GQA)
Hidden Dimension Size: 6144
Number of Layers: 62
Attention Heads: 96
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)

Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's advanced agentic artificial intelligence model, specifically engineered for high-performance software development and autonomous coding workflows. As a specialized variant of the Qwen 3 family, it is designed to manage complex multi-turn programming tasks, including comprehensive repository analysis, cross-file reasoning, and automated pull request generation. The model serves as the primary engine for autonomous software engineering, enabling deep integration with developer tools and terminal-based agents like Qwen Code.
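For orientation, the sketch below shows how such a checkpoint is typically queried through Hugging Face transformers. The repository id and generation settings follow Qwen's usual naming conventions and are assumptions for illustration, not details confirmed on this page.

```python
# Minimal sketch: prompting Qwen3 Coder for a coding task via Hugging Face
# transformers. The checkpoint id below follows Qwen's usual naming and is
# an assumption, not something stated on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function that merges two sorted lists."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```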

Architecturally, the model is a sparse Mixture-of-Experts (MoE) decoder-only transformer. It comprises 480 billion total parameters while maintaining computational efficiency by activating only about 35 billion parameters per token. The configuration employs 160 experts, with 8 active experts selected by a gating mechanism for each token. The underlying structure features 62 transformer layers and incorporates Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads to reduce memory bandwidth and accelerate inference. It uses Rotary Position Embeddings (RoPE) and supports a native context window of 262,144 tokens, extensible up to one million tokens through long-context techniques such as YaRN.
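To make the routing arithmetic concrete, here is a minimal sketch of generic top-k expert gating using the figures from the table above (160 experts, 8 active per token, hidden size 6144). It illustrates the mechanism only; it is not Qwen's actual routing code.

```python
import torch
import torch.nn.functional as F

# Illustrative top-k expert routing with the spec-table figures:
# 160 experts, 8 active per token, hidden size 6144. Generic MoE
# gating sketch, not Qwen's implementation.
num_experts, top_k, hidden = 160, 8, 6144

router = torch.nn.Linear(hidden, num_experts, bias=False)
x = torch.randn(4, hidden)  # 4 token embeddings

logits = router(x)                                       # (4, 160) routing scores
weights, expert_ids = torch.topk(logits, top_k, dim=-1)  # pick 8 experts per token
weights = F.softmax(weights, dim=-1)                     # normalize the 8 gate weights

print(expert_ids[0])  # the 8 experts this token is dispatched to
print(weights[0])     # their mixing weights (sum to 1)
```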

The model is trained on a massive dataset of 7.5 trillion tokens, with a 70% concentration on source code and technical content across multiple programming languages including Python, JavaScript, C++, and Rust. Its post-training phase leverages long-horizon reinforcement learning, specifically Agent RL and Code RL, to improve multi-step planning and interaction with external tools such as browsers and CLI environments. This specialization allows the model to function as a sophisticated coding agent capable of executing complex engineering tasks and managing entire codebases with high precision.
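As a hedged illustration of the tool-driven agentic loop described above, the snippet below sketches a function-calling request against an OpenAI-compatible endpoint such as a local vLLM server. The URL, model name, and the run_shell tool schema are hypothetical placeholders, not values from this page.

```python
# Sketch of a tool-calling request against an OpenAI-compatible endpoint
# (e.g. a local vLLM server). URL, model name, and tool schema are
# hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical CLI tool the agent may invoke
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": "List the Python files in this repo."}],
    tools=tools,
)
# If the model decides to use the tool, the call arguments appear here.
print(response.choices[0].message.tool_calls)
```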

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
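A brief sketch of that 'thinking'/'non-thinking' switch, assuming the enable_thinking chat-template flag described in Qwen's published model cards and a representative family checkpoint; both are assumptions, not details from this page.

```python
# Sketch: toggling Qwen 3's hybrid reasoning mode via the chat template.
# Checkpoint id and the enable_thinking kwarg are assumptions based on
# Qwen's published model cards.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")  # assumed id
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# 'Thinking' mode: the template inserts a reasoning segment before the answer.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# 'Non-thinking' mode: the model answers directly.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```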



Evaluation Benchmarks

Overall Rank: #26
Coding Rank: #31

Benchmark Scores:
WebDev Arena (Web Development): 1386 (rank #22)

Model Transparency

Overall Score: B (68 / 100)

GPU Requirements

Required VRAM depends on the chosen weight quantization and context size (1K to 256K tokens); the source page provides an interactive calculator with recommended GPUs.
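Since the interactive calculator does not survive extraction, the following back-of-envelope sketch reproduces the same kind of estimate from the spec table: weight memory scales with parameter count and quantization width, and the fp16 KV cache with context length. The head dimension (hidden size / attention heads = 64) is a derived assumption, and these are rough numbers, not vendor guidance.

```python
# Rough VRAM estimate for serving the model, in the spirit of the page's
# calculator. Back-of-envelope numbers only.
def weight_gib(params_b: float, bits: int) -> float:
    """Memory for quantized weights in GiB."""
    return params_b * 1e9 * bits / 8 / 2**30

def kv_cache_gib(tokens: int, layers: int = 62, kv_heads: int = 8,
                 head_dim: int = 64, bytes_per: int = 2) -> float:
    """fp16 KV cache in GiB. head_dim = hidden / attention heads
    (6144 / 96 = 64) is an assumption derived from the spec table;
    the factor 2 accounts for storing both keys and values."""
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per / 2**30

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_gib(480, bits):7.0f} GiB")
print(f"KV cache @ 262,144 tokens: {kv_cache_gib(262_144):.0f} GiB")
```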
