| Specification | Value |
|---|---|
| Total Parameters | 480B |
| Active Parameters | 35.0B |
| Context Length | 262,144 tokens |
| Modality | Text |
| Architecture | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Release Date | 22 Jul 2025 |
| Training Data Cutoff | Dec 2024 |
| Number of Experts | 160 |
| Active Experts per Token | 8 |
| Attention Structure | Grouped Query Attention (GQA) |
| Hidden Dimension | 6144 |
| Layers | 62 |
| Attention Heads | 96 |
| Key-Value Heads | 8 |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | Rotary Position Embedding (RoPE) |
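As a rough illustration, the specification above maps onto a Hugging Face-style model configuration along the following lines. This is a sketch, not the official config file: the field names follow common Qwen3-MoE conventions, and any value not listed in the table above is omitted rather than guessed.

```python
# Illustrative mapping of the specification table to Hugging Face-style
# config fields. Key names follow common Qwen3-MoE conventions; treat the
# exact keys as assumptions, not the model's official configuration.
qwen3_coder_480b_a35b_config = {
    "architectures": ["Qwen3MoeForCausalLM"],  # MoE decoder-only transformer
    "num_hidden_layers": 62,                   # transformer layers
    "hidden_size": 6144,                       # hidden dimension
    "num_attention_heads": 96,                 # query heads (GQA)
    "num_key_value_heads": 8,                  # shared key/value heads
    "num_experts": 160,                        # total routed experts
    "num_experts_per_tok": 8,                  # active experts per token
    "hidden_act": "silu",                      # SwiGLU gating uses SiLU
    "max_position_embeddings": 262144,         # native context window
}
```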
Qwen3 Coder 480B A35B is Alibaba's advanced agentic artificial intelligence model, engineered for high-performance software development and autonomous coding workflows. As a specialized variant within the Qwen3 family, it is designed to handle complex multi-turn programming tasks, including comprehensive repository analysis, cross-file reasoning, and automated pull-request generation. The model serves as the primary engine for autonomous software engineering, enabling deep integration with developer tools and terminal-based agents such as Qwen Code.
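For concreteness, the model can be driven programmatically through an OpenAI-compatible chat API, as many providers expose for Qwen models. In the minimal sketch below, the base URL and model identifier are placeholders (assumptions), to be replaced with your provider's actual values.

```python
# Minimal sketch: calling Qwen3 Coder through an OpenAI-compatible API.
# The base_url and model name are placeholders, not official endpoints;
# substitute the values documented by your provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",      # placeholder model id
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor this function to be iterative: ..."},
    ],
)
print(response.choices[0].message.content)
```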
Architecturally, the model utilizes a sparse Mixture-of-Experts (MoE) decoder-only transformer framework. It comprises 480 billion parameters in total while maintaining computational efficiency by activating only 35 billion parameters per token. This configuration employs 160 experts, with 8 active experts selected by a gating mechanism for each token. The underlying structure features 62 transformer layers and incorporates Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads to optimize memory bandwidth and inference speed. It uses Rotary Position Embeddings (RoPE) and is optimized for long contexts through extrapolation techniques such as YaRN, supporting a native context window of 262,144 tokens that can be extended to one million.
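To make the sparse routing concrete, the following minimal sketch implements top-8-of-160 gating over token representations. Expert feed-forward networks and load-balancing losses are omitted, and softmax over the selected logits is one common normalization convention, so this illustrates the mechanism rather than Qwen3's exact implementation.

```python
import torch
import torch.nn.functional as F

def moe_route(hidden, gate_weight, top_k=8):
    """Top-k gating for a sparse MoE layer: each token is routed to
    top_k of the experts, so only a fraction of parameters is active.
    Sketch of the routing step only (expert FFNs omitted)."""
    # hidden: (num_tokens, hidden_size); gate_weight: (hidden_size, num_experts)
    logits = hidden @ gate_weight                     # (num_tokens, num_experts)
    weights, experts = torch.topk(logits, top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
    return experts, weights                           # expert ids, mixing weights

# Example: route 4 tokens of hidden size 6144 across 160 experts
hidden = torch.randn(4, 6144)
gate = torch.randn(6144, 160)
experts, weights = moe_route(hidden, gate)
print(experts.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Each token thus touches only 8 expert FFNs per layer, which is why the active parameter count (35B) sits far below the total (480B).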
The model is trained on a massive dataset of 7.5 trillion tokens, with a 70% concentration on source code and technical content across multiple programming languages including Python, JavaScript, C++, and Rust. Its post-training phase leverages long-horizon reinforcement learning, specifically Agent RL and Code RL, to improve multi-step planning and interaction with external tools such as browsers and CLI environments. This specialization allows the model to function as a sophisticated coding agent capable of executing complex engineering tasks and managing entire codebases with high precision.
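Because the post-training emphasizes tool interaction, a typical way to exercise this capability is OpenAI-style function calling. The sketch below registers a hypothetical shell tool; the endpoint, model identifier, and tool name are all illustrative assumptions, not official values.

```python
# Sketch: exposing a shell tool via OpenAI-style function calling, the kind
# of CLI interaction the Agent RL post-training targets. Endpoint, model id,
# and tool schema are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell_command",              # hypothetical tool name
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run."},
            },
            "required": ["command"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",       # placeholder model id
    messages=[{"role": "user", "content": "List the Python files in this repo."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```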
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
Rank: #26
| Benchmark | Score | Rank |
|---|---|---|
| WebDev Arena (Web Development) | 1386 | 22 |