
Qwen3 Coder 480B A35B

Parameters

480B

Context Length

262,144

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

22 Jul 2025

Knowledge Cutoff

-

Technical Specifications

Active Parameters

35.0B

Number of Experts

160

Active Experts

8

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

-

Layers

62

Attention Heads

96

Key-Value Heads

8

Activation Function

-

Normalization

-

Position Embedding

Rotary Position Embedding (RoPE)


Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's advanced agentic AI coding model, developed for high-performance software development and autonomous coding workflows. This model is engineered to excel in tasks such as code generation, managing complex multi-turn programming workflows, and debugging entire codebases. It is designed to facilitate autonomous software engineering, enabling comprehensive repository analysis, cross-file reasoning, and automated pull request workflows. The model also supports integration with various developer tools and platforms, including its own open-sourced command-line interface, Qwen Code, which is adapted for customized prompts and function calling protocols.
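Tool integration of the kind described above is commonly done through an OpenAI-compatible chat API when the model is self-hosted (for example via vLLM or SGLang). The sketch below builds a tool-calling request body under that assumption; the `run_shell` tool is a hypothetical example of a function an agent harness might expose, not part of the model or Qwen Code.

```python
import json

# Minimal sketch of a function-calling request to a server hosting
# Qwen3 Coder 480B A35B behind an OpenAI-compatible chat endpoint.
# The serving setup and the "run_shell" tool are assumptions for
# illustration; no request is actually sent here.
request = {
    "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
    "messages": [
        {"role": "user", "content": "Run the test suite and summarize failures."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_shell",  # hypothetical tool exposed by the agent
                "description": "Execute a shell command in the repo sandbox.",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        }
    ],
}

payload = json.dumps(request)  # body you would POST to /v1/chat/completions
```

The model responds with a `tool_calls` message naming the function and its JSON arguments; the harness executes the tool and feeds the result back as a `tool` role message, repeating until the task completes.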

Architecturally, Qwen3 Coder 480B A35B is a Mixture-of-Experts (MoE) model. It comprises 480 billion total parameters, of which 35 billion are active per token: each token is routed to 8 of the 160 experts, a sparse activation strategy that keeps computation efficient while maintaining high performance. The model has 62 layers and employs Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads. The core is a decoder-only transformer, optimized for MoE and enhanced with Rotary Position Embedding (RoPE) for handling extended context lengths.
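The sparse-activation arithmetic can be sketched with a generic top-k router; this is an illustration of the technique, not Qwen's actual routing code.

```python
import numpy as np

# Generic top-k MoE routing sketch. Qwen3 Coder 480B A35B routes each
# token to 8 of 160 experts, so only ~35B of the 480B parameters
# participate in any one forward pass.
NUM_EXPERTS = 160
TOP_K = 8

def route(router_logits: np.ndarray, k: int = TOP_K) -> np.ndarray:
    """Return the indices of the k experts with the highest router scores."""
    return np.argsort(router_logits)[-k:]

rng = np.random.default_rng(0)
router_logits = rng.standard_normal(NUM_EXPERTS)  # one token's router scores
active = route(router_logits)

# Only these 8 expert FFNs are evaluated for this token.
print(f"active experts per token: {len(active)} / {NUM_EXPERTS}")
print(f"fraction of expert FFNs computed: {TOP_K / NUM_EXPERTS:.1%}")  # 5.0%
```

The ratio of active to total experts (8/160 = 5%) roughly tracks the active-to-total parameter ratio (35B/480B ≈ 7%), with the difference coming from the dense, always-active components such as attention and embeddings.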

The model's performance characteristics are geared towards real-world software engineering tasks. It demonstrates capabilities in advanced agentic coding, browser automation, and tool usage, supporting a wide range of programming and markup languages, including Python, JavaScript, Java, C++, Go, and Rust. Qwen3 Coder 480B A35B's training involved 7.5 trillion tokens with a 70% code ratio, balancing coding, general, and mathematical abilities. Post-training incorporates long-horizon reinforcement learning (Agent RL) to enable the model to solve complex problems through multi-step interactions with external tools.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #13
Coding Rank: #9

Benchmark Score: 0.73 (rank 2 🥈)
Agentic Coding / LiveBench Agentic: 0.25 (rank 2 🥈)
0.65 (rank 9)
0.67 (rank 12)
0.55 (rank 13)
Professional Knowledge / MMLU Pro: 0.50 (rank 22)

GPU Requirements

The full calculator estimates required VRAM and recommends GPUs based on the quantization method selected for the model weights and the context size (1k to 256k tokens).
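A rough back-of-envelope version of such an estimate can be sketched as follows. The formulas are generic rules of thumb, not the site's calculator; in particular, `head_dim=128` is an assumption, since the hidden dimension is not listed above.

```python
# Back-of-envelope VRAM estimate for serving a model like
# Qwen3 Coder 480B A35B. Rough rules of thumb only.

BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gb(total_params_b: float, quant: str) -> float:
    """All 480B parameters must reside in memory even though only 35B
    are active per token: MoE experts are loaded, not streamed."""
    return total_params_b * BYTES_PER_WEIGHT[quant]

def kv_cache_gb(context: int, layers: int = 62, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: float = 2.0) -> float:
    """KV cache per token = 2 (K and V) * layers * kv_heads * head_dim.
    head_dim=128 is an assumption. GQA's 8 KV heads (vs 96 query heads)
    shrink this cache 12x compared with full multi-head attention."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_val
    return context * per_token / 1024**3

print(f"weights, int4       : {weight_vram_gb(480, 'int4'):.0f} GB")
print(f"KV cache @ 262,144  : {kv_cache_gb(262_144):.1f} GB")
```

Even at 4-bit quantization the weights alone need roughly 240 GB, which is why multi-GPU setups are recommended for this model.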