Qwen3 Coder 480B A35B: Specifications and GPU VRAM Requirements

Qwen3 Coder 480B A35B

Open weights

Total parameters

480B

Context length

262,144 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release date

22 Jul 2025

Knowledge cutoff

Not listed

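Technical specifications

Active parameters

35B

Number of experts

160

Active experts

8

Attention structure

Grouped Query Attention (GQA)

Hidden dimension size

Not listed

Number of layers

62

Attention heads

96

Key-value heads

8

Activation function

Not listed

Normalization

Not listed

Position embedding

Rotary Position Embedding (RoPE)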

System requirements

VRAM requirements for different quantization methods and context sizes

Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's agentic coding model, built for software development and autonomous coding workflows. It targets code generation, complex multi-turn programming workflows, and debugging across entire codebases. For autonomous software engineering it supports repository-level analysis, cross-file reasoning, and automated pull request workflows. It also integrates with developer tools and platforms, including Alibaba's open-source command-line interface, Qwen Code, which is adapted to the model's custom prompts and function calling protocols.

Architecturally, Qwen3 Coder 480B A35B is a decoder-only Mixture-of-Experts (MoE) transformer. It has 480 billion total parameters, of which about 35 billion are active for each token. Each token is routed to 8 of 160 experts, so only a fraction of the weights take part in any forward pass, which keeps per-token compute well below that of a dense model of the same size. The model has 62 layers and uses Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads. Rotary Position Embedding (RoPE) encodes positions and supports the extended context length.
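To make the sparse activation concrete, here is a minimal sketch of generic top-k expert routing for a single token, using the 160-expert and 8-active-expert figures from the paragraph above. It illustrates the general technique only: the function names, the softmax-then-renormalize gating, and the random router logits are assumptions for illustration, not Qwen's actual router.

```python
import math
import random

NUM_EXPERTS = 160  # total experts, from the description above
TOP_K = 8          # experts activated per token, from the description above


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their weights.

    Returns (expert_index, weight) pairs, highest weight first.
    The weights of the chosen experts sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical router scores; a real model computes these from the token's hidden state.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    routing = route_token(logits)
    print(f"experts used: {len(routing)} of {NUM_EXPERTS} ({TOP_K / NUM_EXPERTS:.1%})")
    print(f"active parameter share: {35 / 480:.1%}")
```

Only the selected experts' feed-forward weights are evaluated for the token; the remaining experts are skipped entirely. This is how a model of this size can keep per-token compute closer to that of a much smaller dense model.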

The model is tuned for real-world software engineering tasks. It handles agentic coding, browser automation, and tool use, and supports many programming and markup languages, including Python, JavaScript, Java, C++, Go, and Rust. Pre-training used 7.5 trillion tokens, 70% of them code, to balance coding with general and mathematical ability. Post-training adds long-horizon reinforcement learning (Agent RL) so the model can solve complex problems through multi-step interaction with external tools.

About Qwen 3

The Alibaba Qwen 3 model family includes dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key features are a hybrid reasoning system with 'thinking' and 'non-thinking' modes for adaptive processing, and support for long context windows.

Evaluation benchmarks

Rankings are among local LLMs.

Rank

#13

Category	Benchmark	Score	Rank
Coding	LiveBench Coding	0.73	2
Agentic Coding	LiveBench Agentic	0.25	2
Data Analysis	LiveBench Data Analysis	0.65	9
Mathematics	LiveBench Mathematics	0.67	12
Reasoning	LiveBench Reasoning	0.55	13
Professional Knowledge	MMLU Pro	0.50	22

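GPU requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size, which ranges from 1,024 tokens up to 256k.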
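As a rough guide to how quantization and context size drive memory, the sketch below estimates weight memory from bits per weight and KV-cache memory from the layer and key-value head counts given earlier. The head dimension of 128 and the 16-bit cache values are assumptions not stated on this page, and the function names are illustrative. The estimate ignores activations, quantization metadata, and framework overhead, so real requirements will be higher.

```python
# Figures taken from the model description on this page.
TOTAL_PARAMS = 480e9   # all experts must be resident, not only the active 35B
NUM_LAYERS = 62
NUM_KV_HEADS = 8

# Assumption: the head dimension is not stated on this page.
HEAD_DIM = 128


def weight_bytes(bits_per_weight, total_params=TOTAL_PARAMS):
    """Memory for the weights alone at a given quantization width."""
    return total_params * bits_per_weight / 8


def kv_cache_bytes(context_tokens, bytes_per_value=2):
    """KV cache for one sequence: a key and a value vector per layer, per KV head, per token."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_tokens * bytes_per_value


def estimate_vram_gib(bits_per_weight, context_tokens):
    """Lower-bound VRAM estimate in GiB (weights plus KV cache only)."""
    total = weight_bytes(bits_per_weight) + kv_cache_bytes(context_tokens)
    return total / 2**30


if __name__ == "__main__":
    for bits in (16, 8, 4):
        for context in (1024, 131072, 262144):
            gib = estimate_vram_gib(bits, context)
            print(f"{bits:>2}-bit weights, {context:>7} tokens: about {gib:,.0f} GiB")
```

Under these assumptions the KV cache for a single 262,144-token sequence comes to 62 GiB, while 4-bit weights alone take roughly 240 GB (about 224 GiB). The weights dominate at every context size because all 480B parameters must be loaded even though only about 35B are active per token.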

Resources

Official documentation | Read paper | Download weights
