
Qwen3 Coder 480B A35B

Total Parameters

480B

Context Length

262,144 tokens (256K)

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

22 Jul 2025

Knowledge Cutoff

-

Technical Specifications

Active Parameters

35.0B

Number of Experts

160

Active Experts

8

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

-

Number of Layers

62

Attention Heads

96

Key-Value Heads

8

Activation Function

-

Normalization

-

Position Embedding

Rotary Position Embedding (RoPE)


Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's advanced agentic AI coding model, built for high-performance software development and autonomous coding workflows. It is engineered to excel at code generation, managing complex multi-turn programming workflows, and debugging entire codebases. The model is designed to facilitate autonomous software engineering, enabling comprehensive repository analysis, cross-file reasoning, and automated pull request workflows. It also integrates with a range of developer tools and platforms, including Alibaba's own open-sourced command-line interface, Qwen Code, which has been adapted with customized prompts and function-calling protocols.
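For illustration, the sketch below requests a code completion assuming the model is served behind an OpenAI-compatible endpoint (for example, via a local vLLM server). The base URL, API key, and model identifier are placeholders, not official values.

```python
# Minimal sketch: requesting a code completion from Qwen3 Coder 480B A35B,
# assuming an OpenAI-compatible serving endpoint. Base URL, API key, and
# model name below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="EMPTY",                      # many local servers ignore the key
)

response = client.chat.completions.create(
    model="Qwen3-Coder-480B-A35B",        # placeholder model identifier
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that parses a CSV line."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```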

Architecturally, Qwen3 Coder 480B A35B is a Mixture-of-Experts (MoE) model. It comprises 480 billion total parameters, of which 35 billion are activated per token. This sparse activation strategy, routing each token to 8 of 160 experts, allows for efficient computation while maintaining high performance. The model has 62 layers and employs Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads. The core is a decoder-only transformer, optimized for MoE and enhanced with Rotary Position Embedding (RoPE) for handling extended context lengths.
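To make the sparsity figures concrete, the small arithmetic sketch below works through the published numbers; it is a back-of-envelope illustration, not a parameter-exact reconstruction of the architecture.

```python
# Back-of-envelope arithmetic on the published spec figures.
total_params = 480e9       # total parameters
active_params = 35e9       # parameters activated per token
experts_total = 160
experts_active = 8

# Fraction of expert capacity used per token by the top-8 router.
print(f"Experts active per token: {experts_active}/{experts_total} "
      f"({experts_active / experts_total:.1%})")        # 8/160 (5.0%)

# Effective compute cost per token relative to a dense 480B model.
print(f"Active parameter fraction: {active_params / total_params:.1%}")  # 7.3%

# GQA: 8 key-value heads serve 96 query heads, so the KV cache is
# 96/8 = 12x smaller than full multi-head attention at the same width.
query_heads, kv_heads = 96, 8
print(f"KV cache reduction vs. MHA: {query_heads // kv_heads}x")
```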

The model's performance characteristics are geared towards real-world software engineering tasks. It demonstrates capabilities in advanced agentic coding, browser automation, and tool usage, supporting a wide range of programming and markup languages, including Python, JavaScript, Java, C++, Go, and Rust. Qwen3 Coder 480B A35B's training involved 7.5 trillion tokens with a 70% code ratio, balancing coding, general, and mathematical abilities. Post-training incorporates long-horizon reinforcement learning (Agent RL) to enable the model to solve complex problems through multi-step interactions with external tools.
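The sketch below is a schematic of the multi-step tool-interaction loop that long-horizon Agent RL optimizes for: the model proposes an action, an external tool executes it, and the observation feeds back into the context until the task is finished. The tool registry and the toy `call_model` stub are hypothetical stand-ins, not a real Qwen Code API.

```python
# Hypothetical tool registry; real agents would wrap shells, editors, browsers.
TOOLS = {
    "run_tests": lambda args: "2 passed, 1 failed",
    "read_file": lambda args: "<file contents>",
}

def call_model(history):
    """Toy stand-in for a model call: run the tests once, then finish."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "run_tests", "args": {}}
    return {"tool": "finish", "answer": history[-1]["content"]}

def agent_loop(task, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)       # e.g. {"tool": "...", "args": {...}}
        if action["tool"] == "finish":
            return action["answer"]
        observation = TOOLS[action["tool"]](action.get("args", {}))
        history.append({"role": "tool", "content": observation})
    return None  # step budget exhausted

print(agent_loop("Fix the failing test"))  # -> "2 passed, 1 failed"
```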

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#13

Benchmark                            Score   Rank
-                                    0.73    🥈 2
LiveBench Agentic (Agentic Coding)   0.25    🥈 2
-                                    0.65    9
-                                    0.67    12
-                                    0.55    13
MMLU Pro (Professional Knowledge)    0.50    22

Rankings

Overall Rank

#13

Coding Rank

#9

GPU Requirements

VRAM requirements depend on the quantization method chosen for the model weights and on the context size (selectable from 1K up to 256K tokens); see the full calculator for exact figures and recommended GPUs.
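As a rough guide to what the calculator reports, the sketch below estimates weight memory alone for common quantization widths, assuming weight storage dominates and ignoring KV cache, activations, and framework overhead; the bit-widths are generic, not tied to any specific release of the model.

```python
# Rough VRAM estimate for the model weights alone, at generic bit-widths.
TOTAL_PARAMS = 480e9
BITS = {"FP16": 16, "INT8": 8, "INT4": 4}

for name, bits in BITS.items():
    gib = TOTAL_PARAMS * bits / 8 / 2**30   # bytes -> GiB
    print(f"{name}: ~{gib:,.0f} GiB for weights alone")
# FP16 -> ~894 GiB, INT8 -> ~447 GiB, INT4 -> ~224 GiB
```

At long context sizes the KV cache adds substantially on top of these figures, which is why the calculator asks for context size as well as quantization method.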
