
Qwen3 Coder 480B A35B

Total Parameters

480B

Context Length

262,144 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

22 Jul 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Active Parameters

35.0B

Number of Experts

160

Active Experts

8

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

6144

Number of Layers

62

Attention Heads

96

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's advanced agentic artificial intelligence model, specifically engineered for high-performance software development and autonomous coding workflows. As a specialized variant of the Qwen 3 family, it is designed to manage complex multi-turn programming tasks, including comprehensive repository analysis, cross-file reasoning, and automated pull request generation. The model serves as the primary engine for autonomous software engineering, enabling deep integration with developer tools and terminal-based agents like Qwen Code.

Architecturally, the model uses a sparse Mixture-of-Experts (MoE) decoder-only transformer. It comprises 480 billion total parameters while maintaining computational efficiency by activating only 35 billion parameters per token. Each MoE layer holds 160 experts, of which 8 are selected per token by a gating mechanism. The network stacks 62 transformer layers and incorporates Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads to reduce memory bandwidth and accelerate inference. It applies Rotary Position Embeddings (RoPE), extended to long contexts through techniques such as YaRN, supporting a native context window of 262,144 tokens that can be stretched up to one million.
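As a rough illustration of the routing just described, the sketch below implements a top-k softmax gate in PyTorch. Only the numeric values (160 experts, 8 active, hidden size 6144) come from the published configuration; the function and variable names are illustrative, and this is not Qwen3's actual implementation.

```python
import torch
import torch.nn.functional as F

# Published configuration values; everything else here is illustrative.
NUM_EXPERTS = 160   # total experts per MoE layer
TOP_K = 8           # experts activated per token
HIDDEN = 6144       # hidden dimension

def route_tokens(hidden_states: torch.Tensor, router_weight: torch.Tensor):
    """Select TOP_K of NUM_EXPERTS per token via a learned linear gate."""
    # hidden_states: (num_tokens, HIDDEN); router_weight: (HIDDEN, NUM_EXPERTS)
    logits = hidden_states @ router_weight              # (num_tokens, NUM_EXPERTS)
    topk_logits, topk_idx = logits.topk(TOP_K, dim=-1)  # keep the 8 best experts
    # Normalize the selected experts' gate values so they sum to 1 per token.
    weights = F.softmax(topk_logits, dim=-1)
    return topk_idx, weights                            # which experts, and how much

tokens = torch.randn(4, HIDDEN)
router = torch.randn(HIDDEN, NUM_EXPERTS) * 0.02
idx, w = route_tokens(tokens, router)
print(idx.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Each token's output is then a weighted sum of its 8 selected experts, which is how the model keeps per-token compute near the 35B active-parameter budget despite holding 480B parameters in total.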

The model was pre-trained on 7.5 trillion tokens, roughly 70% of which are source code and technical content spanning programming languages including Python, JavaScript, C++, and Rust. Its post-training phase applies long-horizon reinforcement learning, specifically Agent RL and Code RL, to improve multi-step planning and interaction with external tools such as browsers and CLI environments. This specialization allows the model to function as a coding agent capable of executing complex engineering tasks across entire codebases.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks


Benchmark

WebDev Arena (Web Development)

Score

1386

Rank

#22

Rankings

Overall Rank

#26

Coding Rank

#31

Model Transparency

Qwen3 Coder 480B A35B Transparency Report

Total Score

68 / 100 (Grade B)

Audit Note

Qwen3-Coder 480B A35B demonstrates high transparency in its architectural design and licensing, providing clear distinctions between total and active parameters in its Mixture-of-Experts framework. However, the profile is weakened by a lack of disclosure regarding training compute resources and environmental impact. While the model is accessible and well-documented for deployment, reproducibility concerns regarding certain benchmark claims suggest a need for more granular evaluation transparency.

Upstream

23.0 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in the Qwen3 technical report (arXiv:2505.09388) and official blog posts. It is a sparse Mixture-of-Experts (MoE) decoder-only transformer with 480B total and 35B active parameters. Specific details include 62 layers, Grouped Query Attention (GQA) with 96 query and 8 KV heads, and the use of Rotary Position Embeddings (RoPE) with YaRN for context extension. The training methodology, including the use of Agent RL and Code RL for post-training, is clearly described.

Dataset Composition

6.0 / 10

Alibaba discloses the total token count (7.5 trillion) and a high-level composition breakdown (70% code, 30% general text/math). While it names specific programming languages (Python, JavaScript, C++, Rust) and mentions the use of synthetic data cleaned by Qwen2.5-Coder, it lacks a granular breakdown of specific web or book sources. The filtering and cleaning methodologies are mentioned but not provided in exhaustive detail.

Tokenizer Integrity

9.0 / 10

The model uses the Qwen3 tokenizer with a vocabulary size of 151,936. The tokenizer is publicly available via the official GitHub repository and Hugging Face. Documentation specifies the use of ChatML templates and a new tool parser (qwen3coder_tool_parser.py) to maintain consistency with the Qwen3 family's agentic capabilities. Tokenization alignment with claimed language support is verifiable through the provided code snippets.
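A minimal loading sketch, assuming the `transformers` library and the public Hugging Face checkpoint ID `Qwen/Qwen3-Coder-480B-A35B-Instruct`:

```python
from transformers import AutoTokenizer

# Load the Qwen3 tokenizer from the public checkpoint (assumed model ID).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct")

# The documentation reports a 151,936-entry vocabulary; the count returned
# here may differ slightly depending on how special tokens are tallied.
print(len(tok))

# Apply the documented ChatML-style chat template.
messages = [{"role": "user", "content": "Reverse a string in Python."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```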

Model

23.5 / 40

Parameter Density

8.5 / 10

The model provides exemplary clarity regarding its MoE structure, explicitly stating 480B total parameters and 35B active parameters per token. It further details the expert configuration (160 total experts, 8 active per forward pass). Architectural dimensions like hidden size (6144) and intermediate sizes are publicly documented in the model card and technical reports.
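These figures can be sanity-checked with back-of-envelope arithmetic (a sketch; the numbers come from the specifications above):

```python
# Back-of-envelope check of the published MoE figures.
total_params = 480e9        # total parameters
active_params = 35e9        # parameters activated per token
num_experts, active_experts = 160, 8

# Roughly 7.3% of the network's weights are exercised per token...
print(f"active parameter fraction: {active_params / total_params:.1%}")
# ...driven by selecting 8 of 160 experts (5%) plus always-active shared
# components such as attention, embeddings, and the router.
print(f"active expert fraction:    {active_experts / num_experts:.1%}")
```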

Training Compute

2.0 / 10

Information regarding training compute is largely absent. While the technical report mentions the scale of the pre-training (7.5T tokens), it does not disclose specific GPU/TPU hours, hardware quantities used for the full training run, or the total carbon footprint. Some third-party sources speculate on the massive resource requirements, but official verifiable metrics are missing.

Benchmark Reproducibility

4.0 / 10

While the model provides scores for standard benchmarks like SWE-Bench (69.6%) and HumanEval, independent researchers have reported difficulties reproducing certain flagship claims (notably ARC-AGI-1). Evaluation code for the 'Qwen Code' CLI is public, but the exact prompts and few-shot examples used for all reported benchmarks are not fully transparent, leading to skepticism in the research community.

Identity Consistency

9.0 / 10

The model exhibits strong identity consistency, correctly identifying itself as a Qwen3-Coder variant in official documentation and API responses. It is transparent about its 'non-thinking' mode (unlike the Qwen3-Max-Thinking variant) and its specific optimization for agentic coding tasks. Versioning is clear across the Qwen3 suite.

Downstream

21.5 / 30

License Clarity

9.5 / 10

The model and its weights are released under the Apache 2.0 license, which is a highly permissive, standard open-source license. The terms for commercial use, modification, and distribution are explicitly clear and consistent across GitHub, Hugging Face, and official blog posts. There are no conflicting proprietary 'look-but-don't-touch' clauses.

Hardware Footprint

7.0 / 10

VRAM requirements are well-documented for the full model and common quantizations (e.g., 2-bit GGUF for 1M context). The model card provides guidance on using the latest 'transformers' library to avoid architectural errors. However, detailed memory scaling charts for varying batch sizes and the specific accuracy tradeoffs for extreme quantizations (like 2-bit) are primarily community-driven rather than officially documented.

Versioning Drift

5.0 / 10

The model uses clear naming conventions (e.g., Qwen3-Coder-480B-A35B-Instruct), but a formal, detailed changelog for weight updates or silent 'alignment' patches is not readily accessible. While major releases are well-publicized, the tracking of minor iterative drifts in model behavior over time lacks a centralized, transparent repository.

GPU Requirements

The original page includes an interactive VRAM calculator: choose a weight quantization and a context size (1K to 256K tokens) to see the estimated VRAM required and recommended GPUs.
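In place of the interactive tool, here is a rule-of-thumb sketch of the memory needed for the weights alone at common quantization levels. It deliberately ignores KV cache, activations, and framework overhead, all of which add substantially on top, especially at long context:

```python
# Rough VRAM needed just to hold the 480B weights at a given precision.
# A sketch only: real requirements also include KV cache, activations,
# and inference-engine overhead, which grow with context and batch size.
TOTAL_PARAMS = 480e9

def weights_gib(bits_per_weight: float) -> float:
    """GiB required for the weights at the given bits per weight."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("GGUF 2-bit", 2)]:
    print(f"{name:>10}: ~{weights_gib(bits):,.0f} GiB")
# FP16: ~894 GiB   INT8: ~447 GiB   INT4: ~224 GiB   2-bit: ~112 GiB
```

The ~112 GiB figure for 2-bit weights is consistent with the community-reported use of 2-bit GGUF quantizations for long-context deployments noted in the Hardware Footprint section above.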
