| Spec | Value |
|---|---|
| Total Parameters | 480B |
| Active Parameters | 35.0B |
| Context Length | 262,144 tokens |
| Modality | Text |
| Architecture | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Release Date | 22 Jul 2025 |
| Knowledge Cutoff | - |
| Number of Experts | 160 |
| Active Experts | 8 |
| Attention Structure | Grouped Query Attention (GQA) |
| Hidden Dimension | - |
| Layers | 62 |
| Attention Heads (Query) | 96 |
| Key-Value Heads | 8 |
| Activation Function | - |
| Normalization | - |
| Position Embedding | Rotary Position Embedding (RoPE) |
VRAM Requirements by Quantization Method and Context Size
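Exact figures depend on the runtime, but a back-of-envelope estimate follows from the spec table above: weight memory scales with the quantization width, and the KV cache scales with context length and the GQA geometry. The sketch below illustrates the arithmetic; the head dimension is an assumption, since the hidden size is not listed, and real deployments also need activation memory and framework overhead.

```python
# Rough VRAM estimate: quantized weights + full-context KV cache.

TOTAL_PARAMS = 480e9   # all expert weights must be resident, even though
                       # only ~35B are active per token
LAYERS = 62
KV_HEADS = 8           # GQA: K/V stored for 8 heads, not all 96 query heads
HEAD_DIM = 128         # assumed head dimension (not published in the table)
CONTEXT = 262_144

def weights_gib(bits: float) -> float:
    return TOTAL_PARAMS * bits / 8 / 2**30

def kv_cache_gib(tokens: int, bits: float = 16.0) -> float:
    # One K and one V tensor per layer, each [tokens, KV_HEADS, HEAD_DIM]
    elems = 2 * LAYERS * tokens * KV_HEADS * HEAD_DIM
    return elems * bits / 8 / 2**30

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: weights ~{weights_gib(bits):.0f} GiB "
          f"+ KV cache at {CONTEXT} tokens ~{kv_cache_gib(CONTEXT):.0f} GiB")
```

At FP16 this works out to roughly 894 GiB of weights alone, which is why the quantized variants matter for local deployment.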
Qwen3 Coder 480B A35B is Alibaba's agentic AI coding model, built for high-performance software engineering and autonomous coding workflows. It is engineered to excel at code generation, multi-turn programming workflows, and debugging across entire codebases, and it is designed for autonomous software engineering: comprehensive repository analysis, cross-file reasoning, and automated pull-request workflows. The model also integrates with a range of developer tools and platforms, including Alibaba's own open-sourced command-line interface, Qwen Code, which is adapted to the model with customized prompts and function-calling protocols.
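In practice the model is usually consumed through an OpenAI-compatible chat API. A minimal sketch follows; the base URL and model identifier are placeholders, so substitute whatever your provider or local server (for example, vLLM serving the open weights) actually exposes.

```python
from openai import OpenAI

# Hypothetical local endpoint; many local servers ignore the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",   # assumed model identifier
    messages=[{
        "role": "user",
        "content": "Write a Python function that parses an ISO 8601 date.",
    }],
)
print(resp.choices[0].message.content)
```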
Architecturally, Qwen3 Coder 480B A35B is a Mixture-of-Experts (MoE) model: 480 billion total parameters, of which roughly 35 billion are active for any given token. This sparse activation strategy, routing each token to 8 of 160 experts, keeps computation efficient while maintaining high performance. The model stacks 62 layers and uses Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads. The core is a decoder-only transformer, with Rotary Position Embedding (RoPE) supporting the extended 262K-token context length.
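To make the sparse-activation arithmetic concrete, here is a minimal top-k routing sketch matching the published geometry (8 of 160 experts per token). The gating details are illustrative, not Qwen's exact implementation; the toy dimensions in the usage example are arbitrary.

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS, TOP_K = 160, 8          # geometry from the spec table

def route(hidden: torch.Tensor, gate_w: torch.Tensor):
    """Pick TOP_K of NUM_EXPERTS experts per token; only those FFNs run."""
    logits = hidden @ gate_w                        # [tokens, NUM_EXPERTS]
    weights, experts = torch.topk(logits, TOP_K, dim=-1)
    return F.softmax(weights, dim=-1), experts      # mixing weights, expert ids

# Toy usage: 4 tokens, model dim 1024 (illustrative values only)
tokens = torch.randn(4, 1024)
gate = torch.randn(1024, NUM_EXPERTS)
w, idx = route(tokens, gate)
print(idx.shape)   # torch.Size([4, 8]) -> 8 expert ids per token
```

Because only the selected expert FFNs execute, per-token compute tracks the ~35B active parameters rather than the full 480B.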
The model is geared toward real-world software engineering tasks, with demonstrated capabilities in advanced agentic coding, browser automation, and tool use across a wide range of programming and markup languages, including Python, JavaScript, Java, C++, Go, and Rust. Training covered 7.5 trillion tokens at a 70% code ratio, balancing coding, general, and mathematical abilities. Post-training adds long-horizon reinforcement learning (Agent RL) so the model can solve complex problems through multi-step interactions with external tools.
The Alibaba Qwen 3 model family spans dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B (the Coder line described here extends this to 480B). Key innovations include a hybrid reasoning system offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
Rankings apply to local LLMs.

Overall Rank: #13
| Benchmark | Score | Rank |
|---|---|---|
| Coding (LiveBench Coding) | 0.73 | 🥈 2 |
| Agentic Coding (LiveBench Agentic) | 0.25 | 🥈 2 |
| Data Analysis (LiveBench Data Analysis) | 0.65 | 9 |
| Mathematics (LiveBench Mathematics) | 0.67 | 12 |
| Reasoning (LiveBench Reasoning) | 0.55 | 13 |
| Professional Knowledge (MMLU Pro) | 0.50 | 22 |