Parameters
32B
Context Length
131,072
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
29 Apr 2025
Training Data Cutoff
Aug 2024
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
5120
Number of Layers
64
Attention Heads
64
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Positional Embedding
RoPE
Qwen3-32B is a dense large language model developed by Alibaba and is the premier dense variant within the Qwen3 series. Designed as a unified framework for both general-purpose interaction and complex problem-solving, the model introduces a hybrid reasoning mechanism. This architecture allows for a seamless transition between a 'thinking mode', characterized by generative chain-of-thought processing for mathematical and logical tasks, and a 'non-thinking mode' optimized for high-throughput, responsive dialogue. This dual-mode capability is implemented via a flexible switching system, enabling users to adapt the model's computational depth to the specific requirements of a given query.
Technically, the model is constructed on a 64-layer transformer architecture with 32.8 billion parameters. It utilizes Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads to achieve an optimal balance between inference speed and representational capacity. The integration of QK-Norm and the removal of QKV-bias in this iteration contribute to enhanced training stability. For sequence modeling, the architecture employs Rotary Positional Embeddings (RoPE) with a base frequency of 1,000,000, supporting a native context length of 32,768 tokens that can be extended to 131,072 tokens using YaRN scaling. The model's internal activation uses the SwiGLU function, and normalization is handled through a pre-RMSNorm configuration.
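The practical payoff of GQA is a smaller KV cache: only the 8 key-value heads are cached per layer, not all 64 query heads. A back-of-the-envelope sketch using the dimensions above (the 128-dim head size is taken from Qwen3's published config, where it is set independently of hidden_size; treat it and the fp16 assumption as ours):

```python
# Back-of-the-envelope KV-cache size for Qwen3-32B under GQA.
# Assumptions: head_dim = 128 (per Qwen3's published config) and
# fp16 storage (2 bytes per value).

n_layers, n_heads, n_kv_heads, head_dim = 64, 64, 8, 128
bytes_per_value = 2  # fp16

# Per generated token: one K and one V tensor cached in every layer.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
mha_bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_per_value

print(kv_bytes_per_token)                         # 262144 bytes (256 KiB) per token
print(mha_bytes_per_token // kv_bytes_per_token)  # 8x smaller than full MHA
```

At the full 131,072-token context this is roughly 32 GiB of cache per sequence, versus 256 GiB under standard multi-head attention with the same head count, which is what makes the long-context configuration serviceable.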
Qwen3-32B is engineered for diverse operational environments, supporting over 100 languages and dialects. Its training pipeline follows a four-stage process including long chain-of-thought cold starts and reasoning-based reinforcement learning, which prepares the model for sophisticated agentic tasks and tool integration. The model is particularly effective in scenarios requiring multi-turn dialogue, complex instruction following, and autonomous tool use, providing a versatile foundation for developers building integrated AI systems across various global contexts.
The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
Rank
#75
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Coding | Aider Coding | 0.40 | 7 |
| Reasoning | LiveBench Reasoning | 0.48 | 26 |
| Data Analysis | LiveBench Data Analysis | 0.68 | 27 |
| Web Development | WebDev Arena | 1347 | 29 |
| Mathematics | LiveBench Mathematics | 0.67 | 31 |
| Coding | LiveBench Coding | 0.66 | 36 |
| Agentic Coding | LiveBench Agentic | 0.03 | 41 |