Qwen3-32B

Parameters: 32B
Context length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release date: 29 Apr 2025
Training data cutoff: Aug 2024

Technical specifications

Attention structure: Grouped-Query Attention (GQA)
Hidden dimension: 5120
Layers: 64
Attention heads: 64
Key-value heads: 8
Activation function: SwiGLU
Normalization: RMSNorm
Positional embeddings: RoPE

Qwen3-32B

Qwen3-32B is a dense large language model developed by Alibaba and is the premier dense variant within the Qwen3 series. Designed as a unified framework for both general-purpose interaction and complex problem-solving, the model introduces a hybrid reasoning mechanism. This architecture allows for a seamless transition between a 'thinking mode', characterized by generative chain-of-thought processing for mathematical and logical tasks, and a 'non-thinking mode' optimized for high-throughput, responsive dialogue. This dual-mode capability is implemented via a flexible switching system, enabling users to adapt the model's computational depth to the specific requirements of a given query.
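As a minimal sketch of the per-message switching described above: the Qwen3 documentation exposes a session-level `enable_thinking` flag on `tokenizer.apply_chat_template(...)` and lightweight `/think` and `/no_think` tags that toggle the mode for an individual user turn. The helper below illustrates only the per-message tag form; it is a string-level sketch, not the model's internal mechanism.

```python
def with_mode(user_message: str, thinking: bool) -> str:
    """Append Qwen3's documented soft-switch tag to a user turn.

    `/think` requests the deliberate chain-of-thought mode;
    `/no_think` requests the fast, responsive mode. The session
    default is set separately via `enable_thinking` when applying
    the chat template.
    """
    tag = "/think" if thinking else "/no_think"
    return f"{user_message} {tag}"

# Route a proof to the deliberate mode, small talk to the fast path.
print(with_mode("Prove that sqrt(2) is irrational.", thinking=True))
print(with_mode("What's the weather like?", thinking=False))
```

In practice a dispatcher could set `thinking` from a heuristic (query length, presence of math), trading latency for reasoning depth per request.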

Technically, the model is constructed on a 64-layer transformer architecture with 32.8 billion parameters. It utilizes Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads to achieve an optimal balance between inference speed and representational capacity. The integration of QK-Norm and the removal of QKV-bias in this iteration contribute to enhanced training stability. For sequence modeling, the architecture employs Rotary Positional Embeddings (RoPE) with a base frequency of 1,000,000, supporting a native context length of 32,768 tokens that can be extended to 131,072 tokens using YaRN scaling. The model's internal activation uses the SwiGLU function, and normalization is handled through a pre-RMSNorm configuration.

Qwen3-32B is engineered for diverse operational environments, supporting over 100 languages and dialects. Its training pipeline follows a four-stage process including long chain-of-thought cold starts and reasoning-based reinforcement learning, which prepares the model for sophisticated agentic tasks and tool integration. The model is particularly effective in scenarios requiring multi-turn dialogue, complex instruction following, and autonomous tool use, providing a versatile foundation for developers building integrated AI systems across various global contexts.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.


Evaluation benchmarks

Overall rank: #75
Coding rank: #65

Category           Benchmark           Score   Rank
—                  —                   0.40    7
—                  —                   0.48    26
—                  —                   0.68    27
Web Development    WebDev Arena        1347    29
—                  —                   0.67    31
—                  —                   0.66    36
Agentic Coding     LiveBench Agentic   0.03    41
