Qwen3-32B: Specifications and GPU VRAM Requirements

Qwen3-32B

闭源

开放权重

参数

32B

上下文长度

131.072K

模态

Text

架构

Dense

许可证

Apache 2.0

发布日期

29 Apr 2025

知识截止

Aug 2024

技术规格

注意力结构

Grouped-Query Attention

隐藏维度大小

层数

注意力头

键值头

激活函数

SwigLU

归一化

RMS Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Qwen3-32B

Qwen3-32B is a dense large language model developed by Alibaba, part of the comprehensive Qwen3 series. This model is specifically engineered to address a broad range of natural language processing tasks, operating as a causal language model optimized for text generation. A core innovation in its design is the dual-mode reasoning approach, which enables dynamic switching between a "thinking mode" and a "non-thinking mode." The thinking mode is engaged for complex computational tasks, including logical reasoning, mathematical problem-solving, and code generation, allowing for a structured, step-by-step approach to problem-solving. Conversely, the non-thinking mode is utilized for efficient, general-purpose dialogue, prioritizing responsiveness in routine interactions. This adaptable architecture allows Qwen3-32B to optimize its computational strategy based on the input's complexity, balancing accuracy for demanding tasks with efficiency for everyday use.

Architecturally, Qwen3-32B is built upon a dense transformer framework, incorporating 32.8 billion parameters and 64 layers. It employs Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads, a configuration designed to enhance inference efficiency while sustaining high performance. The model incorporates Rotary Positional Embeddings (RoPE) to effectively manage sequence positions and utilizes SwiGLU as its activation function. Normalization within the model is performed using RMSNorm, specifically with a pre-normalization scheme, which contributes to stable training and performance.

Qwen3-32B supports a broad spectrum of languages, encompassing over 100 languages and dialects, extending its applicability across diverse global communication contexts. The model is suitable for applications that require robust reasoning capabilities, adherence to instructions, and agentic functions, demonstrating proficiency in tool integration for agent-based tasks. Qwen3-32B natively supports a context length of 32,768 tokens, which can be extended to 131,072 tokens through the application of YaRN (Yet another RoPE N) scaling techniques, facilitating the processing of long-form content.

关于 Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

其他 Qwen 3 模型

评估基准

排名适用于本地LLM。

排名

基准	分数	排名
Reasoning LiveBench Reasoning	0.83	🥉 3
Data Analysis LiveBench Data Analysis	0.68	🥉 3
Mathematics LiveBench Mathematics	0.80	⭐ 4
StackUnseen ProLLM Stack Unseen	0.46	5
Coding LiveBench Coding	0.64	7
Graduate-Level QA GPQA	0.65	8
Agentic Coding LiveBench Agentic	0.10	10

排名

编程排名

#17

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

64k

128k

所需显存:

资源

官方文档发布说明阅读论文下载权重

Qwen3-32B

技术规格

系统要求

Qwen3-32B

关于 Qwen 3

其他 Qwen 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源