
Qwen3-32B

Parameters: 32B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 29 Apr 2025
Knowledge Cutoff: Aug 2024

Technical Specifications

Attention Structure: Grouped-Query Attention (GQA)
Hidden Dimension Size: 5120
Number of Layers: 64
Attention Heads: 64 (query)
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE (Rotary Position Embedding)
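These figures match the published configuration for the `Qwen/Qwen3-32B` checkpoint on the Hugging Face Hub. A quick way to verify them locally is to load the config with `transformers` (a read-only sketch, assuming Hub access):

```python
from transformers import AutoConfig

# Pull the published config and print the fields listed above.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
print("layers:", cfg.num_hidden_layers)          # 64
print("query heads:", cfg.num_attention_heads)   # 64
print("kv heads:", cfg.num_key_value_heads)      # 8
print("hidden size:", cfg.hidden_size)           # 5120
```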

Qwen3-32B

Qwen3-32B is a large language model developed by Alibaba's Qwen team and the largest dense variant in the Qwen3 series. Designed as a unified framework for both general-purpose interaction and complex problem-solving, it introduces a hybrid reasoning mechanism: a 'thinking mode' that emits explicit chain-of-thought tokens for mathematical and logical tasks, and a 'non-thinking mode' optimized for high-throughput, responsive dialogue. Users can switch between the two modes per request, adapting the model's computational depth to the requirements of a given query.
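A minimal sketch of mode switching with Hugging Face `transformers`, following the usage documented on the model's Hugging Face card: the `enable_thinking` flag on `apply_chat_template` toggles between the two modes. The prompt text and generation settings here are illustrative, not prescriptive.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# enable_thinking=True lets the model emit a <think>...</think> reasoning block
# before the answer; enable_thinking=False skips it for low-latency chat.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
))
```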

Technically, the model is built on a 64-layer transformer with 32.8 billion parameters. It uses Grouped-Query Attention (GQA) with 64 query heads and 8 key-value heads to balance inference speed against representational capacity, and this iteration adds QK-Norm while removing the QKV bias, improving training stability. For sequence modeling, the architecture employs Rotary Position Embeddings (RoPE) with a base frequency of 1,000,000; the native context length of 32,768 tokens can be extended to 131,072 tokens with YaRN scaling. Feed-forward blocks use the SwiGLU activation, and normalization follows a pre-norm RMSNorm configuration.
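To extend the context window beyond the native 32,768 tokens, the Qwen3 documentation describes adding a YaRN `rope_scaling` entry to the model configuration. The sketch below applies that override at load time via `transformers`; the factor of 4.0 gives 4 x 32,768 = 131,072 tokens.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-32B"

# YaRN scaling: a factor of 4.0 stretches the native 32k window to 131,072 tokens.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```

Note that static YaRN applies the same scaling even to short inputs, so it is typically enabled only when long contexts are actually needed.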

Qwen3-32B is engineered for diverse operational environments, supporting over 100 languages and dialects. Its post-training pipeline follows a four-stage process, including a long chain-of-thought cold start and reasoning-based reinforcement learning, which prepares the model for sophisticated agentic tasks and tool integration. The model is particularly effective in multi-turn dialogue, complex instruction following, and autonomous tool use (a minimal tool-calling sketch follows below), providing a versatile foundation for developers building integrated AI systems across various global contexts.
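As an illustration of tool integration, the sketch below passes a JSON-schema tool definition through the `tools` argument of `apply_chat_template` in Hugging Face `transformers`. The `get_weather` function and its schema are hypothetical placeholders; the exact tool-call output format is governed by the model's chat template.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Hypothetical tool schema; any JSON-schema function definition works here.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Hangzhou?"}]

# The chat template renders the tool schema into the prompt; the model is
# then expected to emit a structured tool call that the caller parses.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)
```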

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system with 'thinking' and 'non-thinking' modes for adaptive processing, and support for long context windows.



Evaluation Benchmarks


| Category | Benchmark | Score | Rank |
| --- | --- | --- | --- |
|  |  | 0.40 | 7 |
|  |  | 0.48 | 26 |
|  |  | 0.68 | 27 |
| Web Development | WebDev Arena | 1347 | 29 |
|  |  | 0.67 | 31 |
|  |  | 0.66 | 36 |
| Agentic Coding | LiveBench Agentic | 0.03 | 41 |

Rankings

Overall Rank: #75
Coding Rank: #65

Model Transparency

Total Score: B (67/100)

GPU Requirements

[Interactive calculator: select a weight quantization and a context size (1k to 128k tokens) to estimate the VRAM required and see recommended GPUs.]
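As a rough offline substitute for the calculator, total VRAM can be approximated as the quantized weight footprint plus the KV cache, which for this GQA configuration (64 layers, 8 KV heads, head dimension 128) grows linearly with context length. The sketch below is a simplified estimate under stated assumptions (FP16 KV cache, no activation memory, framework overhead, or fragmentation), not a guarantee of what any particular runtime will allocate.

```python
# Rough VRAM estimate for Qwen3-32B: quantized weights + FP16 KV cache.
# Simplified: ignores activations, framework overhead, and fragmentation.

PARAMS = 32.8e9   # total parameters
LAYERS = 64       # transformer layers
KV_HEADS = 8      # key-value heads (GQA)
HEAD_DIM = 128    # per-head dimension from the published config

def kv_cache_bytes(context_tokens: int, kv_dtype_bytes: int = 2) -> int:
    # Two tensors (K and V) per layer, each [kv_heads, context, head_dim].
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * context_tokens * kv_dtype_bytes

def weight_bytes(bits_per_param: float) -> int:
    return int(PARAMS * bits_per_param / 8)

GIB = 1024 ** 3
for quant_bits in (16, 8, 4):               # fp16/bf16, int8, int4 weights
    for ctx in (1_024, 32_768, 131_072):    # calculator's context range
        total = weight_bytes(quant_bits) + kv_cache_bytes(ctx)
        print(f"{quant_bits:>2}-bit weights, {ctx:>7} ctx: ~{total / GIB:.1f} GiB")
```

Under these assumptions, the full 131,072-token context adds roughly 32 GiB of KV cache on top of the weights, which is why long-context serving often pairs weight quantization with a reduced maximum context.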
