
Qwen3-32B

Parameters

32B

Context Length

131,072

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

29 Apr 2025

Knowledge Cutoff

Aug 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

5120

Number of Layers

64

Attention Heads

64

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

Qwen3-32B

Qwen3-32B is a large language model developed by Alibaba and is the premier dense variant within the Qwen3 series. Designed as a unified framework for both general-purpose interaction and complex problem-solving, the model introduces a hybrid reasoning mechanism. This architecture allows a seamless transition between a 'thinking mode', characterized by generative chain-of-thought processing for mathematical and logical tasks, and a 'non-thinking mode' optimized for high-throughput, responsive dialogue. This dual-mode capability is implemented via a flexible switching system, enabling users to adapt the model's computational depth to the specific requirements of a given query.
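The switching system described above is exposed in two documented ways: a hard switch (the `enable_thinking` flag in the Hugging Face chat template) and per-turn soft switches (`/think` and `/no_think` appended to a user message). A minimal sketch of the soft switch, with the helper name being illustrative rather than part of any official API:

```python
def apply_mode_switch(user_message: str, thinking: bool) -> str:
    """Append Qwen3's documented soft-switch token to a user turn.

    In multi-turn chat the most recent switch wins; with /no_think the
    model emits an empty think block instead of a reasoning trace.
    """
    return f"{user_message} {'/think' if thinking else '/no_think'}"

# Hard switch (Hugging Face chat-template integration):
# tokenizer.apply_chat_template(messages, add_generation_prompt=True,
#                               enable_thinking=False)

print(apply_mode_switch("Prove that sqrt(2) is irrational.", thinking=True))
```

The soft switch is useful when the conversation mixes quick lookups with heavy reasoning turns, since it avoids re-templating the whole chat.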

Technically, the model is constructed on a 64-layer transformer architecture with 32.8 billion parameters. It utilizes Grouped Query Attention (GQA) with 64 query heads and 8 key-value heads to achieve an optimal balance between inference speed and representational capacity. The integration of QK-Norm and the removal of QKV-bias in this iteration contribute to enhanced training stability. For sequence modeling, the architecture employs Rotary Positional Embeddings (RoPE) with a base frequency of 1,000,000, supporting a native context length of 32,768 tokens that can be extended to 131,072 tokens using YaRN scaling. The model's internal activation uses the SwiGLU function, and normalization is handled through a pre-RMSNorm configuration.
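The YaRN extension mentioned above is configured as a scaling factor over the native window. A sketch of the corresponding `rope_scaling` entry, assuming the Hugging Face config convention shown in the Qwen3 model card:

```python
# YaRN entry for extending Qwen3-32B's context from its native 32,768
# tokens to 131,072 (Hugging Face `rope_scaling` config convention).
NATIVE_CTX = 32_768
TARGET_CTX = 131_072

rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CTX / NATIVE_CTX,  # 4.0
    "original_max_position_embeddings": NATIVE_CTX,
}
print(rope_scaling)
```

Because static YaRN applies the same factor at all lengths, the Qwen documentation advises enabling it only when prompts actually exceed the native window.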

Qwen3-32B is engineered for diverse operational environments, supporting over 100 languages and dialects. Its training pipeline follows a four-stage process including long chain-of-thought cold starts and reasoning-based reinforcement learning, which prepares the model for sophisticated agentic tasks and tool integration. The model is particularly effective in scenarios requiring multi-turn dialogue, complex instruction following, and autonomous tool use, providing a versatile foundation for developers building integrated AI systems across various global contexts.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks

Rank: #75

Category          Benchmark           Score   Rank
-                 -                   0.40    7
-                 -                   0.48    26
-                 -                   0.68    27
Web Development   WebDev Arena        1347    29
-                 -                   0.67    31
-                 -                   0.66    36
Agentic Coding    LiveBench Agentic   0.03    41

Rankings

Overall Rank

#75

Coding Rank

#65

Model Transparency

Qwen3-32B Transparency Report

Total Score: 67 / 100 (Grade: B)

Audit Note

Qwen3-32B exhibits strong transparency in its architectural documentation and licensing, providing a clear Apache 2.0 framework and detailed transformer specifications. However, it remains opaque regarding training compute resources and the specific composition of its 36-trillion-token dataset. While the model's identity and hardware requirements are well-defined, the lack of reproducible evaluation artifacts and a formal versioning changelog represent significant gaps for independent auditors.

Upstream

22.0 / 30

Architectural Provenance

8.0 / 10

The model architecture is extensively documented in the Qwen3 Technical Report (arXiv:2505.09388). It specifies a 64-layer transformer with 32.8 billion parameters, utilizing Grouped Query Attention (GQA) with 64 query heads and 8 KV heads. Key technical details such as QK-Norm, SwiGLU activation, and pre-RMSNorm are explicitly disclosed. The report also details the hybrid reasoning mechanism (thinking vs. non-thinking modes) and the four-stage training pipeline (cold start, reasoning RL, fusion, and general RL).

Dataset Composition

5.0 / 10

Alibaba discloses that the model was trained on 36 trillion tokens across 119 languages, nearly doubling the scale of Qwen2.5. While the technical report mentions broad categories (web, PDF-like documents, books, STEM, and code) and the use of synthetic data from Qwen2.5-Math/Coder, it lacks a precise percentage breakdown of the dataset composition. The methodology for data extraction using Qwen2.5-VL is documented, but specific data sources remain proprietary.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available and well-documented. It uses byte-level Byte Pair Encoding (BBPE) with a vocabulary size of 151,669 tokens. Documentation provides specific efficiency metrics (e.g., 1 token ≈ 3-4 English characters vs. 1.5-1.8 Chinese characters) and confirms support for 119 languages. The tokenizer is integrated into the standard Hugging Face 'transformers' library, allowing for public verification of token counts and normalization.
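The documented ratios above support a rough token-budget estimate without loading the tokenizer. The helper below is illustrative (the function name and default are not part of any Qwen API); exact counts require the tokenizer itself via the `transformers` library:

```python
def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate from the documented ratios:
    ~3-4 English characters per token (use ~1.65 for Chinese text).

    Heuristic only; for exact counts, load the Qwen3 tokenizer
    through the Hugging Face `transformers` library.
    """
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

Such estimates are adequate for sizing prompts against the context window, but billing or truncation decisions should use real token counts.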

Model

23.5 / 40

Parameter Density

8.5 / 10

As a dense model, Qwen3-32B clearly states its total (32.8B) and non-embedding (31.2B) parameter counts. This distinguishes it from the MoE variants in the same family (e.g., Qwen3-30B-A3B), for which active parameters are also clearly disclosed. The architectural breakdown (64 layers, specific head counts) is provided in official tables, ensuring no ambiguity regarding active vs. total parameters.
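The two disclosed counts can be cross-checked: the gap between total (32.8B) and non-embedding (31.2B) parameters should roughly equal the vocabulary-sized embedding matrices. A quick sanity check, assuming the input embedding and LM head are untied for the 32B variant:

```python
# Consistency check between the reported total and non-embedding counts.
vocab_size = 151_669   # from the tokenizer documentation
hidden_size = 5_120

# Input embedding plus LM head (untied is an assumption here).
embedding_params = 2 * vocab_size * hidden_size
reported_gap = (32.8 - 31.2) * 1e9

print(f"embeddings ~ {embedding_params / 1e9:.2f}B, "
      f"reported gap ~ {reported_gap / 1e9:.1f}B")
```

The figures agree to within rounding, which suggests the published counts are internally consistent.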

Training Compute

2.0 / 10

There is almost no specific information regarding the training compute resources. While the technical report mentions the use of 'massive compute' and the efficiency gains of the training process, it does not disclose GPU/TPU hours, hardware cluster specifications, total energy consumption, or the carbon footprint associated with the 36-trillion-token training run.

Benchmark Reproducibility

4.0 / 10

The model provides scores for standard benchmarks (AIME, LiveCodeBench, ArenaHard) and mentions the use of the EvalScope framework for evaluation. However, the exact prompts, few-shot examples, and specific seeds used for the official results are not fully disclosed in a reproducible 'evaluation recipe' format. Third-party testing on platforms like LiveBench helps, but the lack of a public, one-click reproduction repository for all claimed scores limits transparency.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as part of the Qwen3 family and distinguishing between its 'thinking' and 'non-thinking' modes. It includes versioning in its metadata and does not attempt to mimic competitor identities. Documentation clearly outlines its capabilities and the specific 'thinking budget' mechanism that governs its reasoning behavior.

Downstream

21.5 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. This is explicitly stated in the technical report, the Hugging Face model card, and official blog posts. There are no conflicting proprietary terms for the 32B dense variant, and commercial use is clearly permitted without the restrictive 'Alibaba Cloud No-Charge License' found in some previous versions.

Hardware Footprint

7.5 / 10

VRAM requirements are well-documented by both the provider and the community. Official guidance specifies ~80GB for FP16, ~40GB for INT8, and ~20GB for INT4. The impact of context length (up to 128K with YaRN) on memory is noted, and quantization tools like AWQ and GGUF are supported with documented trade-offs. Detailed system requirements for consumer vs. datacenter GPUs are readily available.
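The figures above can be approximated from first principles. A weights-only sketch (the documented ~80GB FP16 guidance additionally covers activations and runtime overhead); the KV-cache formula assumes head_dim=128, which is not stated in this report:

```python
def weight_vram_gib(params_b: float, bits: int) -> float:
    """GiB needed for the weights alone (no KV cache or activations)."""
    return params_b * 1e9 * bits / 8 / 2**30

def kv_cache_gib(seq_len: int, n_layers: int = 64, n_kv_heads: int = 8,
                 head_dim: int = 128, bits: int = 16) -> float:
    """GiB for the K and V caches at a given sequence length.

    head_dim=128 is an assumption; 2x accounts for both K and V.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8 / 2**30

print(f"FP16 weights: {weight_vram_gib(32.8, 16):.1f} GiB")
print(f"INT4 weights: {weight_vram_gib(32.8, 4):.1f} GiB")
print(f"KV cache @ 32k context: {kv_cache_gib(32_768):.1f} GiB")
```

Note how GQA pays off here: caching 8 KV heads instead of 64 query heads cuts the cache to an eighth of what full multi-head attention would need.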

Versioning Drift

4.0 / 10

While the model uses a naming convention that includes release dates (e.g., 2504), there is no centralized, detailed changelog for weight updates or behavioral drift. Users have reported silent configuration changes in inference frameworks (like RAGFlow) that affect performance. The transition to 'VL' variants as the primary update path for the 32B model is documented only through community discussions and fragmented release notes.
