Qwen3 235B A22B Thinking: Specifications and GPU VRAM Requirements

Qwen3 235B A22B Thinking

Open Weights

Total Parameters

235B

Context Length

262,144 tokens

Modality

Reasoning

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

25 Jul 2025

Knowledge Cutoff

Technical Specifications

Active Parameters per Token

22.0B

Number of Experts

128

Active Experts

8

Attention Structure

Grouped-Query Attention (GQA)

Hidden Dimension Size

Number of Layers

94

Attention Heads

64

Key-Value Heads

4

Activation Function

Normalization

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
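
As a rough guide to how quantization and context size drive memory use, the sketch below adds the weight footprint to the key/value cache. It uses the figures given on this page (235B total parameters, 94 layers, 4 key/value heads). The per-head dimension of 128, the nominal bytes-per-weight values, and a 16-bit KV cache are assumptions, and the function names are illustrative. Note that every expert must be resident in memory even though only about 22B parameters are active per token, so the weight term uses the total parameter count.

```python
# Rough VRAM estimate: model weights plus KV cache.
# Ignores activations, framework overhead, and quantization metadata,
# so real-world usage will be higher than these numbers.

TOTAL_PARAMS = 235e9   # total parameters (all experts must be loaded)
NUM_LAYERS = 94        # transformer layers
NUM_KV_HEADS = 4       # key/value heads under GQA
HEAD_DIM = 128         # ASSUMPTION: per-head dimension, not stated on this page

# ASSUMPTION: nominal bytes per weight for common precisions.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

GIB = 1024 ** 3


def weight_bytes(quant: str) -> float:
    """Bytes needed to hold all model weights at the given precision."""
    return TOTAL_PARAMS * BYTES_PER_WEIGHT[quant]


def kv_cache_bytes(context_tokens: int, bytes_per_value: float = 2.0) -> float:
    """Bytes for the KV cache of one sequence (factor 2 = keys and values)."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_tokens * bytes_per_value


def estimate_vram_gib(quant: str, context_tokens: int) -> float:
    """Weights plus KV cache, in GiB, for a single sequence."""
    return (weight_bytes(quant) + kv_cache_bytes(context_tokens)) / GIB


if __name__ == "__main__":
    for quant in ("fp16", "int8", "int4"):
        for ctx in (1_024, 131_072, 262_144):
            print(f"{quant:>5} @ {ctx:>7} tokens: {estimate_vram_gib(quant, ctx):7.1f} GiB")
```

Under these assumptions the weights dominate: roughly 438 GiB at 16-bit precision, while the KV cache for a single full 262,144-token sequence adds 47 GiB. Quantizing the weights therefore has a much larger effect on the total than shortening the context.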

Qwen3 235B A22B Thinking

Qwen3-235B-A22B-Thinking is a reasoning-focused variant in Alibaba's Qwen3 series of large language models. It is a causal language model designed for logical deduction, strategic planning, and systematic problem-solving. The "Thinking" in its name reflects fine-tuning on datasets that emphasize and reward step-by-step analysis. Unlike the general-purpose Qwen3 models, which often combine thinking and non-thinking modes, this variant operates only in the reasoning mode.

Qwen3-235B-A22B-Thinking uses a Mixture-of-Experts (MoE) design, a core element of the Qwen3 series. The model has 235 billion total parameters, but each token activates only about 22 billion of them: 8 of the 128 experts are activated per token. This selective activation reduces per-token compute and latency relative to a dense model of the same total size, in which every parameter participates in every forward pass. The model uses Grouped-Query Attention (GQA) with 64 query heads and 4 key/value heads, which shrinks the key/value cache and speeds up inference. It has 94 layers and uses rotary position embeddings (RoPE).
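
To make the expert selection and head grouping concrete, here is a minimal sketch of generic top-k routing (softmax over router scores, keep the 8 highest, renormalize) and of how 64 query heads could share 4 key/value heads. This illustrates the general technique, not the model's confirmed implementation; the function names and the renormalization step are assumptions.

```python
import math
import random

NUM_EXPERTS = 128      # experts in the pool
TOP_K = 8              # experts activated per token
NUM_QUERY_HEADS = 64   # GQA query heads
NUM_KV_HEADS = 4       # GQA key/value heads


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs, highest score first, with the
    weights renormalized to sum to 1 (ASSUMPTION: a common convention).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def kv_head_for_query_head(query_head):
    """Map a query head to the key/value head it shares under GQA."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 16 query heads per KV head
    return query_head // group_size


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical router scores for one token
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert, weight in route_token(logits):
        print(f"expert {expert:3d} weight {weight:.3f}")
    print("query head 37 uses KV head", kv_head_for_query_head(37))
```

Only the selected experts run for a given token, which is why per-token compute tracks the roughly 22B active parameters rather than the full 235B.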

Qwen3-235B-A22B-Thinking is optimized for tasks that demand deep analysis, such as logical reasoning, mathematics, science, and coding. It supports a native context length of 262,144 tokens, a substantial increase over previous iterations, which makes it suitable for processing long documents and other long-context applications. Reasoning depth can be adjusted, and for complex problems a maximum output length of 81,920 tokens is recommended so the model has room for detailed responses. The model also handles multilingual instruction following and tool use, which suits agentic workflows that require multi-step problem-solving.
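
The context and output figures above imply a simple token budget. The sketch below assumes that the prompt and the generated tokens (reasoning included) share one context window, as is usual for causal language models; the function names are illustrative.

```python
NATIVE_CONTEXT = 262_144          # native context length in tokens
RECOMMENDED_MAX_OUTPUT = 81_920   # recommended output cap for complex problems


def max_prompt_tokens(max_output_tokens=RECOMMENDED_MAX_OUTPUT,
                      context_window=NATIVE_CONTEXT):
    """Largest prompt that still leaves room for the requested output."""
    if not 0 < max_output_tokens < context_window:
        raise ValueError("max_output_tokens must be between 1 and context_window - 1")
    return context_window - max_output_tokens


def fits(prompt_tokens, max_output_tokens=RECOMMENDED_MAX_OUTPUT,
         context_window=NATIVE_CONTEXT):
    """True if prompt plus reserved output stays within the context window."""
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    print("prompt budget:", max_prompt_tokens())
    print("200k-token prompt fits:", fits(200_000))
```

With the recommended output cap reserved, 180,224 tokens remain for the prompt, so a 200,000-token document would need a smaller output reservation or truncation.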

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.


Evaluation Benchmarks

Ranking is for Local LLMs.

Category	Benchmark	Score	Rank
Data Analysis	LiveBench Data Analysis	0.68	🥈 2
Mathematics	LiveBench Mathematics	0.80	🥉 3
Coding	LiveBench Coding	0.66	5
Reasoning	LiveBench Reasoning	0.78	5
Agentic Coding	LiveBench Agentic	0.13	7

Rankings

Overall Rank

Coding Rank

#15

