
DeepSeek-V3.2 Thinking

Total Parameters

671B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

10 Jan 2026

Knowledge Cutoff

-

Technical Specifications

Activated Parameters per Token

37.0B

Number of Experts

-

Active Experts

-

Attention Structure

Multi-Head Latent Attention (MLA)

Hidden Dimension Size

-

Number of Layers

-

Attention Heads

-

Key-Value Heads

-

Activation Function

-

Normalization

-

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes

DeepSeek-V3.2 Thinking

DeepSeek-V3.2 Thinking is the reasoning-enhanced variant of DeepSeek-V3.2, optimized for complex problem-solving through chain-of-thought reasoning. Built on the same 671B-parameter MoE architecture with 37B activated parameters per token, the model is fine-tuned to produce detailed reasoning traces before generating final answers. It excels at multi-step logical reasoning, mathematical proofs, algorithmic problem-solving, and tasks that require explicit step-by-step thinking, and it achieves strong results on reasoning benchmarks: 94.8% on MATH-500 (with reasoning), 85.2% on Codeforces, and 73.4% on AIME. The thinking mode exposes the model's reasoning process, making it well suited to educational applications, research, debugging complex logic, and scenarios where interpretability is crucial. It supports a 128K context window with strong multilingual reasoning capabilities and is released under the MIT license.
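
The sketch below shows one way the thinking mode might be consumed programmatically: it calls an OpenAI-compatible chat endpoint and prints the reasoning trace separately from the final answer. The base URL, model identifier, and the reasoning_content field are illustrative assumptions, not confirmed API details.

```python
# Minimal sketch: querying a reasoning-tuned model through an OpenAI-compatible API.
# The base_url, model name, and reasoning_content field are assumptions for
# illustration; check the provider's documentation for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v3.2-thinking",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even."}
    ],
)

message = response.choices[0].message
# Reasoning endpoints often expose the chain of thought in a separate field;
# here it is assumed to be called `reasoning_content`.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("--- reasoning trace ---")
    print(reasoning)
print("--- final answer ---")
print(message.content)
```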

About DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising 671B total parameters, of which 37B are activated per token. Its architecture incorporates Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient inference and training. Innovations include an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective; the model was pre-trained on 14.8T tokens.
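
To make the "activated per token" idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. It is not the DeepSeekMoE design (which additionally uses shared experts and the auxiliary-loss-free balancing scheme mentioned above); it only shows that each token passes through a small subset of the experts, which is why only 37B of the 671B parameters are exercised per token.

```python
# Illustrative sketch of top-k expert routing in a Mixture-of-Experts layer.
# Not the DeepSeekMoE implementation; it only demonstrates the core idea that
# each token activates a few experts out of many.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its k highest-scoring experts
        scores = self.router(x)                        # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.k, dim=-1)   # (tokens, k)
        top_w = F.softmax(top_w, dim=-1)               # normalise the k gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e           # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 8 experts with 2 active per token, so most expert parameters stay idle
# for any given token -- the same principle behind 37B-of-671B activation.
layer = TopKMoELayer(d_model=64, d_ff=256, n_experts=8, k=2)
y = layer(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```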



Evaluation Benchmarks

Rank

#21

Benchmark                            Score   Rank
-                                    0.73    6
-                                    0.85    8
-                                    0.77    14
Agentic Coding (LiveBench Agentic)   0.40    18
-                                    0.70    31
Graduate-Level QA (GPQA)             0.82    42

Rankings

Overall Rank

#21

Coding Rank

#35

GPU Requirements

VRAM requirements vary with the chosen weight quantization method and context size (1K to 125K tokens); the interactive calculator estimates the required VRAM and lists recommended GPUs.
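
As a rough guide to the figures such a calculator produces, the sketch below estimates the VRAM needed just to hold the model weights at common quantization widths. Because MoE routing still requires every expert to be resident in memory, the estimate scales with the full 671B parameters rather than the 37B activated per token; KV cache, activations, and framework overhead are excluded, so the numbers are lower bounds.

```python
# Back-of-envelope VRAM estimate for holding the weights of a 671B-parameter
# MoE model at different quantization widths. KV cache, activations, and
# framework overhead are excluded, so real requirements are higher.
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}

def weight_vram_gib(total_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return total_params * bytes_per_param / (1024 ** 3)

TOTAL_PARAMS = 671e9  # all experts must be resident, not just the 37B active per token

for fmt, width in BYTES_PER_PARAM.items():
    print(f"{fmt:>10}: ~{weight_vram_gib(TOTAL_PARAMS, width):,.0f} GiB")

# Expected output (approximately):
#  fp16/bf16: ~1,250 GiB
#        fp8: ~625 GiB
#       int4: ~312 GiB
```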
