
DeepSeek-V3.2

Total Parameters: 671B

Active Parameters: 37B

Context Length: 128K

Modality: Text

Architecture: Mixture of Experts (MoE)

License: MIT

Release Date: 10 Jan 2026

Knowledge Cutoff: -

Technical Specifications

Active Parameters per Token: 37.0B

Number of Experts: -

Active Experts: -

Attention Structure: Multi-head Latent Attention (MLA)

Hidden Dimension Size: -

Number of Layers: -

Attention Heads: -

Key-Value Heads: -

Activation Function: -

Normalization: -

Position Embedding: Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
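
As a rough rule of thumb, weight memory is the total parameter count times the bytes per parameter (all 671B parameters must be resident for inference even though only 37B are active per token), and KV-cache memory grows with context length. The sketch below illustrates that arithmetic only; the layer count, per-token KV-cache size, and overhead factor are assumed placeholders rather than published figures, so treat the outputs as ballpark estimates.

```python
def estimate_vram_gb(
    total_params: float = 671e9,   # all parameters must be loaded, even for an MoE model
    bits_per_param: float = 4,     # e.g. 16 (FP16), 8 (FP8/INT8), 4 (4-bit quantization)
    context_tokens: int = 128_000,
    num_layers: int = 61,                      # assumed placeholder, not an official figure
    kv_bytes_per_token_per_layer: int = 1024,  # assumed; MLA compresses the KV cache
    overhead: float = 1.10,        # ~10% for activations, buffers, fragmentation
) -> float:
    """Very rough single-request VRAM estimate in GB."""
    weight_bytes = total_params * bits_per_param / 8
    kv_cache_bytes = context_tokens * num_layers * kv_bytes_per_token_per_layer
    return (weight_bytes + kv_cache_bytes) * overhead / 1e9

# Example: 4-bit weights at the full 128K context (illustrative numbers only)
print(f"{estimate_vram_gb(bits_per_param=4):.0f} GB")
# Example: FP8 weights at a short 4K context
print(f"{estimate_vram_gb(bits_per_param=8, context_tokens=4096):.0f} GB")
```

For example, 4-bit weights alone occupy roughly 671e9 × 0.5 bytes ≈ 335 GB, which is why multi-GPU or multi-node setups are typically required regardless of context length.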

DeepSeek-V3.2

DeepSeek-V3.2 is a powerful open-source Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. Its architecture combines Multi-head Latent Attention (MLA) with DeepSeekMoE for efficient inference, and it achieves strong results across benchmarks: 90.2% on MMLU-Pro, 84.5% on GPQA Diamond, 91.6% on MATH-500, 78.1% on Codeforces, and 92.3% on HumanEval. The model supports a 128K context window with strong multilingual capabilities, offers superior coding ability and advanced mathematical reasoning, and is competitive with leading closed-source models. It was trained on 14.8 trillion diverse, high-quality tokens and is MIT licensed for both research and commercial use, making it well suited to complex reasoning, code generation, mathematical problem solving, and general-purpose language understanding.
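
To make the 671B-total / 37B-active distinction concrete, here is a minimal, generic sketch of top-k expert routing in an MoE feed-forward layer, written in PyTorch. It is not DeepSeek's implementation: the dimensions, expert count, and top-k value are illustrative, and the real model adds shared experts, load balancing, and expert parallelism on top of this idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k routed MoE FFN: each token uses only k of n_experts experts,
    so the parameters touched per token are a small fraction of the total."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts) affinity scores
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(topk_scores, dim=-1)  # gating weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(8, 512)
print(moe(tokens).shape)   # torch.Size([8, 512]); only 2 of 16 experts ran per token
```

Because each token is processed by only k of the experts, the parameters exercised per token stay at a small fraction of the total, which is the 671B-versus-37B relationship described above.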

About DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising 671B parameters, of which 37B are activated per token. Its architecture incorporates Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient inference and training. Innovations include an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective; the model was pre-trained on 14.8T tokens.
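
The auxiliary-loss-free load balancing can be illustrated with a small sketch: each expert carries a bias that is added to its affinity score only when selecting the top-k experts (the gating weights still use the raw scores), and after each batch the bias is nudged down for overloaded experts and up for underloaded ones. The code below is a simplified approximation of that mechanism under assumed values (the expert count, top-k, and update step gamma are illustrative), not the reference implementation.

```python
import torch

def biased_topk_routing(scores, expert_bias, k=2):
    """Select top-k experts using bias-adjusted scores, but gate with the raw scores."""
    _, topk_idx = (scores + expert_bias).topk(k, dim=-1)    # selection uses s_i + b_i
    gates = torch.softmax(scores.gather(-1, topk_idx), -1)  # weights use original s_i
    return topk_idx, gates

def update_bias(expert_bias, topk_idx, n_experts, gamma=1e-3):
    """Nudge biases toward a uniform load: overloaded experts get a lower bias."""
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    expert_bias -= gamma * torch.sign(load - load.mean())   # gamma is an illustrative step size
    return expert_bias

n_experts, k = 16, 2
bias = torch.zeros(n_experts)
for _ in range(100):                          # simulate routing over successive batches
    scores = torch.randn(256, n_experts)      # stand-in for token-expert affinities
    topk_idx, gates = biased_topk_routing(scores, bias, k)
    bias = update_bias(bias, topk_idx, n_experts)
print(bias)                                   # biases drift to counteract uneven load
```

The appeal of this approach is that load balance is maintained without adding an auxiliary loss term that could interfere with the language-modeling gradient.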



Evaluation Benchmarks

Rank: #38

Benchmark                            Score   Rank
-                                    0.74    7
LiveBench Agentic (Agentic Coding)   0.47    14
-                                    0.76    15
-                                    0.67    34
-                                    0.46    37
-                                    0.64    40
GPQA (Graduate-Level QA)             0.8     50

Rankings

Overall Rank: #38
Coding Rank: #12
