
DeepSeek-V3.1

Total Parameters

671B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT License

Release Date

21 Aug 2025

Knowledge Cutoff

-

Technical Specifications

Active Parameters per Token

37.0B

Number of Experts

257 (256 routed + 1 shared)

Active Experts

8 routed per token (plus the shared expert)

Attention Structure

Multi-Head Latent Attention (MLA)

Hidden Dimension Size

7168

Number of Layers

61

Attention Heads

-

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE (Rotary Position Embedding)
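
To make the routing numbers above concrete, the following PyTorch sketch shows a DeepSeek-style MoE feed-forward block: a router scores 256 routed experts, only the top 8 process each token, a shared expert sees every token, and each expert is a SwiGLU MLP. This is an illustrative sketch, not the reference implementation; the expert intermediate size (2048), the sigmoid router, and the toy sizes in the demo are assumptions rather than values from the table.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: a SwiGLU MLP (gated SiLU followed by a down-projection)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class TopKMoE(nn.Module):
    """MoE feed-forward block: top-k routed experts per token plus one shared expert."""
    def __init__(self, d_model=7168, d_ff=2048, n_routed=256, top_k=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(d_model, d_ff) for _ in range(n_routed))
        self.shared = SwiGLUExpert(d_model, d_ff)
        self.top_k = top_k

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores = torch.sigmoid(self.router(x))             # token-to-expert affinities
        gates, idx = scores.topk(self.top_k, dim=-1)       # keep the best top_k experts per token
        gates = gates / gates.sum(dim=-1, keepdim=True)    # normalize gate weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):          # process each expert's assigned tokens
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                out[rows] += gates[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out + self.shared(x)                        # the shared expert sees every token

# Toy demo with scaled-down sizes; the real model is far too large to instantiate here.
moe = TopKMoE(d_model=64, d_ff=32, n_routed=16, top_k=4)
print(moe(torch.randn(5, 64)).shape)                       # torch.Size([5, 64])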

System Requirements

VRAM requirements for different quantization methods and context sizes
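
The weight-memory part of that estimate is simple arithmetic: parameter count times bytes per parameter. The short Python sketch below applies it to the 671B total parameters listed above; it counts model weights only, so the KV cache at long context, activations, and runtime overhead come on top, and the quantization widths shown are illustrative assumptions.

# Rough estimate of GPU memory needed just to hold the model weights at a few
# common quantization widths, using the total-parameter count from the
# specifications above. KV cache, activations, and runtime overhead are not
# included, so real requirements are higher.
TOTAL_PARAMS = 671e9  # total parameters (see Technical Specifications)

for name, bits in [("FP16/BF16", 16), ("FP8", 8), ("INT4", 4)]:
    weight_gib = TOTAL_PARAMS * bits / 8 / 1024**3   # bytes -> GiB
    print(f"{name:>9}: ~{weight_gib:,.0f} GiB of weights")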

DeepSeek-V3.1

A hybrid model that supports both "thinking" and "non-thinking" modes for chat, reasoning, and coding. It is a Mixture-of-Experts (MoE) model with a 128K-token context window and an inference-efficient architecture.
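
Mode selection happens per request rather than per checkpoint. Below is a minimal usage sketch against an OpenAI-compatible endpoint; the base URL, the environment variable, and the mapping of the deepseek-chat and deepseek-reasoner model IDs to the non-thinking and thinking modes are assumptions to verify against the provider's documentation.

# Minimal usage sketch: calling the hosted model in non-thinking vs. thinking
# mode through an OpenAI-compatible client. Endpoint and model IDs are assumed.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed environment variable
    base_url="https://api.deepseek.com",
)

messages = [{"role": "user", "content": "Summarize Mixture-of-Experts routing in two sentences."}]

fast = client.chat.completions.create(model="deepseek-chat", messages=messages)      # non-thinking mode
deep = client.chat.completions.create(model="deepseek-reasoner", messages=messages)  # thinking mode

print(fast.choices[0].message.content)
print(deep.choices[0].message.content)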

About DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. Its architecture combines Multi-Head Latent Attention (MLA) and DeepSeekMoE for efficient inference and training. Key innovations include an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective; the model was pre-trained on 14.8T tokens.
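
As a simplified illustration of the auxiliary-loss-free load-balancing idea (not the exact training procedure), the sketch below adds a per-expert bias to the routing scores only when selecting the top-k experts and nudges that bias after each batch so over-used experts become less likely to be picked. The NumPy implementation, the step size gamma, and the toy dimensions are assumptions.

import numpy as np

def route_with_bias(scores, bias, top_k):
    """Select top-k experts per token using biased scores; gate weights use the raw scores."""
    biased = scores + bias                                # bias influences selection only
    idx = np.argsort(-biased, axis=-1)[:, :top_k]          # indices of the chosen experts
    gates = np.take_along_axis(scores, idx, axis=-1)       # unbiased affinities as gate weights
    gates = gates / gates.sum(axis=-1, keepdims=True)
    return idx, gates

def update_bias(bias, idx, n_experts, gamma=1e-3):
    """Lower the bias of over-used experts, raise it for under-used ones."""
    load = np.bincount(idx.ravel(), minlength=n_experts)
    target = idx.size / n_experts                          # load under perfect balance
    return bias - gamma * np.sign(load - target)

# Toy demo: 32 tokens per step, 8 experts, top-2 routing.
rng = np.random.default_rng(0)
bias = np.zeros(8)
for _ in range(200):
    scores = rng.random((32, 8))                           # stand-in for router affinities
    idx, _ = route_with_bias(scores, bias, top_k=2)
    bias = update_bias(bias, idx, n_experts=8)
print("per-expert bias after 200 steps:", np.round(bias, 3))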



Evaluation Benchmarks


Benchmark                             Score      Rank
GPQA (Graduate-Level QA)              0.80       1 🥇
-                                     0.73       6
-                                     0.72       8
MMLU (General Knowledge)              0.68       10
MMLU Pro (Professional Knowledge)     0.84       13
-                                     0.82       15
-                                     0.48       18
WebDev Arena (Web Development)        1359.84    24
-                                     0.62       25
LiveBench Agentic (Agentic Coding)    0.32       29

Rankings

Overall Rank

#12

Coding Rank

#32
