
DeepSeek-V3.1

Total Parameters

671B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT License

Release Date

21 Aug 2025

Knowledge Cutoff

-

Technical Specifications

Activated Parameters per Token

37.0B

Number of Experts

257

Active Experts

8

Attention Structure

Multi-Head Latent Attention (MLA)

Hidden Dimension Size

7168

Number of Layers

61

Attention Heads

-

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE (Rotary Position Embedding)
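
The activation, normalization, and position-embedding entries above are standard transformer building blocks. The sketch below illustrates RMSNorm and a SwiGLU feed-forward layer in NumPy; the toy dimensions, epsilon, and initialization scale are illustrative assumptions, not values from this model card.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: rescale features by their reciprocal root-mean-square.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: gated projection followed by a down-projection.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Toy sizes stand in for the real hidden dimension of 7168.
d_model, d_ff = 64, 128
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model))              # 4 token positions
gain = np.ones(d_model)
w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
w_up = rng.standard_normal((d_model, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d_model)) * 0.02

out = swiglu_ffn(rms_norm(x, gain), w_gate, w_up, w_down)
print(out.shape)  # (4, 64)
```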

System Requirements

VRAM requirements for different quantization methods and context sizes

DeepSeek-V3.1

A hybrid model that supports both "thinking" and "non-thinking" modes for chat, reasoning, and coding. It is a Mixture-of-Experts (MoE) model that pairs a 128K-token context window with an inference-efficient architecture.
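
Since the two modes are selected at request time, a chat-completions call is the most direct way to use them. The snippet below is a minimal sketch against an OpenAI-compatible endpoint; the base URL and the model identifiers ("deepseek-chat" for non-thinking, "deepseek-reasoner" for thinking) are assumptions based on DeepSeek's public API conventions and should be verified against the current documentation.

```python
# Minimal sketch using the openai Python SDK against an OpenAI-compatible
# endpoint. Base URL and model names are assumptions; check DeepSeek's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

# Non-thinking mode: direct answers for chat and coding.
chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}],
)
print(chat.choices[0].message.content)

# Thinking mode: the model reasons internally before producing the final answer.
reasoned = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Plan the steps to refactor a large Python module."}],
)
print(reasoned.choices[0].message.content)
```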

About DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising 671B parameters with 37B activated per token. Its architecture incorporates Multi-head Latent Attention and DeepSeekMoE for efficient inference and training. Innovations include an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective; the model was pre-trained on 14.8T tokens.
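
The efficiency comes from sparse expert routing: each token is sent to a small top-k subset of routed experts plus a shared expert, so only about 37B of the 671B parameters participate in any single forward pass. The sketch below shows the top-k routing idea with toy sizes; the gating details and dimensions are deliberately simplified and do not reproduce DeepSeekMoE's exact gating or load-balancing scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_expert = 32, 64          # toy sizes; the real hidden size is 7168
n_routed, top_k = 16, 4             # stand-ins for 256 routed experts, top-8

# Each expert is reduced to two linear maps here for brevity.
experts_in = rng.standard_normal((n_routed, d_model, d_expert)) * 0.02
experts_out = rng.standard_normal((n_routed, d_expert, d_model)) * 0.02
shared_in = rng.standard_normal((d_model, d_expert)) * 0.02
shared_out = rng.standard_normal((d_expert, d_model)) * 0.02
router = rng.standard_normal((d_model, n_routed)) * 0.02

def moe_layer(x):
    # Router scores -> pick the top-k experts per token.
    scores = x @ router                                   # (tokens, n_routed)
    top_idx = np.argsort(scores, axis=-1)[:, -top_k:]
    top_scores = np.take_along_axis(scores, top_idx, axis=-1)
    weights = np.exp(top_scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # normalize over top-k

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top_idx[t, slot]
            # Only the selected experts' parameters are used for this token.
            out[t] += weights[t, slot] * (x[t] @ experts_in[e] @ experts_out[e])
    # The shared expert always runs, mirroring DeepSeekMoE's shared expert.
    out += x @ shared_in @ shared_out
    return out

x = rng.standard_normal((5, d_model))                     # 5 tokens
print(moe_layer(x).shape)                                 # (5, 32)
```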



Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#5

Benchmark | Score | Rank
- | 0.73 | 🥈 2
Graduate-Level QA (GPQA) | 0.80 | 🥈 2
- | 0.72 | 5
- | 0.62 | 6
- | 0.48 | 6
Web Development (WebDev Arena) | 1359.84 | 6
- | 0.82 | 7
Professional Knowledge (MMLU Pro) | 0.85 | 8
General Knowledge (MMLU) | 0.68 | 11

Rankings

Overall Rank

#5

Coding Rank

#13

GPU Requirements

VRAM requirements depend on the chosen weight quantization method and the context size (1K to 125K tokens).
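
As a rough rule of thumb, weight memory scales with parameter count times bytes per parameter. The sketch below applies that formula to the full 671B weights for a few common quantization widths; it deliberately ignores KV cache, activations, and runtime overhead, so actual requirements will be higher.

```python
# Back-of-the-envelope weight-memory estimate: params * bytes_per_param.
# Ignores KV cache, activations, and framework overhead, so treat these
# numbers as lower bounds rather than exact requirements.
TOTAL_PARAMS = 671e9

QUANT_BITS = {"FP16/BF16": 16, "INT8/FP8": 8, "4-bit (e.g. Q4)": 4}

for name, bits in QUANT_BITS.items():
    gib = TOTAL_PARAMS * bits / 8 / 2**30
    print(f"{name:>16}: ~{gib:,.0f} GiB for weights alone")
```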
