DeepSeek-V4-Pro

Open Source

Open Weights

Total Parameters

1.6T

Context Length

1M tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

24 Apr 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

Context Size	VRAM Required	Consumer (RTX 4090, 24GB VRAM)	Datacenter (NVIDIA A100, 80GB VRAM)	Apple Silicon (Apple M3 Max, 128GB VRAM)
1,024 tokens	3361.63 GB	251x	58x	56x
1,000,000 tokens	3492.67 GB	264x	61x	59x
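
As a rough cross-check of the figures above, the sketch below computes the smallest number of devices whose combined memory covers a given VRAM total, using plain ceiling division. It is only a lower bound: it ignores activation memory, parallelism overhead, and framework buffers, which may be part of why the device counts listed above are higher. The names `min_devices` and `lower_bounds` are illustrative and not taken from any tool.

```python
import math

# Per-device memory in GB, as listed in the requirements above.
DEVICES_GB = {"RTX 4090": 24, "NVIDIA A100": 80, "Apple M3 Max": 128}


def min_devices(total_vram_gb: float, per_device_gb: float) -> int:
    """Smallest device count whose combined memory is at least total_vram_gb."""
    if per_device_gb <= 0:
        raise ValueError("per_device_gb must be positive")
    return math.ceil(total_vram_gb / per_device_gb)


def lower_bounds(total_vram_gb: float) -> dict:
    """Naive lower bound on device count for each listed device type."""
    return {name: min_devices(total_vram_gb, gb) for name, gb in DEVICES_GB.items()}


if __name__ == "__main__":
    for context, vram_gb in [("1,024 tokens", 3361.63), ("1,000,000 tokens", 3492.67)]:
        print(context, lower_bounds(vram_gb))
```

Because the bound counts raw capacity only, treat it as a sanity check on the order of magnitude rather than a deployment plan.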

Evaluation Benchmarks

Rank

#22

Category	Benchmark	Score	Rank
Web Development	WebDev Arena	1462	18
General Text	Text Arena	1457	19

Rankings

Overall Rank

#22

Coding Rank

#39

About DeepSeek-V4-Pro

DeepSeek-V4-Pro is DeepSeek's flagship open-source model, with 1.6T total parameters and 49B activated per token. It features a novel hybrid CSA+HCA attention mechanism that reaches a 1M-token context with only 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. In Think Max mode (DeepSeek-V4-Pro-Max), it achieves state-of-the-art results among open-source models: SWE-Bench Verified 80.6%, SWE-Bench Pro 55.4%, Terminal-Bench 2.0 67.9%, MRCR 1M 83.5%, GPQA Diamond 90.1%, LiveCodeBench 93.5%, and a Codeforces rating of 3206. It supports three reasoning modes: Non-think, Think High, and Think Max. The model is available via API as deepseek-v4-pro and was released as open source under the MIT license on April 24, 2026.
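
To make the headline ratios above concrete, here is a minimal arithmetic sketch. It uses only the figures quoted in the description (1.6T total parameters, 49B active per token, 27% of the FLOPs and 10% of the KV cache relative to DeepSeek-V3.2). The baseline value passed to `scaled_cost` is a hypothetical placeholder, since the description gives no absolute numbers for DeepSeek-V3.2.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, from the description
ACTIVE_PARAMS = 49e9    # parameters activated per token, from the description
FLOPS_RATIO = 0.27      # inference FLOPs relative to DeepSeek-V3.2 at 1M context
KV_CACHE_RATIO = 0.10   # KV cache size relative to DeepSeek-V3.2 at 1M context


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def scaled_cost(baseline: float, ratio: float) -> float:
    """Scale a baseline cost (any unit) by a relative ratio."""
    return baseline * ratio


if __name__ == "__main__":
    print(f"active share per token: {active_fraction():.2%}")
    # Hypothetical baseline: if the older model needed 100 units of KV cache,
    # the stated ratio implies this many units for the same context length.
    print("KV cache units:", scaled_cost(100.0, KV_CACHE_RATIO))
```

Roughly 3% of the weights take part in each token's forward pass, which is the usual efficiency argument for a sparse Mixture-of-Experts design.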

Technical Specifications

Attention

Attention Structure

DeepSeek Sparse Attention

Attention Heads

128

Key-Value Heads

Attention Head Dimension

512

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

128

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwiGLU
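
The sliding-window settings listed above (window size 128) can be illustrated with a generic causal sliding-window mask. This is a textbook sketch, not DeepSeek's implementation: it assumes the window is causal and includes the current token, and the page does not say how the window interacts with the CSA and HCA components. The function name `sliding_window_mask` is hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int = 128) -> list[list[bool]]:
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is one of the `window` most recent positions up to and
    including i. (Assumption: causal window that includes the current token.)
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Small example so the pattern is visible: 5 tokens, window of 2.
    for row in sliding_window_mask(5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, each query attends to at most 128 keys, so the cost of this component grows linearly with sequence length instead of quadratically.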

Dimensions

Hidden Dimension Size

7,168

Number of Layers

FFN Intermediate Size (Dense)

3,072

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

129,280

Mixture of Experts

Total Expert Parameters

49.0B

Number of Experts

384

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

3,072

Dense Layers Before MoE
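
The dimensions listed above allow a small back-of-the-envelope parameter count. The sketch assumes a standard SwiGLU feed-forward block with three bias-free projection matrices (gate, up, down) and ignores router weights; both are assumptions, not details confirmed by this page. The number of layers is not listed, so the sketch stops at per-layer figures.

```python
HIDDEN_SIZE = 7_168        # hidden dimension size
EXPERT_FFN_SIZE = 3_072    # FFN intermediate size per expert
NUM_EXPERTS = 384          # number of routed experts
VOCAB_SIZE = 129_280       # tokenizer vocabulary size


def swiglu_ffn_params(hidden: int, intermediate: int) -> int:
    """Weights in one SwiGLU FFN: gate, up, and down projections, no biases (assumed)."""
    return 3 * hidden * intermediate


def embedding_params(vocab: int, hidden: int) -> int:
    """Weights in a token embedding table."""
    return vocab * hidden


if __name__ == "__main__":
    per_expert = swiglu_ffn_params(HIDDEN_SIZE, EXPERT_FFN_SIZE)
    print("per expert:", per_expert)
    print("all experts in one MoE layer:", per_expert * NUM_EXPERTS)
    print("embedding table:", embedding_params(VOCAB_SIZE, HIDDEN_SIZE))
```

Under these assumptions each expert holds about 66 million weights and a single MoE layer's experts hold about 25.4 billion, which shows why only a few experts can be active per token if the active count is to stay near 49B.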

Resources

Official Documentation

Download Weights

About DeepSeek V4

DeepSeek-V4 is DeepSeek's latest generation of highly efficient Mixture-of-Experts language models. It features a novel hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. The models were pre-trained on more than 32T tokens, followed by a comprehensive post-training pipeline that includes domain-specific expert cultivation and unified model consolidation. Both V4-Pro and V4-Flash support a 1M-token context length as standard, with three reasoning effort modes (Non-think, Think High, Think Max). The family was released as open source under the MIT license on April 24, 2026.

Other DeepSeek V4 Models

DeepSeek-V4-Flash