Total Parameters
1.6T
Context Length
1M tokens
Modality
Text
Architecture
Mixture of Experts (MoE)
License
MIT
Release Date
24 Apr 2026
Knowledge Cutoff
-
Attention
Attention Structure
Multi-Head Attention
Attention Heads
128
Key-Value Heads
1
Attention Head Dimension
512
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
10,000
Sliding Window Attention
Yes
Sliding Window Size
128
Normalization
RMS Normalization
Activation Function
SwiGLU
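To make the RoPE Theta and Sliding Window Size fields above concrete, here is a minimal sketch of how such values are conventionally used. It assumes the standard rotary-embedding frequency formula applied across the full 512-dimensional head, and a causal window in which each query sees itself plus the 127 tokens before it. Neither convention is confirmed by this page, and the function names are illustrative only.

```python
HEAD_DIM = 512        # "Attention Head Dimension" from the sheet
ROPE_THETA = 10_000   # "RoPE Theta" from the sheet
WINDOW = 128          # "Sliding Window Size" from the sheet


def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list:
    """Standard RoPE inverse frequencies: theta ** (-2i / d) per dimension pair i.

    Assumption: the rotation covers the whole head dimension.
    """
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def in_window(query_pos: int, key_pos: int, window: int = WINDOW) -> bool:
    """Causal sliding window check.

    Assumption: a query attends to itself and the (window - 1) tokens before it.
    """
    return 0 <= query_pos - key_pos < window


freqs = rope_inv_freq()
print(len(freqs))            # one frequency per pair of dimensions
print(in_window(200, 73))    # distance 127, inside the window
print(in_window(200, 72))    # distance 128, outside the window
```

A larger theta stretches the slowest rotation over more positions, while the window bounds how far back local attention can look. How these interact with the model's compressed attention layers is not described on this page.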
Dimensions
Hidden Dimension Size
7,168
Number of Layers
61
FFN Intermediate Size (Dense)
3,072
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
129,280
Mixture of Experts
Total Expert Parameters
49.0B
Number of Experts
384
Active Experts
6
Shared Experts
1
FFN Intermediate Size (per Expert)
3,072
Dense Layers Before MoE
-
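As a rough sanity check on the dimensions listed above, the sketch below estimates parameter counts from the sheet's own fields. It rests on two labeled assumptions: each SwiGLU expert consists of three bias-free projection matrices (gate, up, down), and all 61 layers are MoE layers (the "Dense Layers Before MoE" field is blank). Attention weights, the router, and normalization parameters are left out, so treat the results as ballpark figures only.

```python
HIDDEN = 7_168        # Hidden Dimension Size
LAYERS = 61           # Number of Layers
EXPERT_FFN = 3_072    # FFN Intermediate Size (per Expert)
NUM_EXPERTS = 384     # Number of Experts
ACTIVE_EXPERTS = 6    # Active Experts
SHARED_EXPERTS = 1    # Shared Experts
VOCAB = 129_280       # Vocabulary Size

# Assumption: a SwiGLU expert has gate, up, and down projections, no biases.
params_per_expert = 3 * HIDDEN * EXPERT_FFN

# Assumption: every layer is an MoE layer.
routed_total = params_per_expert * NUM_EXPERTS * LAYERS
active_ffn = params_per_expert * (ACTIVE_EXPERTS + SHARED_EXPERTS) * LAYERS

# One token-embedding matrix (an untied output head would add the same again).
embedding = VOCAB * HIDDEN

print(f"per expert:          {params_per_expert / 1e6:.1f}M")
print(f"all routed experts:  {routed_total / 1e12:.2f}T")
print(f"active expert FFNs:  {active_ffn / 1e9:.1f}B")
print(f"token embedding:     {embedding / 1e9:.2f}B")
```

Under these assumptions the routed experts alone come to about 1.55T parameters, which is in the same range as the 1.6T figure on this page. The 28.2B of expert weights touched per token accounts for only part of the activated count, since attention and other components are not included.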
DeepSeek-V4-Pro is DeepSeek's flagship open-source model, with 1.6T total parameters and 49B activated per token. It features a novel hybrid CSA+HCA attention mechanism that reaches 1M context with only 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. In Think Max mode (DeepSeek-V4-Pro-Max), it achieves state-of-the-art results among open-source models: SWE-Bench Verified 80.6%, SWE-Bench Pro 55.4%, Terminal-Bench 2.0 67.9%, MRCR 1M 83.5%, GPQA Diamond 90.1%, LiveCodeBench 93.5%, and a Codeforces rating of 3206. It supports three reasoning modes: Non-think, Think High, and Think Max. The model is available via API as deepseek-v4-pro and was released as open source under the MIT license on April 24, 2026.
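The headline figures in the paragraph above can be turned into ratios with a few lines of arithmetic. This is only a restatement of the quoted numbers (1.6T total, 49B activated, 27% FLOPs, 10% KV cache); the variable names are illustrative.

```python
TOTAL_PARAMS = 1.6e12    # total parameters, as quoted
ACTIVE_PARAMS = 49e9     # parameters activated per token, as quoted
FLOPS_RATIO = 0.27       # inference FLOPs relative to DeepSeek-V3.2 at 1M context
KV_RATIO = 0.10          # KV cache relative to DeepSeek-V3.2 at 1M context

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_reduction = 1 / FLOPS_RATIO
kv_reduction = 1 / KV_RATIO

print(f"activated share of weights: {active_fraction:.2%}")
print(f"FLOPs reduction factor:     {flops_reduction:.2f}x")
print(f"KV cache reduction factor:  {kv_reduction:.1f}x")
```

Roughly 3% of the weights are used for any single token, and the quoted efficiency claims correspond to about 3.7 times fewer inference FLOPs and 10 times less KV cache than DeepSeek-V3.2 at 1M context.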
DeepSeek-V4 is DeepSeek's latest generation of highly efficient Mixture-of-Experts language models. It features a novel hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), which dramatically improves long-context efficiency. The models were pre-trained on 32T+ tokens and then refined with a comprehensive post-training pipeline that includes domain-specific expert cultivation and unified model consolidation. Both V4-Pro and V4-Flash support a 1M context length as standard and offer three reasoning effort modes (Non-think, Think High, Think Max). The family was released as open source under the MIT license on April 24, 2026.
No evaluation benchmarks for DeepSeek-V4-Pro available.
Overall Rank
#76
Coding Rank
-