Active Parameters
284B
Context Length
1,000K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
MIT
Release Date
24 Apr 2026
Knowledge Cutoff
-
Attention
Attention Structure
Multi-Head Attention
Attention Heads
64
Key-Value Heads
1
Attention Head Dimension
512
Position Embedding
Absolute Position Embedding
RoPE Theta
10,000
Sliding Window Attention
Yes
Sliding Window Size
128
Normalization
RMS Normalization
Activation Function
Swish
Dimensions
Hidden Dimension Size
4,096
Number of Layers
43
FFN Intermediate Size (Dense)
2,048
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
129,280
Mixture of Experts
Total Expert Parameters
13.0B
Number of Experts
256
Active Experts
6
Shared Experts
1
FFN Intermediate Size (per Expert)
2,048
Dense Layers Before MoE
-
DeepSeek-V4-Flash is DeepSeek's fast, efficient, and economical MoE model in the V4 series, with 284B total parameters and 13B activated per token. Shares the same hybrid CSA+HCA attention architecture and 1M context support as V4-Pro. DeepSeek-V4-Flash-Max achieves comparable reasoning performance to V4-Pro when given a larger thinking budget. Strong on agentic and coding tasks (SWE-Bench Verified 79.0%, Terminal-Bench 2.0 56.9%), with smaller parameter scale enabling faster response times. Supports Non-think, Think High, and Think Max reasoning modes. Available via API as deepseek-v4-flash. Released open-source under MIT license on April 24, 2026.
DeepSeek-V4 is DeepSeek's latest generation of highly efficient Mixture-of-Experts language models, featuring a novel hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) that dramatically improves long-context efficiency. Pre-trained on 32T+ tokens with a comprehensive post-training pipeline including domain-specific expert cultivation and unified model consolidation. Both V4-Pro and V4-Flash support 1M context length as standard, with three reasoning effort modes (Non-think, Think High, Think Max). Released open-source under MIT license on April 24, 2026.
Rank
#77
No evaluation benchmarks for DeepSeek-V4-Flash available.
Overall Rank
#77
Coding Rank
-
Full Calculator
Choose the quantization method for model weights
Context Size: 1,024 tokens
APX AI
Online