DeepSeek-V4-Flash

Open Source

Open Weights

Active Parameters

284B

Context Length

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

24 Apr 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens

597.99 GB VRAM

Consumer

32x RTX 4090

24GB VRAM

Datacenter

9x NVIDIA A100

80GB VRAM

Apple Silicon

7x Apple M3 Max

128GB VRAM

1,000,000 tokens

690.37 GB VRAM

Consumer

38x RTX 4090

24GB VRAM

Datacenter

10x NVIDIA A100

80GB VRAM

Apple Silicon

8x Apple M3 Max

128GB VRAM

Architecture Diagram

Evaluation Benchmarks

Rank

#44

Benchmark	Score	Rank
General Text Text Arena	1434	39

Rankings

Overall Rank

#44

Coding Rank

About DeepSeek-V4-Flash

DeepSeek-V4-Flash is DeepSeek's fast, efficient, and economical MoE model in the V4 series, with 284B total parameters and 13B activated per token. Shares the same hybrid CSA+HCA attention architecture and 1M context support as V4-Pro. DeepSeek-V4-Flash-Max achieves comparable reasoning performance to V4-Pro when given a larger thinking budget. Strong on agentic and coding tasks (SWE-Bench Verified 79.0%, Terminal-Bench 2.0 56.9%), with smaller parameter scale enabling faster response times. Supports Non-think, Think High, and Think Max reasoning modes. Available via API as deepseek-v4-flash. Released open-source under MIT license on April 24, 2026.

Technical Specifications

Attention

Attention Structure

DeepSeek Sparse Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

512

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

128

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

4,096

Number of Layers

FFN Intermediate Size (Dense)

2,048

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

129,280

Mixture of Experts

Total Expert Parameters

13.0B

Number of Experts

256

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

2,048

Dense Layers Before MoE

Resources

Official Documentation Download Weights

About DeepSeek V4

DeepSeek-V4 is DeepSeek's latest generation of highly efficient Mixture-of-Experts language models, featuring a novel hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) that dramatically improves long-context efficiency. Pre-trained on 32T+ tokens with a comprehensive post-training pipeline including domain-specific expert cultivation and unified model consolidation. Both V4-Pro and V4-Flash support 1M context length as standard, with three reasoning effort modes (Non-think, Think High, Think Max). Released open-source under MIT license on April 24, 2026.

Other DeepSeek V4 Models

DeepSeek-V4-Pro