ApX logoApX logo

DeepSeek-V4-Flash

Active Parameters

284B

Context Length

1,000K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

24 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

64

Key-Value Heads

1

Attention Head Dimension

512

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

128

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

4,096

Number of Layers

43

FFN Intermediate Size (Dense)

2,048

Multi-Token Prediction Heads

1

Tokenizer

Vocabulary Size

129,280

Mixture of Experts

Total Expert Parameters

13.0B

Number of Experts

256

Active Experts

6

Shared Experts

1

FFN Intermediate Size (per Expert)

2,048

Dense Layers Before MoE

-

Architecture Diagram

Input TokensToken EmbeddingPosition: AbsoluteHidden: 4.1k · Context: 1,000k · Vocab: 129.3kx 43 layersRMSNormPre-AttentionMulti-Head Attention64Q / 1KV heads · SW: 128Head dim: 512+RMSNormPre-FFNSparse MoE FFN (6/256 experts)SwishIntermediate: 2k+Final RMSNormOutput Logits

DeepSeek-V4-Flash

DeepSeek-V4-Flash is DeepSeek's fast, efficient, and economical MoE model in the V4 series, with 284B total parameters and 13B activated per token. Shares the same hybrid CSA+HCA attention architecture and 1M context support as V4-Pro. DeepSeek-V4-Flash-Max achieves comparable reasoning performance to V4-Pro when given a larger thinking budget. Strong on agentic and coding tasks (SWE-Bench Verified 79.0%, Terminal-Bench 2.0 56.9%), with smaller parameter scale enabling faster response times. Supports Non-think, Think High, and Think Max reasoning modes. Available via API as deepseek-v4-flash. Released open-source under MIT license on April 24, 2026.

About DeepSeek V4

DeepSeek-V4 is DeepSeek's latest generation of highly efficient Mixture-of-Experts language models, featuring a novel hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) that dramatically improves long-context efficiency. Pre-trained on 32T+ tokens with a comprehensive post-training pipeline including domain-specific expert cultivation and unified model consolidation. Both V4-Pro and V4-Flash support 1M context length as standard, with three reasoning effort modes (Non-think, Think High, Think Max). Released open-source under MIT license on April 24, 2026.


Other DeepSeek V4 Models

Evaluation Benchmarks

Rank

#77

No evaluation benchmarks for DeepSeek-V4-Flash available.

Rankings

Overall Rank

#77

Coding Rank

-

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
488k
977k

VRAM Required:

Recommended GPUs