Parameters
32B
Context Length
131,072 tokens
Modality
Text
Architecture
Dense
License
MIT License
Release Date
20 Jan 2025
Knowledge Cutoff
Jul 2024
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
5120
Number of Layers
64
Attention Heads
40
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
DeepSeek-R1-Distill-Qwen-32B is a distilled model engineered for advanced reasoning tasks. It transfers the reasoning capabilities of the much larger DeepSeek-R1 teacher into a more efficient 32-billion-parameter architecture: built on the Qwen2.5-32B base model, it was fine-tuned on 800,000 curated reasoning samples generated by the original DeepSeek-R1, enabling complex problem-solving at a parameter count suited to broader deployment.
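A one-record sketch can make the distillation format concrete. The exact schema of the 800,000 samples is not public, so the field names below are hypothetical; only the `<think>...</think>` delimiters match DeepSeek-R1's documented output format:

```python
# Hypothetical shape of a single distillation SFT record. DeepSeek has not
# published the dataset schema; the <think> delimiters are the one
# documented element of the teacher's output format.
sample = {
    "prompt": "If 3x + 7 = 22, what is x?",
    "completion": (
        "<think>Subtract 7 from both sides: 3x = 15. "
        "Divide both sides by 3: x = 5.</think>\n"
        "x = 5"
    ),
}
```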
From an architectural standpoint, DeepSeek-R1-Distill-Qwen-32B is a dense transformer. It uses Rotary Position Embedding (RoPE) to encode sequence position and supports FlashAttention-2 for optimized attention computation, improving efficiency and throughput. A context length of up to 131,072 tokens allows it to process and generate the extended sequences needed for detailed analytical tasks. The design prioritizes effective reasoning and generation while keeping a manageable computational footprint.
The model's primary use cases include complex problem-solving, advanced mathematical reasoning, and robust coding performance across multiple programming languages. It is compatible with popular deployment frameworks such as vLLM and SGLang, facilitating its integration into various applications and research initiatives. The DeepSeek-R1-Distill-Qwen-32B model is released under the MIT License, which supports commercial use and permits modifications and derivative works, including further distillation. This licensing approach promotes open research and widespread adoption within the machine learning community.
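As a minimal offline-inference sketch with vLLM (the model ID is the one published on Hugging Face; the sampling values follow DeepSeek's recommended settings discussed under benchmark reproducibility below):

```python
from vllm import LLM, SamplingParams

# Load the distilled model; tensor_parallel_size shards it across GPUs
# and should be adjusted to the available hardware.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=2,
)

# DeepSeek's recommended sampling settings for the R1 series.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

outputs = llm.generate(
    ["Prove that the sum of two even integers is even."], params
)
print(outputs[0].outputs[0].text)
```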
DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.
No evaluation benchmarks are available for DeepSeek-R1-Distill-Qwen-32B.
Overall Rank
-
Coding Rank
-
Total Score
67 / 100
DeepSeek-R1-Distill-Qwen-32B demonstrates strong transparency regarding its architecture and licensing, providing a clear account of its distillation from a larger reasoning model. However, it remains opaque about the specific data sources used for pre-training and the compute dedicated to this variant. While the model's identity and hardware requirements are well defined, versioning and benchmark reproducibility need improvement to meet exemplary standards.
Architectural Provenance
The model is explicitly identified as a distillation of DeepSeek-R1 into the Qwen2.5-32B base architecture. DeepSeek provides a detailed technical paper and GitHub repository outlining the multi-stage training pipeline, which includes cold-start data, large-scale reinforcement learning (RL) for the teacher model, and subsequent supervised fine-tuning (SFT) for the distilled variants. The architectural transition from the teacher's Mixture-of-Experts (MoE) to the student's dense transformer structure is well-documented.
Dataset Composition
While DeepSeek discloses the use of 800,000 curated reasoning samples generated by DeepSeek-R1 for distillation, the specific composition, sources, and filtering criteria of the original pre-training data for the Qwen2.5 base or the 'cold-start' data are not fully detailed. There is no granular breakdown (e.g., percentages of code, web, or academic data) and no public access to the full training datasets, though the methodology for generating the reasoning traces is described.
Tokenizer Integrity
The model utilizes the Qwen2.5 tokenizer, which is publicly accessible with a known vocabulary size of 151,665 tokens. Documentation exists regarding slight configuration changes for the R1 series. However, there have been reported discrepancies between the 'config.json' embedding size (152,064) and the actual tokenizer vocabulary, which, while common in the industry, indicates minor documentation misalignment.
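The discrepancy is easy to verify directly; a minimal check against the public Hugging Face repo might look like this:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# len(tokenizer) counts actual entries (base vocabulary plus special
# tokens), while config.vocab_size reflects the padded embedding matrix.
print(len(tokenizer))      # 151,665 tokenizer entries
print(config.vocab_size)   # 152,064 embedding rows
```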
Parameter Density
The model is clearly defined as a dense transformer with 32.5 billion total parameters. Unlike the MoE teacher model, all parameters are active during inference, and this distinction is explicitly stated in official documentation. The architectural specifications (64 layers, 5120 hidden dimension) are verifiable through the model's configuration files on Hugging Face.
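Verifying this takes a few lines against the repository's `config.json`; the expected values below reflect the Qwen2.5-32B base geometry:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")

print(config.num_hidden_layers)    # 64
print(config.hidden_size)          # 5120
print(config.num_attention_heads)  # 40
print(config.num_key_value_heads)  # 8 (grouped-query attention)
```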
Training Compute
DeepSeek provides high-level compute estimates for the primary DeepSeek-V3/R1 training (e.g., 2.78M GPU hours on H800 clusters), but specific compute resources, duration, and environmental impact data for the distillation of the 32B variant specifically are not disclosed. The $6 million development cost claim is a marketing figure rather than a technical compute breakdown.
Benchmark Reproducibility
DeepSeek provides comprehensive benchmark results (AIME, MATH-500, etc.) and specifies some evaluation parameters like temperature (0.6) and top-p (0.95). However, the evaluation code is modified from third-party sources (SkyThought), and independent researchers have noted significant sensitivity to prompt formatting and system instructions, making exact reproduction difficult without more standardized, versioned evaluation scripts.
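A reproduction attempt therefore needs to fix every generation detail, not just the scores. A minimal sketch with Transformers using the published settings (DeepSeek's model card also advises placing all instructions in the user turn rather than a system prompt):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# All instructions go in the user turn; no system prompt.
messages = [{"role": "user", "content": "Compute the integral of x^2 from 0 to 3."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Published evaluation settings: temperature 0.6, top-p 0.95.
output = model.generate(
    input_ids, do_sample=True, temperature=0.6, top_p=0.95, max_new_tokens=1024
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```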
Identity Consistency
The model consistently identifies as a member of the DeepSeek-R1 family and is transparent about its distilled nature and base architecture. It successfully differentiates its capabilities from the full 671B MoE model and maintains a coherent identity across official platforms and API responses.
License Clarity
The model is released under the highly permissive MIT License, which is explicitly stated in the repository and model cards. This license clearly allows for commercial use, modification, and further distillation. The relationship between the student's MIT license and the base Qwen2.5 Apache 2.0 license is clearly navigated in the documentation.
Hardware Footprint
VRAM requirements are well-documented by both the provider and the community for various precisions (e.g., ~15GB for 4-bit, ~68GB for FP16). The impact of context length on memory is noted, and the model is widely supported by deployment frameworks like vLLM and Ollama, which provide additional hardware guidance.
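Those figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the Qwen2.5-32B geometry listed above and ignores framework overhead, which is why the FP16 result lands a few gigabytes under the ~68 GB practical figure:

```python
PARAMS = 32.5e9                          # total parameters
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128  # Qwen2.5-32B geometry

def weight_gib(bytes_per_param: float) -> float:
    """Memory for the weights alone at a given precision."""
    return PARAMS * bytes_per_param / 2**30

def kv_cache_gib(tokens: int, bytes_per_value: int = 2) -> float:
    """FP16 KV cache: one K and one V tensor per layer per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens / 2**30

print(f"FP16 weights:  {weight_gib(2.0):.1f} GiB")   # ~60.5 GiB
print(f"4-bit weights: {weight_gib(0.5):.1f} GiB")   # ~15.1 GiB
print(f"KV cache at 131,072 tokens: {kv_cache_gib(131072):.1f} GiB")  # 32.0 GiB
```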
Versioning Drift
While the model is hosted on Hugging Face with basic commit history, it lacks a formal semantic versioning system or a detailed public changelog for weight updates. Users have reported 'silent' updates or redirections on some API platforms, making it difficult to track behavioral drift or access specific historical checkpoints reliably.
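Until formal versioning exists, the practical mitigation is to pin an exact commit from the repository's history; `from_pretrained` accepts a `revision` argument for this (the hash below is a placeholder, not a real commit):

```python
from transformers import AutoModelForCausalLM

# Pinning a commit hash guarantees the same weights on every load,
# insulating deployments from silent upstream updates.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    revision="abc1234",  # placeholder commit hash from the repo history
)
```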