DeepSeek-R1 70B: Specifications and GPU VRAM Requirements

DeepSeek-R1 70B

Open Source

Open Weights

Parameters

70B

Context Length

32,768 tokens

Modality

Text

Architecture

Dense

License

MIT License

Release Date

27 Dec 2024

Training Data Cutoff

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

8192

Number of Layers

Attention Heads

64

Key-Value Heads

8

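Activation Function

Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes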

DeepSeek-R1 70B

DeepSeek-R1 is a family of large language models developed by DeepSeek with a primary focus on reasoning. The DeepSeek-R1-Distill-Llama-70B variant is produced by knowledge distillation: the reasoning behavior of the larger DeepSeek-R1 model is transferred to a Llama-3.3-70B-Instruct base model. The goal is a model that keeps the efficiency and operational characteristics of its base while inheriting the larger model's reasoning patterns.

Architecturally, DeepSeek-R1-Distill-Llama-70B is a dense transformer, unlike the original DeepSeek-R1, which uses a Mixture-of-Experts (MoE) architecture. Because it inherits the Llama-3.3-70B design, it uses Grouped-Query Attention (GQA), with 64 query heads sharing 8 key-value heads, rather than the Multi-Head Latent Attention (MLA) of the original DeepSeek-R1. The model uses Rotary Position Embeddings (RoPE) to encode positional information within sequences and can run with Flash Attention for faster, more memory-efficient attention computation. This configuration lets it handle long contexts, which supports multi-step problem-solving.
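The rotary position embedding mentioned above can be sketched in a few lines. This is a minimal pure-Python illustration in the interleaved-pair form of the original RoPE formulation, not the model's actual implementation: the function names `rope_rotate` and `dot` and the `base=10000.0` default are assumptions chosen for illustration, and production code works on batched tensors and may pair dimensions differently.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate one attention-head vector according to its token position.

    `vec` must have even length. `base` is an illustrative default,
    not a value read from this model's configuration.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * dim
    for i in range(dim // 2):
        # Each consecutive pair of dimensions rotates at its own frequency.
        theta = position * base ** (-2.0 * i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * cos_t - y * sin_t
        out[2 * i + 1] = x * sin_t + y * cos_t
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors for a head of dimension 4.
query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 2 tokens apart, so the scores should agree
# up to floating-point rounding.
score_near = dot(rope_rotate(query, 3), rope_rotate(key, 1))
score_far = dot(rope_rotate(query, 10), rope_rotate(key, 8))
print(score_near, score_far)
```

The two printed scores match up to floating-point rounding because the rotation makes the query-key dot product depend only on the relative offset between the two positions, which is the property that lets attention reason about token distance.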

The model is intended for general text generation, code generation, and problem-solving in domains that require logical inference and multi-step reasoning. Its design targets efficient deployment where computational resources are constrained, including on consumer-grade hardware. It is particularly suited to tasks that demand structured reasoning, such as mathematical problem-solving and code generation, which makes it useful across technical and research applications.

About DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.

Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#17

Category	Benchmark	Score	Rank
General Knowledge	MMLU	0.80	⭐ 4
Reasoning	LiveBench Reasoning	0.60	7
Agentic Coding	LiveBench Agentic	0.07	12
Data Analysis	LiveBench Data Analysis	0.61	14
Mathematics	LiveBench Mathematics	0.59	18
Coding	LiveBench Coding	0.47	22

Coding Rank

#31

GPU Requirements

Required VRAM depends on the quantization method selected for the model weights and on the context size (from 1,024 tokens up to 32k).
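As a rough illustration of how quantization and context size drive the VRAM requirement, the sketch below adds the memory for the weights to the memory for the key-value cache. It is a back-of-the-envelope estimate under stated assumptions, not the calculator used on this page: the function names are hypothetical, the layer count, key-value head count, and head dimension in the example (80, 8, and 128) are assumptions based on the public Llama 3 70B configuration and should be replaced with the values from the model's own `config.json`, and the estimate ignores activations, runtime overhead, and the per-block scaling data that real quantization formats store.

```python
GIB = 1024 ** 3


def weight_bytes(n_params, bits_per_weight):
    """Memory for the model weights at a nominal bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Memory for cached keys and values across all layers.

    The leading factor of 2 covers one key and one value vector per
    token, per layer, per key-value head. `bytes_per_value=2`
    corresponds to a 16-bit cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * bytes_per_value)


def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_tokens):
    """Weights plus KV cache, in GiB. Excludes activations and overhead."""
    total = (weight_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens))
    return total / GIB


# Example values. PARAMS comes from the "70B" figure on this page; the
# remaining three are assumptions (see the note above the code).
PARAMS = 70e9
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for bits in (16, 8, 4):
    for ctx in (1024, 16 * 1024, 32 * 1024):
        gib = estimate_vram_gib(PARAMS, bits, LAYERS, KV_HEADS, HEAD_DIM, ctx)
        print(f"{bits:>2}-bit weights, {ctx:>6} tokens: about {gib:.1f} GiB")
```

Under these assumptions the weights dominate: halving the bit width roughly halves the total, while the key-value cache grows linearly with context length and only becomes a significant share at the longest contexts.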

