DeepSeek-R1 8B: Specifications and GPU VRAM Requirements

DeepSeek-R1 8B

开源

开放权重

参数

上下文长度

64K

模态

Text

架构

Dense

许可证

MIT License

发布日期

27 Dec 2024

训练数据截止日期

技术规格

注意力结构

Multi-Layer Attention

隐藏维度大小

4096

层数

注意力头

键值头

激活函数

归一化

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

DeepSeek-R1 8B

DeepSeek-R1 is a family of models developed with a focus on enhancing reasoning capabilities in large language models. The foundational DeepSeek-R1-Zero model was innovated through large-scale reinforcement learning (RL) without an initial supervised fine-tuning (SFT) phase, demonstrating an emergent capacity for complex reasoning. Building upon this, the DeepSeek-R1 model refines these capabilities by incorporating multi-stage training and cold-start data prior to the RL phase, addressing initial challenges related to output readability and coherence.

The 8B variant, specifically exemplified by DeepSeek-R1-Distill-Llama-8B or DeepSeek-R1-0528-Qwen3-8B, represents a significant contribution to the field of efficient model deployment. These models are dense architectures that leverage a distillation process. This involves fine-tuning smaller, open-source base models, such as Llama or Qwen series, with high-quality reasoning data generated by the larger DeepSeek-R1 model. The objective of this distillation is to transfer the sophisticated reasoning patterns of the larger model into a more compact form, enabling the 8B variant to perform effectively in environments with constrained computational resources while maintaining strong performance in domains requiring intricate logical inference.

The DeepSeek-R1-0528 update, applied to the 8B distilled model, further refines its reasoning and inference capabilities through computational enhancements and algorithmic optimizations in the post-training phase. This iteration demonstrates improved depth of thought, reduced instances of hallucination, and enhanced support for function calling. The DeepSeek-R1 8B models are applicable across various technical use cases, including advanced AI research, automated code generation, mathematical problem-solving, and general natural language processing tasks that demand robust logical deduction.

关于 DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.

其他 DeepSeek-R1 模型

评估基准

排名适用于本地LLM。

没有可用的 DeepSeek-R1 8B 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

31k

63k

所需显存:

资源

官方文档发布说明阅读论文下载权重源代码

DeepSeek-R1 8B

技术规格

系统要求

DeepSeek-R1 8B

关于 DeepSeek-R1

其他 DeepSeek-R1 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源