DeepSeek-R1 3B: Specifications and GPU VRAM Requirements

DeepSeek-R1 3B

闭源

开放权重

参数

上下文长度

32.768K

模态

Text

架构

Dense

许可证

Llama 3.2 Community License

发布日期

27 Dec 2024

知识截止

技术规格

注意力结构

Multi-Layer Attention

隐藏维度大小

3072

层数

注意力头

键值头

激活函数

归一化

RMS Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

DeepSeek-R1 3B

DeepSeek-R1 3B is a compact, dense language model variant developed through a distillation process from the larger DeepSeek-R1 architecture. This model is specifically built upon the Llama 3.2-3B foundational architecture, aiming to retain robust reasoning capabilities while significantly reducing computational resource requirements. Its design integrates a specialized chat templating system, ensuring compatibility with Llama 3 formatting, alongside custom tokenization to facilitate structured output and enhanced reasoning pathways.

The development methodology for DeepSeek-R1 3B incorporates several technical optimizations crucial for efficient training and inference. These include the application of LoRA (Low-Rank Adaptation) for fine-tuning, leveraging Flash Attention for accelerated self-attention computations, and utilizing gradient checkpointing to manage memory consumption during training. This architectural synthesis enables the model to process information with efficiency, making it suitable for deployment in environments where computational resources are a constraint.

The primary use cases for DeepSeek-R1 3B center on applications that demand structured reasoning and general language understanding, such as mathematical problem-solving or comparative analysis tasks. Its distilled nature allows it to deliver performance suitable for practical applications requiring a balance of reasoning fidelity and operational efficiency.

关于 DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.

其他 DeepSeek-R1 模型

评估基准

排名适用于本地LLM。

没有可用的 DeepSeek-R1 3B 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

16k

32k

所需显存:

资源

官方文档阅读论文下载权重源代码

DeepSeek-R1 3B

技术规格

系统要求

DeepSeek-R1 3B

关于 DeepSeek-R1

其他 DeepSeek-R1 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源