DeepSeek-R1 1.5B: Specifications and GPU VRAM Requirements

DeepSeek-R1 1.5B

Open source

Open weights

Parameters

1.5B

Context length

131,072 tokens

Modality

Text

Architecture

Dense

License

MIT

Release date

27 Dec 2024

Training data cutoff

Technical specifications

Attention structure

Grouped Query Attention (GQA)

Hidden dimension size

2048

Number of layers

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

RoPE
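The three components named in the specifications above (SwiGLU, RMS normalization, RoPE) can each be written in a few lines. The following is a minimal pure-Python sketch for illustration, not the model's actual implementation. The function names, the `eps` and `base` defaults, and the choice to rotate adjacent pairs of dimensions in `rope` are assumptions made here; real implementations operate on tensors and may pair dimensions differently.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) multiplied elementwise by up."""
    return [silu(g) * u for g, u in zip(gate, up)]

def rope(x, position, base=10000.0):
    """Rotary position embedding: rotate each adjacent pair of dimensions
    by an angle that grows with the token position (assumed pairing convention)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

if __name__ == "__main__":
    vec = [1.0, 2.0, 3.0, 4.0]
    print(rms_norm(vec, [1.0] * 4))
    print(swiglu(vec, vec))
    print(rope(vec, position=3))
```

Because `rope` only rotates pairs of values, it leaves the vector's length unchanged and encodes position purely in the rotation angle.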

System requirements

VRAM requirements for different quantization methods and context sizes

DeepSeek-R1 1.5B

DeepSeek-R1 is a family of reasoning-focused large language models developed by DeepSeek AI. DeepSeek-R1-Distill-Qwen-1.5B is a compact member of this family, built to distill the reasoning capabilities of the larger DeepSeek-R1 models into a more parameter-efficient model. It is fine-tuned on reasoning data generated by the higher-capacity DeepSeek-R1 models. Its purpose is to provide language understanding and reasoning in a form suitable for deployment where computational resources are constrained.

DeepSeek-R1-Distill-Qwen-1.5B is a Transformer model built on the Qwen2.5-Math-1.5B base. Its architecture uses Rotary Position Embedding (RoPE) to encode token positions, the SwiGLU activation function, and RMSNorm for stable training. While the full-size DeepSeek-R1 models use a Mixture-of-Experts (MoE) design, the 1.5B distilled variant is dense: every parameter is active for every token. Its attention mechanism is Grouped Query Attention (GQA), in which several query heads share one set of key and value projections. This shrinks the key-value cache and reduces memory bandwidth requirements during inference.
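To make the key-value sharing in GQA concrete, the sketch below maps query heads to the KV head they share and compares the per-token cache size with and without sharing. The layer count, head counts, and head dimension are illustrative placeholders, since this page does not list them, and the function names are hypothetical. The 2-byte element size assumes an FP16 cache.

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Return the index of the KV head shared by the given query head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored per token: one key and one value vector
    per KV head per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

# Placeholder configuration, NOT the model's published values.
NUM_LAYERS, NUM_Q_HEADS, NUM_KV_HEADS, HEAD_DIM = 24, 16, 4, 128

mapping = [kv_head_for_query_head(h, NUM_Q_HEADS, NUM_KV_HEADS)
           for h in range(NUM_Q_HEADS)]
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_Q_HEADS, HEAD_DIM)

print(mapping)                 # query head -> shared KV head
print(gqa_bytes, mha_bytes)    # per-token cache with and without sharing
print(mha_bytes // gqa_bytes)  # reduction factor = query heads per KV head
```

With these placeholder values, each group of four query heads reads the same cached keys and values, so the cache is a quarter of what full multi-head attention would need.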

The model targets tasks that demand logical inference and step-by-step problem-solving, such as mathematical problem-solving, code comprehension, and general text-based reasoning. Its compact parameter count makes it suitable for standard consumer-grade hardware or edge devices, so it can run locally without extensive computational infrastructure. This makes reasoning capabilities accessible to researchers and developers building resource-sensitive applications.

About DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It uses a Mixture-of-Experts architecture for computational efficiency and scalability, together with Multi-Head Latent Attention. Training relies on reinforcement learning, and some variants also incorporate cold-start data.

Evaluation benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for DeepSeek-R1 1.5B.

GPU requirements

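Required VRAM depends on the quantization method selected for the model weights and on the context size, which ranges from 1,024 tokens through 64k up to 128k.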
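As a rough guide to how quantization and context size drive VRAM use, the sketch below adds the weight memory to the KV cache memory and applies a flat overhead. It is a back-of-the-envelope estimate under stated assumptions, not the calculator behind this page. The 1.5B parameter count comes from the specifications above. The layer count, KV head count, head dimension, FP16 cache, 10% overhead, and the reading of 64k and 128k as 65,536 and 131,072 tokens are all assumptions, and the function names are hypothetical.

```python
GIB = 1024 ** 3

def weight_bytes(num_params, bits_per_weight):
    """Memory for the model weights at a given quantization width."""
    return num_params * bits_per_weight / 8

def kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Memory for cached keys and values across the whole context."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_tokens

def estimate_vram_gib(num_params, bits_per_weight, context_tokens,
                      num_layers, num_kv_heads, head_dim,
                      overhead_fraction=0.1):
    """Weights plus KV cache, padded by a flat overhead for activations
    and runtime buffers (the overhead fraction is an assumption)."""
    total = (weight_bytes(num_params, bits_per_weight)
             + kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim))
    return total * (1 + overhead_fraction) / GIB

if __name__ == "__main__":
    NUM_PARAMS = 1_500_000_000               # from the page: 1.5B parameters
    LAYERS, KV_HEADS, HEAD_DIM = 24, 4, 128  # placeholders, not published values
    for bits in (16, 8, 4):
        for ctx in (1024, 65536, 131072):
            gib = estimate_vram_gib(NUM_PARAMS, bits, ctx,
                                    LAYERS, KV_HEADS, HEAD_DIM)
            print(f"{bits:>2}-bit weights, {ctx:>6} tokens: ~{gib:.2f} GiB")
```

Under these assumptions the weights dominate at short contexts, while the KV cache becomes the larger term as the context approaches 128k, which is why lowering the weight precision alone helps less for long-context use.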
