
DeepSeek-R1 7B

Parameters: 7B

Context Length: 131,072 tokens

Modality: Text

Architecture: Dense

License: Apache 2.0

Release Date: 27 Dec 2024

Knowledge Cutoff: -

Technical Specifications

Attention Structure: Multi-Head Latent Attention

Hidden Dimension Size: 4096

Number of Layers: 32

Attention Heads: 64

Key-Value Heads: 64

Activation Function: -

Normalization: RMS Normalization

Position Embedding: RoPE
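The normalization entry above refers to RMSNorm. The following is a minimal sketch of how RMSNorm is typically implemented in Qwen-style decoder layers; the epsilon value is an assumption, and `hidden` simply reuses the hidden dimension size from the table.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm rescales by the reciprocal root-mean-square of the features;
    # unlike LayerNorm there is no mean subtraction and no bias term.
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

hidden = 4096                      # hidden dimension size from the table above
x = torch.randn(1, 8, hidden)      # (batch, sequence, hidden)
weight = torch.ones(hidden)        # learned per-feature scale
print(rms_norm(x, weight).shape)   # torch.Size([1, 8, 4096])
```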


DeepSeek-R1 7B

DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter language model from DeepSeek AI. It is a dense model derived through knowledge distillation from the larger DeepSeek-R1 system, with the primary design goal of delivering strong reasoning in domains such as mathematical reasoning, logical analysis, and code generation. Distillation lets the model capture advanced problem-solving behavior in a more computationally efficient form, making it suitable for deployments where resource constraints demand a smaller footprint without significant loss of reasoning performance.

The architecture of DeepSeek-R1-Distill-Qwen-7B is based on the Qwen2.5-Math-7B model. Training focuses on transferring sophisticated reasoning behavior from the DeepSeek-R1 teacher model, using a dataset of approximately 800,000 curated samples generated by the higher-capacity DeepSeek-R1, split into roughly 600,000 reasoning-focused examples and 200,000 non-reasoning examples. The model employs Multi-Head Latent Attention (MLA) and Rotary Position Embeddings (RoPE) for positional encoding, with context-extension techniques such as YaRN used to scale its operational context.
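For readers who want to experiment with the context-extension side of this, the sketch below shows one way YaRN-style RoPE scaling is commonly enabled through the Hugging Face `rope_scaling` configuration. The scaling factor and original context length here are illustrative assumptions, not values published for this checkpoint.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

# Illustrative YaRN configuration; factor and base length are assumptions.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",                              # scale RoPE frequencies with YaRN
    "factor": 4.0,                               # assumed extension factor
    "original_max_position_embeddings": 32768,   # assumed pre-extension length
}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
)
```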

In practical terms, DeepSeek-R1-Distill-Qwen-7B supports extended contextual understanding, processing input sequences of up to 131,072 tokens. This expanded context window improves its capacity for complex, multi-step problems that require a broad view of the input. The model targets technical applications demanding analytical precision, including automated theorem proving, complex algorithmic problem solving, and advanced programming assistance. Its compact design, coupled with its specialized reasoning aptitude, makes it a viable candidate for localized inference or deployment on consumer-grade hardware.
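Below is a minimal local-inference sketch using the Hugging Face transformers library, assuming the public `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` checkpoint; the sampling settings are common recommendations for R1-style reasoning models, not requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Reasoning-style prompt; the model emits its chain of thought before the answer.
messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```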

About DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.



Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for DeepSeek-R1 7B.

Overall Rank: -

Coding Rank: -

GPU Requirements

[Interactive VRAM calculator: choose a quantization method for the model weights and a context size (1k to 128k tokens) to see the required VRAM and recommended GPUs. See the full calculator for details.]
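As a rough guide to what such a calculator computes, the sketch below estimates VRAM from the specifications listed above. The overhead factor, byte widths, and head dimension are illustrative assumptions, not the calculator's actual formula.

```python
def estimate_vram_gib(
    params_b: float = 7.0,       # model size in billions of parameters
    bytes_per_param: float = 2,  # 2 = FP16/BF16, 1 = INT8, 0.5 = 4-bit quant
    context: int = 1024,         # context size in tokens
    layers: int = 32,            # from the specifications above
    kv_heads: int = 64,          # from the specifications above
    head_dim: int = 64,          # assumed: hidden size 4096 / 64 attention heads
    kv_bytes: float = 2,         # FP16 KV cache
) -> float:
    weights = params_b * 1e9 * bytes_per_param
    # KV cache: two tensors (K and V) per layer, per token.
    kv_cache = 2 * layers * kv_heads * head_dim * context * kv_bytes
    overhead = 1.1  # assumed ~10% for activations and runtime buffers
    return (weights + kv_cache) * overhead / 1024**3

print(f"FP16, 1k context:    {estimate_vram_gib():.1f} GiB")
print(f"4-bit, 128k context: {estimate_vram_gib(bytes_per_param=0.5, context=131072):.1f} GiB")
```

Note how, at long contexts, the KV cache rather than the weights dominates the estimate, which is why quantizing the weights alone is not enough to fit 128k-token prompts on consumer GPUs.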
