
DeepSeek-R1 70B

Parameters: 70B
Context Length: 32,768 tokens
Modality: Text
Architecture: Dense
License: MIT License
Release Date: 27 Dec 2024
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Multi-Layer Attention
Hidden Dimension Size: 8192
Number of Layers: 80
Attention Heads: 112
Key-Value Heads: 112
Activation Function: -
Normalization: -
Positional Embedding: RoPE

System Requirements

VRAM requirements by quantization method and context size are summarized under GPU Requirements below.

DeepSeek-R1 70B

DeepSeek-R1 is a family of advanced large language models developed by DeepSeek, designed with a primary focus on enhancing reasoning capabilities. The DeepSeek-R1-Distill-Llama-70B variant is a product of knowledge distillation, leveraging the reasoning strengths of the larger DeepSeek-R1 model and transferring them to a Llama-3.3-70B-Instruct base architecture. This distillation process aims to create a highly capable model that maintains the efficiency and operational characteristics of its base while inheriting sophisticated reasoning patterns.
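As a rough illustration of this kind of distillation, the sketch below fine-tunes a student causal LM on a reasoning trace generated by a stronger teacher, using plain next-token cross-entropy through the Hugging Face causal-LM interface. The function, arguments, and single-step loop are hypothetical stand-ins, not DeepSeek's actual pipeline.

```python
import torch

# Minimal, hedged sketch of response-level distillation: the student
# (a Llama-architecture causal LM) is fine-tuned on text written by the
# larger teacher. `student`, `tokenizer`, `prompt`, and `teacher_trace`
# are illustrative stand-ins; a real pipeline would also mask the prompt
# tokens out of the loss and batch many traces together.
def distill_step(student, tokenizer, prompt, teacher_trace, optimizer):
    text = prompt + teacher_trace                                  # teacher-written target
    ids = tokenizer(text, return_tensors="pt").input_ids.to(student.device)
    out = student(input_ids=ids, labels=ids)                      # shifted cross-entropy loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```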

Architecturally, DeepSeek-R1-Distill-Llama-70B is a dense transformer model, distinguishing it from the Mixture of Experts (MoE) architecture of the original DeepSeek-R1. It employs a multi-head attention mechanism with 112 attention heads for processing input sequences. The model integrates Rotary Position Embeddings (RoPE) to encode positional information and uses Flash Attention for computational efficiency. This configuration enables the model to handle context lengths of up to 32,768 tokens, supporting complex problem-solving.
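To make the positional-encoding step concrete, here is a minimal NumPy sketch of rotary position embeddings applied to a single attention head. The head dimension and base frequency are illustrative defaults, not values taken from this model's published configuration.

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embeddings (RoPE) to a tensor of shape
    (seq_len, head_dim): each channel pair is rotated by an angle that
    depends on the token position and the pair's frequency."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # One frequency per channel pair, decreasing geometrically.
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    # Angle for every (position, frequency) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair in its two-dimensional plane.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(5, 64)          # 5 positions, head dimension 64
q_rot = rotary_embedding(q)
```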

This model is engineered for general text generation, code generation, and sophisticated problem-solving across domains requiring logical inference and multi-step reasoning. Its design prioritizes efficient deployment, making it suitable for applications where computational resources are a consideration, including consumer-grade hardware. DeepSeek-R1-Distill-Llama-70B is particularly adept at tasks demanding structured thought, such as mathematical problem-solving and generating coherent code, extending its utility across technical and research applications.
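For deployment on constrained hardware, one common pattern is to load the distilled checkpoint through Hugging Face transformers with 4-bit quantization. The snippet below is a hedged sketch: the model id is assumed to be the public deepseek-ai/DeepSeek-R1-Distill-Llama-70B checkpoint, and the quantization settings are illustrative; adjust both to your environment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint id; verify against the published repository before use.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # shrink weight memory
    device_map="auto",                                          # spread layers across devices
)

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```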

About DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. It incorporates a Mixture-of-Experts architecture for computational efficiency and scalability. The family utilizes Multi-Head Latent Attention and employs reinforcement learning in its training, with some variants integrating cold-start data.
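As a point of contrast with the dense distilled variant described above, the toy sketch below shows top-k expert routing, the mechanism that lets an MoE model activate only a fraction of its parameters per token. The expert count, k, and tensor shapes are illustrative and do not reflect DeepSeek-R1's actual configuration.

```python
import torch

# Toy top-k Mixture-of-Experts routing: each token is dispatched to only
# k of the expert feed-forward networks, weighted by its gate probability,
# so per-token compute stays small while total parameters grow.
def moe_forward(x, gate, experts, k=2):
    scores = torch.softmax(gate(x), dim=-1)            # (tokens, num_experts)
    topk_p, topk_idx = scores.topk(k, dim=-1)          # pick k experts per token
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e              # tokens routed to expert e
            if mask.any():
                out[mask] += topk_p[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Example with 8 tiny expert MLPs on random activations (illustrative sizes).
hidden, num_experts = 64, 8
gate = torch.nn.Linear(hidden, num_experts)
experts = [torch.nn.Linear(hidden, hidden) for _ in range(num_experts)]
y = moe_forward(torch.randn(16, hidden), gate, experts)
```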



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #24
Coding Rank: #28
LiveBench Agentic (Agentic Coding): 0.07 (rank 13)

GPU Requirements

Required VRAM and the recommended GPU depend on the quantization method chosen for the model weights and on the context size (1k to 32k tokens).
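Lacking exact figures, a back-of-the-envelope estimate can be made from the specifications listed above: quantized weight storage plus a key/value cache that grows linearly with context length. The sketch below assumes a full-size (non-grouped) KV projection, matching the key-value head count listed on this page, and an illustrative overhead factor; actual requirements depend on the runtime and on the model's published config.

```python
def estimate_vram_gb(num_params=70e9, bits_per_weight=4,
                     num_layers=80, kv_dim=8192,
                     context_len=32_768, kv_bits=16, overhead=1.10):
    """Rough VRAM estimate: quantized weights plus a single-sequence
    key/value cache, with a flat overhead factor for activations and
    runtime buffers. All defaults are illustrative assumptions; models
    using grouped-query attention have a much smaller kv_dim."""
    weight_gb = num_params * bits_per_weight / 8 / 1e9
    # K and V are each cached for every layer and every token in context.
    kv_gb = 2 * num_layers * kv_dim * context_len * kv_bits / 8 / 1e9
    return (weight_gb + kv_gb) * overhead

for ctx in (1_024, 16_384, 32_768):
    print(f"{ctx:>6} tokens: ~{estimate_vram_gb(context_len=ctx):.0f} GB")
```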