
DeepSeek-R1 14B

Parameters

14B

Context Length

131,072 tokens (128K)

Modality

Text

Architecture

Dense

License

MIT License

Release Date

20 Jan 2025

Knowledge Cutoff

Jul 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

5120

Number of Layers

48

Attention Heads

40

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embeddings

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

DeepSeek-R1 14B

DeepSeek-R1-Distill-Qwen-14B is a dense large language model in the DeepSeek-R1 series, engineered for advanced reasoning. It was distilled from the 671B-parameter DeepSeek-R1, a Mixture-of-Experts model, onto a foundation of Qwen 2.5 14B. The goal of the distillation is to transfer the larger model's sophisticated reasoning skills, particularly in mathematics and coding, into a more compact and computationally efficient dense model.

The technical architecture of DeepSeek-R1-Distill-Qwen-14B is based on a transformer framework. It incorporates Rotary Position Embeddings (RoPE) for effective positional encoding, utilizes SwiGLU as its activation function, and employs RMSNorm for robust normalization. The attention mechanism includes QKV bias, characteristic of the Qwen 2.5 series from which it is derived. Unlike its larger DeepSeek-R1 progenitor, this variant maintains a dense architecture, optimizing for direct parameter utilization rather than expert sparsity.
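To make these components concrete, the sketch below gives minimal PyTorch versions of RMSNorm and a SwiGLU feed-forward block. It illustrates the techniques named above rather than DeepSeek's actual implementation; the demo uses small dimensions, whereas the real model pairs its 5120 hidden size with a much larger intermediate size (13,824 in the published Qwen 2.5 14B configuration).

```python
# Minimal sketch of RMSNorm and SwiGLU (illustrative, not the model's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale activations by 1/RMS(x), then apply a learned per-channel gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

class SwiGLUMLP(nn.Module):
    """Gated feed-forward block: down( silu(gate(x)) * up(x) )."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# Tiny shape check (the real model would use dim=5120, hidden=13824).
x = torch.randn(2, 8, 256)
y = SwiGLUMLP(256, 688)(RMSNorm(256)(x))
print(y.shape)  # torch.Size([2, 8, 256])
```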

This model is designed to support a substantial context length, accommodating up to 131,072 tokens, which facilitates the processing of extensive inputs. Its application extends across various natural language processing tasks, encompassing text generation, data analysis, and the synthesis of code. The model's heritage from DeepSeek-R1 underscores its proficiency in complex reasoning tasks, making it suitable for mathematical problem-solving and programming. Furthermore, it supports both few-shot and zero-shot learning paradigms and is optimized for local deployment, offering flexibility for integration into diverse applications via an API.
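For local experimentation, a minimal Hugging Face transformers sketch follows. It assumes the public deepseek-ai/DeepSeek-R1-Distill-Qwen-14B checkpoint and enough accelerator memory for the chosen precision (roughly 28 GB in bf16; quantized builds need less).

```python
# Hedged usage sketch: load the distilled checkpoint and generate locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow generous headroom.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```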

About DeepSeek-R1

DeepSeek-R1 is a model family developed for logical reasoning tasks. Its flagship model uses a Mixture-of-Experts architecture for computational efficiency and scalability, together with Multi-head Latent Attention (MLA), and its training relies on reinforcement learning, with some variants incorporating cold-start data.
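For intuition about expert routing, the toy sketch below implements generic top-k expert selection of the kind used in Mixture-of-Experts layers. It is not DeepSeek-R1's production routing, which adds refinements such as shared experts and load balancing.

```python
# Toy top-k Mixture-of-Experts layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.k, dim=-1)        # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():      # dispatch token groups
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

print(TopKMoE(dim=64)(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```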



Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for DeepSeek-R1 14B.

Rankings

Rank: -

Coding Rank: -

GPU Requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size, which can range from 1k tokens up to the full 128k. The site's full calculator reports the exact VRAM requirement and a recommended GPU for any chosen configuration.
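As a back-of-the-envelope alternative to the calculator, the sketch below estimates VRAM from the two dominant terms: the quantized weights and an fp16 KV cache. The geometry constants follow the Qwen 2.5 14B configuration listed above, and real runtimes add activation buffers and framework overhead on top.

```python
# Rough VRAM estimate for a dense 14B model (sketch; not an exact calculator).
def estimate_vram_gb(
    params_b: float = 14.8,      # parameter count in billions (approximate)
    bits_per_weight: float = 4,  # 16 (fp16/bf16), 8 (Q8), 4 (Q4), ...
    context: int = 8192,         # tokens resident in the KV cache
    layers: int = 48,
    kv_heads: int = 8,
    head_dim: int = 128,
    kv_bytes: int = 2,           # fp16 keys and values
) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8                    # weight bytes
    kv_cache = 2 * layers * kv_heads * head_dim * kv_bytes * context  # K and V
    return (weights + kv_cache) / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, 8k context: ~{estimate_vram_gb(bits_per_weight=bits):.1f} GB")
```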