DeepSeek-V3.1: Specifications and GPU VRAM Requirements

DeepSeek-V3.1


Open weights

Total parameters

671B

Context length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT License

Release date

21 Aug 2025

Knowledge cutoff

Technical specifications

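Active parameters per token

37.0B

Number of experts

257

Active experts

Attention structure

Multi-head Latent Attention (MLA)

Hidden dimension size

7168

Number of layers

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

RoPE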

System requirements

VRAM requirements for different quantization methods and context sizes

DeepSeek-V3.1

A hybrid model that supports both "thinking" and "non-thinking" modes for chat, reasoning, and coding. It is a Mixture-of-Experts (MoE) model with a 128K-token context length.

About DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising 671B parameters, of which 37B are activated per token. Its architecture incorporates Multi-head Latent Attention and DeepSeekMoE for efficient inference and training. Innovations include an auxiliary-loss-free load balancing strategy and a multi-token prediction objective. The model was trained on 14.8T tokens.
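To make the gap between total and activated parameters concrete, the following minimal sketch computes the share of the model that is used for each token. It relies only on the 671B and 37B figures quoted above. The per-token compute estimate uses the common rule of thumb of roughly 2 FLOPs per active parameter for a forward pass, which is an assumption and not a figure from this page.

```python
# Figures quoted in the description above.
TOTAL_PARAMS = 671e9   # all parameters, including every expert
ACTIVE_PARAMS = 37e9   # parameters activated per token


def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters that participate in one token's forward pass."""
    return active_params / total_params


def approx_forward_flops_per_token(active_params: float) -> float:
    """Rough estimate: ~2 FLOPs per active parameter (rule of thumb, an assumption)."""
    return 2.0 * active_params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Active share per token: {frac:.1%}")
    print(f"Approximate forward FLOPs per token: {flops:.2e}")
```

Only about one eighteenth of the weights take part in any single token, which is why per-token compute resembles that of a much smaller dense model even though all 671B parameters still have to be stored.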

Other DeepSeek-V3 models

DeepSeek-V3 671B

Evaluation benchmarks

Rankings apply to local LLMs.

Rankings

Category	Benchmark	Score	Rank
General Knowledge	MMLU	0.94	🥇 1
Coding	Aider Coding	0.76	🥈 2
Professional Knowledge	MMLU Pro	0.85	🥈 2
Graduate-Level QA	GPQA	0.80	🥈 2

Overall rank

#3 🥉

Coding rank

#10

GPU requirements

[Interactive calculator: select a quantization method for the model weights and a context size (1,024 to 125k tokens) to see the required VRAM.]
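As a rough guide to how quantization affects the VRAM figure, the sketch below estimates the memory needed for the model weights alone at a few common bit widths. It assumes that all 671B parameters must be resident in memory (every expert is loaded even though only some are active per token) and it deliberately ignores the KV cache, activations, and runtime overhead, so real requirements will be higher and will grow with context size. The function name and the list of bit widths are illustrative choices, not values taken from the calculator.

```python
TOTAL_PARAMS = 671e9  # total parameter count quoted on this page


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Illustrative bit widths (assumptions, not the calculator's own option list).
QUANTIZATION_BITS = {
    "16-bit (FP16/BF16)": 16,
    "8-bit (FP8/INT8)": 8,
    "4-bit": 4,
}

if __name__ == "__main__":
    for name, bits in QUANTIZATION_BITS.items():
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{name}: about {gb:,.1f} GB for weights alone")
```

Halving the bits per weight halves the weight footprint, so the choice of quantization dominates the total at short context lengths, while the context-dependent terms omitted here matter more as the context approaches the maximum.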

Resources

Download weights
