| Specification | Value |
|---|---|
| Total Parameters | 671B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Mixture of Experts (MoE) |
| License | MIT |
| Release Date | 10 Jan 2026 |
| Training Data Cutoff | - |
| Activated Parameters per Token | 37.0B |
| Number of Experts | - |
| Active Experts per Token | - |
| Attention Structure | Multi-head Latent Attention (MLA) |
| Hidden Dimension Size | - |
| Number of Layers | - |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | - |
| Normalization | - |
| Position Embedding | Rotary Position Embedding (RoPE) |
VRAM requirements for different quantization methods and context sizes
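The calculator contents for this section did not carry over from the page. As a rough sketch only (not from the source), weight memory scales with the total parameter count times bytes per parameter, and the KV cache grows with context length. The Python helper below uses the 671B total-parameter figure and 128K context window from the spec table; the layer count and per-token KV width are hypothetical placeholders, since the spec table leaves those fields blank.

```python
# Back-of-the-envelope VRAM estimate: weights + KV cache.
# Values marked "assumed" are illustrative placeholders, not published
# DeepSeek-V3 architecture numbers.

TOTAL_PARAMS = 671e9                                  # from the spec table
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

NUM_LAYERS = 61            # assumed layer count (spec table: blank)
KV_WIDTH = 1024            # assumed KV elements per token per layer (blank)

def estimate_vram_gb(quant: str, context_len: int, kv_bytes: float = 2.0) -> float:
    """Approximate VRAM in GB for the weights plus one full-length KV cache."""
    weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM[quant]
    # Keys and values: context_len tokens x layers x per-token width x bytes each.
    kv_cache_bytes = 2 * context_len * NUM_LAYERS * KV_WIDTH * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1e9

for quant in ("fp16", "int8", "int4"):
    for ctx in (8_192, 32_768, 131_072):              # up to the 128K window
        print(f"{quant:>5} @ {ctx:>7} tokens ≈ {estimate_vram_gb(quant, ctx):,.0f} GB")
```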
DeepSeek-V3.2 is a powerful open-source Mixture-of-Experts (MoE) language model with 671B total parameters and 37B activated per token. It is built on an architecture that combines Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient inference, and it delivers strong results across benchmarks: 90.2% on MMLU-Pro, 84.5% on GPQA Diamond, 91.6% on MATH-500, 78.1% on Codeforces, and 92.3% on HumanEval. The model supports a 128K context window with strong multilingual capabilities, offers superior coding and advanced mathematical reasoning, and is competitive with leading closed-source models. It was trained on 14.8 trillion diverse, high-quality tokens and is MIT licensed for both research and commercial use, making it well suited to complex reasoning, code generation, mathematical problem-solving, and general-purpose language understanding.

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising 671B parameters with 37B activated per token. Its architecture incorporates Multi-head Latent Attention and DeepSeekMoE for efficient inference and training. Innovations include an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective; the model was trained on 14.8T tokens.
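To make the MoE description above concrete (only a small fraction of the total parameters is activated for any given token), here is a minimal top-k routing sketch in NumPy. It is a generic illustration under assumed toy dimensions, not DeepSeek's actual DeepSeekMoE implementation; the expert count, hidden sizes, and 2-of-8 routing are made up for readability.

```python
import numpy as np

# Toy Mixture-of-Experts layer: a router scores every expert for each token,
# and only the top-k experts actually run, so most parameters stay idle.
rng = np.random.default_rng(0)

D_MODEL, D_FF = 64, 256          # illustrative sizes, not DeepSeek-V3's
NUM_EXPERTS, TOP_K = 8, 2

router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))
experts = [
    (rng.normal(size=(D_MODEL, D_FF)), rng.normal(size=(D_FF, D_MODEL)))
    for _ in range(NUM_EXPERTS)
]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, D_MODEL) -> (tokens, D_MODEL), using TOP_K experts per token."""
    logits = x @ router_w                              # (tokens, NUM_EXPERTS)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    top = np.argsort(-probs, axis=-1)[:, :TOP_K]       # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)           # expert FFN (ReLU)
            out[t] += probs[t, e] * (h @ w_out)        # weight by router score
    return out

tokens = rng.normal(size=(4, D_MODEL))
print(moe_forward(tokens).shape)                       # (4, 64)
```

With 671B total and 37B activated parameters, DeepSeek-V3 activates a far smaller fraction of its experts per token than this toy's 2-of-8, but the routing principle is the same.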
Ranking: #38
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Coding | Aider Coding | 0.74 | 7 |
| Agentic Coding | LiveBench Agentic | 0.47 | 14 |
| Coding | LiveBench Coding | 0.76 | 15 |
| Data Analysis | LiveBench Data Analysis | 0.67 | 34 |
| Reasoning | LiveBench Reasoning | 0.46 | 37 |
| Mathematics | LiveBench Mathematics | 0.64 | 40 |
| Graduate-Level QA | GPQA | 0.8 | 50 |