Kimi K2-Base: Specifications and GPU VRAM Requirements

Kimi K2-Base

Open source

Open weights

Active parameters

32B

Context length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Modified MIT License

Release date

11 Jul 2025

Training data cutoff

Technical specifications

Total expert parameters

32.0B

Number of experts

384

Active experts

8

Attention structure

Multi-head Latent Attention (MLA)

Hidden dimension size

7168

Number of layers

61

Attention heads

64

Key-value heads

Activation function

SwiGLU

Normalization

Position embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes

Kimi K2-Base

Kimi K2-Base is a foundation model developed by Moonshot AI for researchers and developers who need a customizable base for specific applications. It targets agentic tasks, including code generation, multi-step problem solving, and autonomous use of external tools and APIs. It can serve as a starting point for tailored AI systems in domains such as legal analysis, scientific research, and specialized conversational interfaces.

Kimi K2-Base is a Mixture-of-Experts (MoE) transformer with 1 trillion total parameters, of which 32 billion are activated per token. The architecture contains 384 experts, 8 of which are dynamically selected for each token. Training used the MuonClip optimizer, developed by Moonshot AI, which addresses training instability in large-scale models by mitigating exploding attention logits. The model has 61 layers, a hidden dimension of 7168, and 64 attention heads, and it uses SwiGLU activation functions.
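To make the "8 of 384 experts per token" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count and top-k value come from the description above; the random router logits, the `route_token` helper, and the softmax-over-selected-experts weighting are illustrative assumptions, not necessarily the gating function Kimi K2 actually uses.

```python
import math
import random

NUM_EXPERTS = 384  # number of experts, from the description above
TOP_K = 8          # experts selected per token, from the description above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalization is an assumption made for illustration.
    """
    # Indices of the top_k largest logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# Share of parameters touched per token: 32B active out of 1T total.
active_fraction = 32e9 / 1e12

print("experts used for this token:", [i for i, _ in selected])
print(f"active parameter fraction: {active_fraction:.1%}")
```

Only the selected experts run for a given token, which is why the per-token compute tracks the 32 billion active parameters rather than the 1 trillion total.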

Kimi K2-Base supports a context window of 128,000 tokens, which lets it process long inputs and extended multi-turn interactions and makes it suitable for applications that require extensive contextual understanding. It is optimized for agentic use, meaning it is intended to interpret goals and execute complex workflows without continuous human intervention. The model was pre-trained on 15.5 trillion tokens, which supports its performance across knowledge, reasoning, and coding tasks.

About Kimi K2

Moonshot AI's Kimi K2 is a Mixture-of-Experts model with one trillion total parameters, 32 billion of which are activated per token. Designed for agentic intelligence, it uses a sparse architecture with 384 experts, was trained with the MuonClip optimizer for stability, and supports a 128K-token context window.

Other Kimi K2 models

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#15

Category	Benchmark	Score	Rank
QA Assistant	ProLLM QA Assistant	0.98	🥇 1
Summarization	ProLLM Summarization	0.93	🥈 2
StackUnseen	ProLLM Stack Unseen	0.71	4
Graduate-Level QA	GPQA	0.48	16
Professional Knowledge	MMLU Pro	0.69	23
General Knowledge	MMLU	0.48	27

Coding rank

#14

GPU requirements
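As a rough guide to how quantization affects VRAM, the sketch below estimates the memory needed for the model weights alone. It assumes the 1 trillion total parameter count from the description above and nominal bit widths per weight; the `weight_memory_gb` helper and the format list are hypothetical, and the estimate ignores the KV cache (which grows with context size), activations, and per-format overhead such as quantization scales.

```python
TOTAL_PARAMS = 1e12  # total parameters, from the description above

# Nominal bits per weight (assumption); real formats add overhead for scales and metadata.
BITS_PER_WEIGHT = {"FP16/BF16": 16, "FP8/INT8": 8, "INT4": 4}


def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{name}: ~{gb:,.0f} GB for weights")
```

Although only 32 billion parameters are active per token, any expert may be selected, so this estimate uses the total parameter count rather than the active count.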

Resources

Official documentation

Download weights
