Kimi K2 Thinking: Specifications and GPU VRAM Requirements

Kimi K2 Thinking

闭源

开放权重

活跃参数

上下文长度

256K

模态

Text

架构

Mixture of Experts (MoE)

许可证

Modified MIT License

发布日期

7 Nov 2025

训练数据截止日期

技术规格

专家参数总数

32.0B

专家数量

384

活跃专家

注意力结构

Multi-Head Attention

隐藏维度大小

7168

层数

注意力头

键值头

激活函数

SwigLU

归一化

位置嵌入

Absolute Position Embedding

系统要求

不同量化方法和上下文大小的显存要求

Kimi K2 Thinking

Kimi K2 Thinking is a language model developed by Moonshot AI, engineered as a specialized thinking agent designed to perform complex, multi-step reasoning and dynamic tool invocation. The model is trained to interleave chain-of-thought processes with function calls, enabling it to execute intricate workflows such as autonomous research, coding, and writing that can persist over hundreds of sequential actions without coherence degradation. A key design principle is its native INT4 quantization, which is applied via Quantization-Aware Training (QAT) to achieve efficient inference, contributing to lossless reductions in inference latency and GPU memory utilization.

Architecturally, Kimi K2 Thinking operates on a sparse Mixture-of-Experts (MoE) paradigm, encompassing a total of 1 trillion parameters, with 32 billion parameters activated per inference pass. The model's internal structure includes 61 layers and employs a Multi-Head Latent Attention (MLA) mechanism with 64 attention heads. The activation function utilized is SwiGLU, and it features a vocabulary size of 160,000 tokens. It incorporates 384 experts, selecting 8 experts per token during processing, and is optimized for persistent step-by-step reasoning within its architectural constraints.

The model is characterized by a substantial 256,000-token context window, allowing for the processing of extensive textual inputs, which is particularly beneficial for long-horizon tasks, complex debugging, or comprehensive document analysis. This extended context, combined with its robust tool orchestration capabilities, enables Kimi K2 Thinking to maintain stable goal-directed behavior across 200 to 300 consecutive tool invocations. This capacity addresses a common limitation in prior models, which often exhibit performance degradation after a significantly fewer number of sequential steps.

关于 Kimi K2

Moonshot AI's Kimi K2 is a Mixture-of-Experts model featuring one trillion total parameters, activating 32 billion per token. Designed for agentic intelligence, it utilizes a sparse architecture with 384 experts and the MuonClip optimizer for training stability, supporting a 128K token context window.

其他 Kimi K2 模型

评估基准

排名适用于本地LLM。

排名

基准	分数	排名
Agentic Coding LiveBench Agentic	0.38	🥇 1
Mathematics LiveBench Mathematics	0.91	🥈 2
StackUnseen ProLLM Stack Unseen	0.76	🥉 3
Data Analysis LiveBench Data Analysis	0.71	⭐ 4
Reasoning LiveBench Reasoning	0.65	5
Coding LiveBench Coding	0.67	7

排名

#2 🥈

编程排名

#3 🥉

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

125k

250k

所需显存:

资源

官方文档发布说明下载权重

Kimi K2 Thinking

技术规格

系统要求

Kimi K2 Thinking

关于 Kimi K2

其他 Kimi K2 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源