Kimi K2-Instruct: Specifications and GPU VRAM Requirements

Kimi K2-Instruct

Availability: Open source, open weights
Active Parameters: 32B (of 1T total)
Context Length: 128K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Modified MIT License
Release Date: 11 Jul 2025
Knowledge Cutoff: not listed

Technical Specifications

Total Expert Parameters: 32.0B (activated per token; 1T total)
Number of Experts: 384
Active Experts: 8 per token
Attention Structure: Multi-head Latent Attention (MLA)
Hidden Dimension Size: 7168
Number of Layers: 61
Attention Heads: 64
Key-Value Heads: not listed
Activation Function: SwiGLU
Normalization: not listed
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
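As a rough guide, the memory needed just to hold the weights scales with the total parameter count times the bits per weight. The sketch below assumes roughly 1 trillion total parameters (as stated in the model description) and three illustrative precisions. The precision table, the helper names, and the 80 GB GPU size are assumptions made for illustration, not values from this page's calculator, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
import math

# ~1 trillion total parameters, per the model description.
TOTAL_PARAMS = 1_000_000_000_000

# Assumed bits per weight for a few common precisions (illustrative only).
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "INT4": 4}


def weight_memory_gb(total_params: int, bits_per_weight: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


def min_gpus(required_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory covers required_gb."""
    return math.ceil(required_gb / gpu_memory_gb)


for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    # 80 GB per GPU is an assumed card size, used only as an example.
    print(f"{name}: {gb:.0f} GB of weights, at least {min_gpus(gb, 80.0)} x 80 GB GPUs")
```

Although only about 32 billion parameters are active per token, every expert's weights generally have to be resident (or offloaded and streamed in), so the estimate uses the total parameter count rather than the active count.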

Kimi K2-Instruct

Kimi K2-Instruct is a Mixture-of-Experts (MoE) language model developed by Moonshot AI. It has 1 trillion total parameters, of which approximately 32 billion are activated for each token. The model targets agentic use: tool calling, code generation, and autonomous multi-step problem-solving. Kimi K2-Instruct is the post-trained, instruction-following variant, optimized for general-purpose chat and agentic workflows. It is a reflex-grade model, meaning it answers directly without an extended reasoning phase.

Kimi K2-Instruct uses a sparse Mixture-of-Experts design with 384 experts, 8 of which are selected dynamically for each token. The model has 61 layers and uses Multi-head Latent Attention (MLA) with 64 attention heads. It was trained with MuonClip, an optimizer developed by Moonshot AI to keep training stable, on 15.5 trillion tokens. The context window is 128,000 tokens. The activation function is SwiGLU, and positions are encoded with Rotary Position Embeddings (RoPE).
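To make the "8 of 384 experts per token" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It uses a generic scheme (pick the k largest router logits, then softmax over just those); this is an assumption for illustration, and the actual Kimi K2 router may score and normalize experts differently. The function name `route_token` and the random logits are hypothetical.

```python
import math
import random

NUM_EXPERTS = 384  # number of experts, from the specification
TOP_K = 8          # experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and normalize their gate weights.

    Generic top-k gating for illustration; not Moonshot AI's implementation.
    """
    # Indices of the top_k largest logits.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Stand-in router scores for a single token (random, for demonstration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
print(len(selected), "of", NUM_EXPERTS, "experts active")
print(f"fraction of experts used per token: {TOP_K / NUM_EXPERTS:.2%}")
```

Only the selected experts run their feed-forward computation for that token, which is why the per-token compute tracks the roughly 32 billion active parameters rather than the full 1 trillion.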

Kimi K2-Instruct is intended for multi-step reasoning and analytical workflows. It supports code generation, from simple scripts to larger software development and debugging tasks, as well as multilingual applications. The model has strong tool-calling capabilities: it can interpret a user's intent and call external tools and APIs to complete a task. Practical use cases include automating development workflows, generating data analysis reports, and interactive task planning that combines multiple external services.
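On the application side, tool calling usually means the model emits a structured request and the host program executes it. The following is a minimal sketch of that host-side dispatch step. The tool-call shape (`{"name": ..., "arguments": <JSON string>}`), the `get_weather` tool, and the `TOOLS` registry are all hypothetical and chosen for illustration; they are not taken from Moonshot AI's documentation.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: a real one would call an external weather API."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run one tool call of the assumed shape {"name": str, "arguments": JSON string}."""
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(tool_call["arguments"])
        result = TOOLS[name](**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the tool's signature.
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


# A tool call as the model might emit it (constructed by hand here).
example_call = {"name": "get_weather", "arguments": json.dumps({"city": "Beijing"})}
print(dispatch(example_call))
```

In a full agent loop, the string returned by `dispatch` would be sent back to the model as the tool's result so it can decide on the next step.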

About Kimi K2

Moonshot AI's Kimi K2 is a Mixture-of-Experts model with one trillion total parameters, of which 32 billion are activated per token. It is designed for agentic intelligence, uses a sparse architecture with 384 experts, was trained with the MuonClip optimizer for stability, and supports a 128K-token context window.

Evaluation Benchmarks

Ranks are relative to other local LLMs.

Category	Benchmark	Score	Rank
QA Assistant	ProLLM QA Assistant	0.98	1
Coding	LiveBench Coding	0.74	2
Agentic Coding	LiveBench Agentic	0.25	3
Graduate-Level QA	GPQA	0.75	3
General Knowledge	MMLU	0.75	6
Web Development	WebDev Arena	1315.08	8
Mathematics	LiveBench Mathematics	0.75	11
Data Analysis	LiveBench Data Analysis	0.63	13
Professional Knowledge	MMLU Pro	0.81	14
Reasoning	LiveBench Reasoning	0.45	15
