
Kimi K2-Base

Total Parameters

1T

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Modified MIT License

Release Date

11 Jul 2025

Knowledge Cutoff

-

Technical Specifications

Active Parameters (per token)

32.0B

Number of Experts

384

Active Experts

8

Attention Structure

Multi-head Latent Attention (MLA)

Hidden Dimension Size

7168

Number of Layers

61

Attention Heads

64

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

-

Position Embedding

RoPE
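As a quick sanity check on how sparse the model is, the published figures can be collected into a small configuration object and compared. This is a minimal sketch: the class name `KimiK2Spec` and its field names are hypothetical and do not come from any official config file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KimiK2Spec:
    """Hypothetical container for the figures listed above (names are illustrative)."""

    total_params: float = 1.0e12    # 1T total parameters
    active_params: float = 32.0e9   # 32B parameters activated per token
    num_experts: int = 384          # routed experts
    experts_per_token: int = 8      # experts selected for each token
    hidden_size: int = 7168
    num_layers: int = 61
    num_attention_heads: int = 64
    context_length: int = 128_000   # "128K" as listed on this page

    def active_param_fraction(self) -> float:
        """Share of all parameters used to process one token."""
        return self.active_params / self.total_params

    def expert_fraction(self) -> float:
        """Share of routed experts selected for one token."""
        return self.experts_per_token / self.num_experts


spec = KimiK2Spec()
print(f"active parameters: {spec.active_param_fraction():.1%}")  # 3.2%
print(f"experts per token: {spec.expert_fraction():.2%}")        # 2.08%
```

The two ratios differ because the 32B active count also covers weights every token passes through, such as attention and embedding parameters, not only the 8 selected experts.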

System Requirements

VRAM requirements for different quantization methods and context sizes
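For an MoE model, memory for the weights scales with the total parameter count, not the active one, because all 384 experts must be resident even though only 8 run per token. A minimal weights-only estimate, assuming memory is simply parameters × bits per weight / 8, looks like this; the format names and bit widths are illustrative, and the estimate leaves out the KV cache (which grows with context size), activations, quantization scales, and framework overhead.

```python
# Rough weights-only memory estimate for a 1T-parameter model.
# Assumption: memory ~= parameter count * bits per weight / 8. KV cache,
# activations, quantization scales, and runtime overhead are NOT included.
TOTAL_PARAMS = 1.0e12  # 1T total parameters; every expert must be stored

# Illustrative bit widths; which formats a given checkpoint or inference
# engine actually supports is not specified here.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "INT4": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
# BF16: ~2,000 GB
# FP8: ~1,000 GB
# INT4: ~500 GB
```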

Kimi K2-Base

Kimi K2-Base is a foundation large language model from Moonshot AI, intended for researchers and developers who want a base model to customize for their own applications. It is aimed at agentic tasks, including advanced code generation, multi-step problem solving, and autonomous use of external tools and APIs, and serves as a starting point for tailored AI systems in domains such as legal analysis, scientific research, and specialized conversational interfaces.

Architecturally, Kimi K2-Base is a Mixture-of-Experts (MoE) transformer with 1 trillion total parameters, of which 32 billion are activated per token. Each MoE layer contains 384 experts, and a router dynamically selects 8 of them to process each token. A key element of its training is the MuonClip optimizer, developed by Moonshot AI, which counters training instability at this scale by preventing attention logits from exploding. The model has 61 layers, an attention hidden dimension of 7168, and 64 attention heads, and it uses SwiGLU activation functions.
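The "8 of 384 experts per token" selection can be illustrated with a generic top-k router. This is a minimal sketch in plain Python, assuming a softmax over the selected experts' logits; Kimi K2's actual scoring function and normalization are not described on this page and may differ, and the toy experts are hypothetical.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, per the figures above
TOP_K = 8          # experts activated per token


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Assumption: gate weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token_vec, experts, router_logits, k=TOP_K):
    """Combine the outputs of the selected experts, weighted by their gates.

    Only k experts are called; the remaining experts are never evaluated,
    which is why active parameters are far fewer than total parameters.
    """
    out = [0.0] * len(token_vec)
    for idx, weight in route_token(router_logits, k):
        expert_out = experts[idx](token_vec)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Toy experts (hypothetical): expert i simply scales its input by i.
experts = [lambda v, s=i: [s * x for x in v] for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print(len(chosen), round(sum(w for _, w in chosen), 6))  # 8 1.0
print(moe_forward([1.0, 2.0], experts, logits))
```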

Kimi K2-Base supports a context window of 128,000 tokens, allowing it to process long documents and extended multi-turn interactions in a single pass. Its sparse design, which activates only 32B of its 1T parameters per token, keeps per-token inference compute far below that of a dense model of the same total size. Its focus on agentic intelligence reflects the aim of a model that can interpret goals and execute complex workflows without continuous human intervention. The model was pretrained on 15.5 trillion tokens, which underpins its performance across knowledge, reasoning, and coding tasks.
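When an input is longer than the context window, a common workaround is to split it into overlapping chunks that each leave room for the model's output. The sketch below assumes the 128,000-token figure stated on this page (the exact limit of a given deployment should be checked); the function name `chunk_tokens` and the default output reserve and overlap sizes are hypothetical choices for illustration.

```python
CONTEXT_WINDOW = 128_000  # tokens, as stated for Kimi K2-Base


def chunk_tokens(tokens, reserve_for_output=4_000, overlap=1_000, window=CONTEXT_WINDOW):
    """Split a token sequence into overlapping chunks that each leave
    `reserve_for_output` tokens of the window free for generation."""
    budget = window - reserve_for_output
    if budget <= overlap:
        raise ValueError("overlap must be smaller than the per-chunk budget")
    step = budget - overlap  # consecutive chunks share `overlap` tokens
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + budget])
        if start + budget >= len(tokens):
            break
        start += step
    return chunks


doc_tokens = list(range(300_000))  # stand-in for a tokenized long document
chunks = chunk_tokens(doc_tokens)
print([len(c) for c in chunks])  # [124000, 124000, 54000]
```

The overlap keeps some shared context at each boundary so that a passage split between two chunks is still seen whole in at least one of them.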

About Kimi K2

Moonshot AI's Kimi K2 is a Mixture-of-Experts model featuring one trillion total parameters, activating 32 billion per token. Designed for agentic intelligence, it utilizes a sparse architecture with 384 experts and the MuonClip optimizer for training stability, supporting a 128K token context window.
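Moonshot AI has described MuonClip as the Muon optimizer combined with a step it calls QK-Clip, which rescales the query and key projection weights whenever the largest attention logit exceeds a threshold τ. The sketch below shows that rescaling idea for a single attention head in plain Python. The threshold value, the toy weights, and the even split of the scale factor between W_q and W_k are assumptions for illustration, and the Muon update itself is not shown.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def max_attention_logit(w_q, w_k, xs):
    """Largest pre-softmax logit q_i . k_j / sqrt(d) over all token pairs (one head)."""
    qs = [matvec(w_q, x) for x in xs]
    ks = [matvec(w_k, x) for x in xs]
    d = len(qs[0])
    return max(sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for q in qs for k in ks)


def qk_clip(w_q, w_k, xs, tau=100.0):
    """If the max logit exceeds tau, shrink W_q and W_k so that it becomes tau.

    Logits are bilinear in (W_q, W_k), so scaling each matrix by
    sqrt(tau / s_max) scales every logit by tau / s_max.
    """
    s_max = max_attention_logit(w_q, w_k, xs)
    if s_max <= tau:
        return w_q, w_k  # no clipping needed
    factor = math.sqrt(tau / s_max)

    def scale(w):
        return [[factor * v for v in row] for row in w]

    return scale(w_q), scale(w_k)


# Hypothetical toy head whose weights produce an oversized logit.
w_q = [[10.0, 0.0], [0.0, 10.0]]
w_k = [[10.0, 0.0], [0.0, 10.0]]
xs = [[1.0, 2.0], [3.0, -1.0]]

print(f"{max_attention_logit(w_q, w_k, xs):.1f}")  # 707.1
clipped_q, clipped_k = qk_clip(w_q, w_k, xs, tau=100.0)
print(f"{max_attention_logit(clipped_q, clipped_k, xs):.1f}")  # 100.0
```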



Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#14

Benchmark | Score | Rank
(not named) | 0.93 | 1 🥇
(not named) | 0.71 | 3 🥉
GPQA (Graduate-Level QA) | 0.48 | 17
MMLU (General Knowledge) | 0.48 | 25

Rankings

Overall Rank

#14

Coding Rank

#3 🥉

