Total Parameters
1T
Context Length
256K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Modified MIT License
Release Date
7 Nov 2025
Knowledge Cutoff
-
Active Parameters
32.0B
Number of Experts
384
Active Experts
8
Attention Structure
Multi-head Latent Attention (MLA)
Hidden Dimension Size
7168
Number of Layers
61
Attention Heads
64
Key-Value Heads
-
Activation Function
SwiGLU
Normalization
-
Position Embedding
Rotary Position Embedding (RoPE)
VRAM requirements for different quantization methods and context sizes
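The spec figures above allow a back-of-the-envelope estimate of weight memory alone. The sketch below is a rough approximation, not the page's calculator: it counts only model weights (1 trillion total parameters at a given bit-width) and ignores the KV cache, which grows with context size and depends on attention details (MLA compression) not listed here, as well as activations and runtime overhead.

```python
# Rough weight-memory estimate for a 1T-parameter model at common bit-widths.
# Approximation only: excludes KV cache, activations, and framework overhead.

TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters (all experts are stored in memory)

QUANT_BITS = {
    "FP16/BF16": 16,
    "INT8": 8,
    "INT4 (native QAT)": 4,  # the precision Kimi K2 Thinking ships in
}

GIB = 1024 ** 3

for name, bits in QUANT_BITS.items():
    weight_bytes = TOTAL_PARAMS * bits / 8
    print(f"{name:18s} ~{weight_bytes / GIB:,.0f} GiB for weights alone")
```

Note that only the 32B active parameters are used per token at inference time, but all experts must still reside in memory, so the weight footprint is governed by the full 1T-parameter count.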
Kimi K2 Thinking is a language model developed by Moonshot AI, built as a specialized thinking agent for complex, multi-step reasoning and dynamic tool invocation. The model is trained to interleave chain-of-thought reasoning with function calls, enabling it to execute intricate workflows such as autonomous research, coding, and writing that persist over hundreds of sequential actions without loss of coherence. A key design choice is native INT4 quantization, applied via Quantization-Aware Training (QAT), which reduces inference latency and GPU memory usage without degrading accuracy.
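As an illustration of how interleaved reasoning and tool calls are typically driven, the sketch below runs a simple agent loop against an OpenAI-compatible chat endpoint. The endpoint URL, the model identifier, and the single `search` tool are assumptions for illustration only; consult Moonshot AI's API documentation for the actual interface.

```python
# Minimal agent loop: the model alternates between reasoning and tool calls.
# Endpoint, model name, and the `search` tool are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_KEY")  # assumed endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "search",  # hypothetical tool for this sketch
        "description": "Search the web and return a short snippet.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_search(query: str) -> str:
    # Placeholder: plug in a real search backend here.
    return f"(no results for {query!r} in this offline sketch)"

messages = [{"role": "user", "content": "Summarize recent INT4 QAT results."}]

for _ in range(300):  # the model is reported to stay coherent for 200-300 tool calls
    resp = client.chat.completions.create(
        model="kimi-k2-thinking",  # assumed model identifier
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:          # no tool requested: final answer reached
        print(msg.content)
        break
    for call in msg.tool_calls:     # execute each requested tool and feed the result back
        args = json.loads(call.function.arguments)
        result = run_search(args["query"])
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```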
Architecturally, Kimi K2 Thinking uses a sparse Mixture-of-Experts (MoE) design with a total of 1 trillion parameters, of which 32 billion are activated per inference pass. The network comprises 61 layers with a hidden dimension of 7168 and employs a Multi-head Latent Attention (MLA) mechanism with 64 attention heads; the activation function is SwiGLU, and the vocabulary contains 160,000 tokens. The MoE layers hold 384 experts, of which 8 are selected per token, supporting persistent step-by-step reasoning.
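To make the sparsity numbers concrete, the sketch below shows top-8-of-384 expert routing in the generic style used for MoE gating. It is based only on the figures quoted above (384 experts, 8 active, hidden size 7168) and is not Moonshot AI's actual implementation.

```python
# Illustrative top-k MoE routing: 8 of 384 experts per token, hidden size 7168.
# A generic sketch of MoE gating, not Moonshot AI's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EXPERTS, TOP_K, HIDDEN = 384, 8, 7168

class TopKRouter(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)

    def forward(self, x):                      # x: (tokens, HIDDEN)
        logits = self.gate(x)                  # (tokens, NUM_EXPERTS)
        weights, experts = logits.topk(TOP_K, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the 8 chosen experts
        return weights, experts                # per-token expert indices and mixing weights

router = TopKRouter()
tokens = torch.randn(4, HIDDEN)
w, idx = router(tokens)
print(idx.shape)  # torch.Size([4, 8]): 8 expert indices per token
```

Only the selected experts run for each token, which is why the compute cost tracks the 32B active parameters rather than the full 1T.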
The model is characterized by a substantial 256,000-token context window, allowing it to process extensive textual inputs, which is particularly beneficial for long-horizon tasks, complex debugging, and comprehensive document analysis. This extended context, combined with robust tool orchestration, enables Kimi K2 Thinking to maintain stable goal-directed behavior across 200 to 300 consecutive tool invocations. This capacity addresses a common limitation of prior models, which often exhibit performance degradation after significantly fewer sequential steps.
Kimi K2 Thinking builds on Moonshot AI's base Kimi K2 model, a Mixture-of-Experts model with one trillion total parameters and 32 billion activated per token. Designed for agentic intelligence, the base model uses a sparse architecture with 384 experts and the MuonClip optimizer for training stability, and supports a 128K-token context window, extended to 256K in K2 Thinking.
No evaluation benchmarks for Kimi K2 Thinking are available, so no overall or coding rank is listed.