Total Parameters: 1T
Context Length: 512K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Modified MIT License
Release Date: 5 Feb 2026
Knowledge Cutoff: -
Total Expert Parameters: -
Number of Experts: -
Active Experts: -
Attention Structure: Multi-Head Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
VRAM requirements for different quantization methods and context sizes
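The interactive calculator does not survive in text form. As a rough stand-in, the sketch below estimates weight memory for a given quantization bit width plus KV-cache memory for a chosen context size. The layer count, KV-head count, and head dimension are placeholder assumptions (those fields are listed as "-" in the spec table above), so the printed numbers are illustrative only, not official requirements.

```python
def estimate_vram_gib(total_params_b=1000,   # 1T parameters, from the spec table
                      weight_bits=4,         # quantization of the weights: 16, 8, 4, ...
                      context_tokens=1024,   # context size shown in the original calculator
                      num_layers=61,         # assumption: not listed in the spec table
                      kv_heads=64,           # assumption: not listed in the spec table
                      head_dim=128,          # assumption: not listed in the spec table
                      kv_bits=16):
    """Back-of-the-envelope estimate; ignores activations and runtime overhead."""
    weight_bytes = total_params_b * 1e9 * weight_bits / 8
    # KV cache: 2 tensors (K and V) per layer, each kv_heads * head_dim per token.
    kv_bytes = 2 * num_layers * kv_heads * head_dim * context_tokens * kv_bits / 8
    return (weight_bytes + kv_bytes) / 1024**3

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, 1,024-token context: ~{estimate_vram_gib(weight_bits=bits):,.0f} GiB")
```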
Kimi K2.5 is the latest long-context language model from Moonshot AI, released in early 2026. Built on a 1-trillion-parameter MoE architecture, it supports context windows of up to 512,000 tokens and is positioned for multimodal understanding and large-scale data synthesis.
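A minimal sketch of exercising that long context through an OpenAI-compatible client. The base URL and model identifier below are assumptions for illustration only (neither appears in the source); consult Moonshot AI's documentation for the actual endpoint and model name.

```python
from openai import OpenAI

# Base URL and model identifier are assumed, not confirmed by the source.
client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_API_KEY")

long_document = "..."  # up to roughly 512K tokens of source material

response = client.chat.completions.create(
    model="kimi-k2.5",  # assumed identifier
    messages=[
        {"role": "system", "content": "You are a careful analyst."},
        {"role": "user", "content": f"Summarize the key findings:\n\n{long_document}"},
    ],
)
print(response.choices[0].message.content)
```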
For comparison, the earlier Kimi K2 is a Mixture-of-Experts model with one trillion total parameters, of which 32 billion are active per token. Designed for agentic intelligence, it uses a sparse architecture with 384 experts and the MuonClip optimizer for training stability, and supports a 128K-token context window.
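To make "32 billion active per token" concrete, the sketch below shows generic top-k expert routing: a router scores all 384 experts and only the k highest-scoring ones run their feed-forward blocks for that token. This is an illustration of the technique, not Moonshot's implementation; the value k=8 and the toy dimensions are assumptions, since the source only states the total expert count.

```python
import numpy as np

def topk_route(hidden, gate_weights, k=8):
    """Score every expert for one token and keep only the top k.

    hidden:       (d_model,) activation for one token
    gate_weights: (num_experts, d_model) router projection
    k:            experts activated per token (k=8 is an assumption)
    """
    logits = gate_weights @ hidden            # one score per expert
    selected = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[selected])
    weights /= weights.sum()                  # softmax over the selected experts only
    return selected, weights

# Toy dimensions: 384 experts as in Kimi K2, tiny d_model for illustration.
rng = np.random.default_rng(0)
d_model, num_experts = 64, 384
token = rng.standard_normal(d_model)
router = rng.standard_normal((num_experts, d_model))
experts, mix = topk_route(token, router)
print(experts, mix)  # only these k experts run their FFN for this token
```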
No evaluation benchmarks for Kimi K2.5 are available yet.
Overall Rank: -
Coding Rank: -