Parameters: 72B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: MIT License
Release Date: 16 Jun 2025
Knowledge Cutoff: -
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)
Kimi-Dev-72B is a specialized large language model developed by Moonshot AI, engineered for autonomous software engineering and complex issue resolution. Built on the Qwen2.5-72B base model, it undergoes a multi-stage training process designed to instill structured skill priors for software development tasks. This includes a large-scale mid-training phase on approximately 150 billion tokens of high-quality, real-world data drawn from GitHub issues and pull-request commits, which lets the model internalize the reasoning patterns and technical workflows of human developers. Unlike general-purpose coding assistants, Kimi-Dev-72B is optimized to operate as an autonomous agent capable of file localization and precise code editing.
The model's core innovation lies in its duo framework of specialized "BugFixer" and "TestWriter" behaviors, which operate in a two-stage cycle: first, the model identifies the relevant files within a repository (File Localization); second, it generates the necessary code modifications or unit tests (Code Edits). Training relies on large-scale reinforcement learning (RL) with outcome-based rewards: the model receives a positive reward only when its proposed patch makes the entire test suite pass inside a containerized Docker environment. This verification loop ensures that generated solutions are functionally correct rather than merely plausible.
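As a concrete illustration of this reward structure, the sketch below computes an outcome-based reward for a single issue-resolution episode. It is a minimal sketch, not Moonshot AI's implementation: the three callables (localize_files, generate_patch, run_tests_in_docker) are hypothetical placeholders standing in for the File Localization stage, the Code Edits stage, and a containerized test harness.

```python
# Minimal sketch of an outcome-based reward for one issue-resolution episode.
# All three callables are hypothetical placeholders, not Moonshot AI's code.
from typing import Callable, List

def outcome_reward(
    issue: str,
    repo_path: str,
    localize_files: Callable[[str, str], List[str]],  # (issue, repo) -> candidate files
    generate_patch: Callable[[str, List[str]], str],  # (issue, files) -> unified diff
    run_tests_in_docker: Callable[[str, str], bool],  # (repo, diff) -> suite passed?
) -> float:
    # Stage 1: File Localization -- identify the files relevant to the issue.
    files = localize_files(issue, repo_path)

    # Stage 2: Code Edits -- produce a patch (or unit tests) for those files.
    patch = generate_patch(issue, files)

    # Outcome-based reward: apply the patch and run the full test suite in an
    # isolated Docker container; only a fully passing suite earns a reward.
    return 1.0 if run_tests_in_docker(repo_path, patch) else 0.0
```

The same skeleton could presumably cover the TestWriter behavior by swapping the patch generator for a unit-test generator and scoring the generated tests from the outcome of the containerized run.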
Kimi-Dev-72B is designed for integration into modern software development lifecycles, supporting tasks such as automated bug fixing, unit test generation, and code review. Through a test-time self-play mechanism, the model iteratively refines its outputs, making it effective at resolving complex issues in large codebases. Its dense 72-billion-parameter architecture balances reasoning capability against computational cost, while its 131,072-token context window lets it track extensive project structures and cross-file dependencies. The model is released under the MIT license, giving the community open access to its weights and source code for further research and development.
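As a rough usage sketch, the model can be loaded like any other dense causal language model via Hugging Face Transformers. The repository identifier moonshotai/Kimi-Dev-72B, the chat-template call, and the generation settings below are assumptions made for illustration; consult the official release for the exact repository name, prompt format, and recommended sampling parameters.

```python
# Hedged inference sketch for Kimi-Dev-72B with Hugging Face Transformers.
# The repository id and prompt format are assumptions; check the official
# release for the exact values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-Dev-72B"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 72B weights across available GPUs
)

messages = [
    {"role": "user",
     "content": "Given the failing test below, locate the responsible file "
                "and propose a patch.\n\n<issue description here>"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```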
Moonshot AI's Kimi model family, exemplified by Kimi K2, employs a Mixture-of-Experts architecture with one trillion total parameters. Designed for natural language generation and agentic capabilities, it features a 128K token context window. The models are open-weight and optimized with the Muon optimizer for stable training.
No evaluation benchmark results are available for Kimi-Dev-72B.