Parameters
72B
Context Length
131,072 tokens
Modality
Text
Architecture
Dense
License
MIT License
Release Date
16 Jun 2025
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
-
Number of Layers
-
Attention Heads
-
Key-Value Heads
-
Activation Function
SwiGLU
Normalization
-
Position Embedding
Rotary Position Embedding (RoPE)
VRAM requirements vary with the weight quantization method and the context size.
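A rough calculation can stand in for an interactive calculator. The sketch below is a minimal estimate of weight plus KV-cache memory; the layer count, KV-head count, and head dimension are assumptions taken from the Qwen 2.5-72B base architecture (they are not listed in the spec sheet above), and real deployments add runtime overhead on top.

```python
# Rough VRAM estimate for a 72B dense model: weights + KV cache.
# LAYERS, KV_HEADS, and HEAD_DIM are assumptions based on the
# Qwen 2.5-72B base architecture, not confirmed by the spec above.
PARAMS = 72e9
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_gb(quant: str, context_tokens: int) -> float:
    """Approximate VRAM in GB for a given weight quantization and context size."""
    weights = PARAMS * BYTES_PER_WEIGHT[quant]
    # K and V caches: 2 tensors per layer, each context * kv_heads * head_dim,
    # stored here at 2 bytes per element (fp16 cache).
    kv_cache = 2 * LAYERS * context_tokens * KV_HEADS * HEAD_DIM * 2
    return (weights + kv_cache) / 1e9

for q in ("fp16", "int8", "int4"):
    print(f"{q}: ~{vram_gb(q, 131_072):.0f} GB at the full 131,072-token context")
```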
Kimi-Dev-72B is a specialized large language model developed by Moonshot AI for advanced software engineering tasks. The 72-billion-parameter model automates and assists work across the software development lifecycle, including bug fixing, code generation, and unit-test creation. Its primary objective is to raise developer productivity by streamlining repetitive coding tasks and improving the reliability of generated code. The model accepts natural language prompts and coding-related queries through a standard chat interface.
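As a minimal sketch of that interface, the snippet below loads the open weights with Hugging Face `transformers` and sends a single coding prompt. The `moonshotai/Kimi-Dev-72B` repository ID and chat-template call follow the usual pattern for Qwen-derived checkpoints and should be verified against the model card; at 16-bit precision the weights alone are roughly 144 GB, so multi-GPU sharding or quantization is required.

```python
# Minimal chat-style inference sketch with Hugging Face transformers.
# The repo ID and chat-template usage follow the common pattern for
# Qwen-derived checkpoints; check the model card before relying on them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-Dev-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # fp16/bf16 weights (~144 GB), so shard across GPUs
    device_map="auto",    # let accelerate place layers on available devices
)

messages = [
    {"role": "user", "content": "Write a unit test for a function that parses ISO 8601 dates."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```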
The model's architecture is transformer-based, built on the Qwen 2.5-72B foundation model. It is optimized with large-scale reinforcement learning (RL) on roughly 150 billion tokens drawn from high-quality, real-world sources, including GitHub issues and pull-request commits. A notable design element is the "BugFixer" and "TestWriter" duo, which splits the work into two stages: file localization first, then precise code edits. Training relies on outcome-based rewards: the model is rewarded only when its fix resolves the issue and passes the complete test suite inside a Docker environment (sketched below), which pushes it toward robust, verifiable solutions. Kimi-Dev-72B also applies a test-time self-play mechanism to iteratively refine its outputs.
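The outcome-based reward can be pictured as a binary signal from a sandboxed test run, as in the illustrative sketch below; the Docker image, mount layout, and pytest invocation are hypothetical stand-ins, not Moonshot AI's actual training harness.

```python
# Illustrative outcome-based reward: apply a candidate patch in a Docker
# sandbox and grant reward only if the full test suite passes.
# The image, mount layout, and pytest command are hypothetical
# placeholders, not Moonshot AI's actual harness.
import subprocess

def outcome_reward(repo_dir: str, patch_file: str, image: str = "python:3.11") -> float:
    """Return 1.0 if the patched repository passes its test suite, else 0.0."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:/repo",
        "-v", f"{patch_file}:/patch.diff",
        image, "bash", "-c",
        "cd /repo && git apply /patch.diff && pip install -q -e . && pytest -q",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=1800)
    except subprocess.TimeoutExpired:
        return 0.0  # hung or non-terminating runs earn no reward
    return 1.0 if result.returncode == 0 else 0.0
```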
Kimi-Dev-72B patches code autonomously inside Docker environments and verifies its solutions against complete test suites, which makes it a good fit for continuous integration and continuous deployment (CI/CD) pipelines and other production-oriented development workflows. Its use cases extend to automated code review, feature implementation, and technical documentation. The model produces well-structured, functional code that follows established best practices, including type hints and docstrings (see the illustration after this paragraph). Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub.
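As a concrete illustration of that output style (invented here, not actual model output), a generated function would typically look like this:

```python
# Invented illustration of the described output style (type hints,
# docstring, input validation); not actual Kimi-Dev-72B output.
def moving_average(values: list[float], window: int) -> list[float]:
    """Return the simple moving average of ``values`` over ``window`` elements.

    Raises:
        ValueError: If ``window`` is not in the range [1, len(values)].
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be in [1, len(values)]")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```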
In the wider Kimi family, Moonshot AI's Kimi K2 employs a Mixture-of-Experts architecture with one trillion total parameters. Designed for natural language generation and agentic tasks, it offers a 128K-token context window; its weights are open, and training is stabilized with the Muon optimizer.
Rankings cover local LLMs only. No evaluation benchmarks are currently available for Kimi-Dev-72B.
Overall Rank
-
Coding Rank
-