Parameters: 72B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: MIT License
Release Date: 16 Jun 2025
Knowledge Cutoff: -
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)
Kimi-Dev-72B is a specialized large language model developed by Moonshot AI, engineered for autonomous software engineering and complex issue resolution. Built on the Qwen2.5-72B base model, it undergoes a multi-stage training process designed to instill structured skill priors for software development tasks. This includes a large-scale mid-training phase on approximately 150 billion tokens of high-quality, real-world data drawn from GitHub issues and pull-request commits, which lets the model internalize the reasoning patterns and technical workflows of human developers. Unlike general-purpose coding assistants, Kimi-Dev-72B is optimized to operate as an autonomous agent capable of file localization and precise code editing.
The model's core innovation lies in its duo framework of specialized "BugFixer" and "TestWriter" behaviors, which operate in a two-stage cycle: first, the model identifies the relevant files within a repository (File Localization); second, it generates the necessary code modifications or unit tests (Code Edits). Training relies on large-scale reinforcement learning (RL) with outcome-based rewards: the model receives a positive reward only when its proposed patch makes the entire test suite pass inside a containerized Docker environment. This verification loop ensures that generated solutions are functionally correct rather than merely plausible.
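As a concrete illustration of this reward structure, the sketch below computes an outcome-based reward for a single issue-resolution episode. It is a minimal sketch, not Moonshot AI's implementation: the three callables (localize_files, generate_patch, run_tests_in_docker) are hypothetical placeholders standing in for the File Localization stage, the Code Edits stage, and a containerized test harness.

```python
# Minimal sketch of an outcome-based reward for one issue-resolution episode.
# All three callables are hypothetical placeholders, not Moonshot AI's code.
from typing import Callable, List

def outcome_reward(
    issue: str,
    repo_path: str,
    localize_files: Callable[[str, str], List[str]],  # (issue, repo) -> candidate files
    generate_patch: Callable[[str, List[str]], str],  # (issue, files) -> unified diff
    run_tests_in_docker: Callable[[str, str], bool],  # (repo, diff) -> suite passed?
) -> float:
    # Stage 1: File Localization -- identify the files relevant to the issue.
    files = localize_files(issue, repo_path)

    # Stage 2: Code Edits -- produce a patch (or unit tests) for those files.
    patch = generate_patch(issue, files)

    # Outcome-based reward: apply the patch and run the full test suite in an
    # isolated Docker container; only a fully passing suite earns a reward.
    return 1.0 if run_tests_in_docker(repo_path, patch) else 0.0
```

The same skeleton could presumably cover the TestWriter behavior by swapping the patch generator for a unit-test generator and scoring the generated tests from the outcome of the containerized run.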
Kimi-Dev-72B is designed for integration into modern software development lifecycles, supporting tasks such as automated bug fixing, unit test generation, and code review. Through a test-time self-play mechanism, the model iteratively refines its outputs, making it effective at resolving complex issues in large codebases. Its dense 72-billion-parameter architecture balances reasoning capability against computational cost, while its 131,072-token context window lets it track extensive project structures and cross-file dependencies. The model is released under the MIT license, giving the community open access to its weights and source code for further research and development.
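As a rough usage sketch, the model can be loaded like any other dense causal language model via Hugging Face Transformers. The repository identifier moonshotai/Kimi-Dev-72B, the chat-template call, and the generation settings below are assumptions made for illustration; consult the official release for the exact repository name, prompt format, and recommended sampling parameters.

```python
# Hedged inference sketch for Kimi-Dev-72B with Hugging Face Transformers.
# The repository id and prompt format are assumptions; check the official
# release for the exact values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-Dev-72B"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 72B weights across available GPUs
)

messages = [
    {"role": "user",
     "content": "Given the failing test below, locate the responsible file "
                "and propose a patch.\n\n<issue description here>"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```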
Moonshot AI's Kimi model family, exemplified by Kimi K2, employs a Mixture-of-Experts architecture with one trillion total parameters. Designed for natural language generation and agentic capabilities, it features a 128K token context window. The models are open-weight and optimized with the Muon optimizer for stable training.
No evaluation benchmark results are available for Kimi-Dev-72B.