Parameters
-
Context Length
32K
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
15 Jan 2025
Knowledge Cutoff
-
Attention
Attention Structure
Multi-Head Attention
Attention Heads
48
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
Absolute Position Embedding
RoPE Theta
1,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwigLU
Dimensions
Hidden Dimension Size
6,144
Number of Layers
56
FFN Intermediate Size (Dense)
16,384
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
32,768
Codestral 25.01 is Mistral AI's specialized coding model with deep understanding of software development. Features enhanced capabilities for code generation, completion, debugging, and refactoring across multiple programming languages. Trained on diverse codebases with focus on modern development practices, design patterns, and code quality. Excels at understanding developer intent and generating idiomatic, well-structured code. January 2025 release brings improved accuracy and expanded language support.
Codestral is a Mistral AI model designed for code generation and comprehension. It supports over 80 programming languages. The model family includes a 22 billion parameter variant.
Rank
#137
| Benchmark | Score | Rank |
|---|---|---|
Coding Aider Coding | 0.11 | 34 |
Overall Rank
#137
Coding Rank
#129
Total Score
43
/ 100
Codestral 25.01 demonstrates a transparency profile typical of 'open-weights' models that prioritize performance over disclosure. While it provides strong benchmark claims and clear versioning, it suffers from critical gaps in data provenance, compute transparency, and architectural specifics. The restrictive non-commercial license further limits its standing as a truly transparent open-source project.
Architectural Provenance
Mistral AI identifies Codestral 25.01 as an evolution of the previous 22B model with a 'more efficient architecture' and an improved tokenizer. However, specific architectural modifications beyond the context window expansion (to 256k) and the 'dense' nature of the model are not publicly documented in a technical paper or detailed model card. The lack of a formal technical report for this specific version leaves the exact training methodology and architectural changes opaque.
Dataset Composition
Information regarding the training data is extremely limited. Mistral AI only provides vague marketing claims stating the model was trained on 'diverse codebases' and supports '80+ programming languages.' There is no public disclosure of specific data sources, dataset proportions, filtering techniques, or cleaning methodologies. The absence of a data provenance report or sample data availability results in a low score.
Tokenizer Integrity
Mistral mentions an 'improved tokenizer' that contributes to a 2x speed increase compared to the previous version. While the tokenizer is accessible via the 'mistral-common' library and integrated into platforms like Hugging Face, detailed documentation on the vocabulary size or the specific training data alignment for this version is not explicitly provided in the primary release notes.
Parameter Density
Despite being a major release, Mistral AI has not officially disclosed the exact parameter count for Codestral 25.01. While third-party sources and previous versions suggest it is in the 'sub-100B' or '22B-24B' range, the official documentation remains silent on the specific density. This lack of clarity on a fundamental model metric is a significant transparency gap.
Training Compute
There is zero public information regarding the compute resources used to train Codestral 25.01. No disclosure of GPU/TPU hours, hardware specifications, training duration, or environmental impact (carbon footprint) has been made available by Mistral AI.
Benchmark Reproducibility
Mistral provides a range of benchmark results (HumanEval, MBPP, CruxEval, etc.) in their announcement blog post. However, they do not release the exact evaluation code, specific prompts, or few-shot examples required to reproduce these results exactly. While the model is tracked on public leaderboards like LMSYS Copilot Arena, the internal benchmarking methodology remains largely proprietary.
Identity Consistency
The model consistently identifies itself as a Mistral AI product and is correctly versioned as '25.01' in API endpoints and documentation. It maintains a clear identity as a specialized coding model and does not exhibit the identity confusion seen in some other models that claim to be competitors.
License Clarity
The licensing for Codestral 25.01 is restrictive. While the weights are available for download, they are governed by the 'Mistral AI Non-Production License,' which prohibits commercial use without a separate agreement. This 'open-weights but not open-source' approach creates ambiguity for developers, as it does not meet the standard definition of Open Source (e.g., Apache 2.0).
Hardware Footprint
Basic guidance on running the model is available through community integrations (like Ollama and Unsloth), and VRAM requirements for various quantization levels are documented by third-party tools. However, Mistral's official documentation lacks a comprehensive hardware requirement guide, particularly regarding memory scaling at the maximum 256k context length.
Versioning Drift
Mistral uses a clear date-based versioning system (25.01), which is an improvement over vague naming conventions. However, there is no public changelog or detailed documentation of behavioral drift or 'alignment tax' between this version and its predecessor. Updates to the 'codestral-latest' API alias can lead to silent behavior changes for users not pinning specific versions.
Full Calculator
Choose the quantization method for model weights
Context Size: 1,024 tokens
APX AI
Online