ApX logoApX logo

Codestral 25.01

Parameters

-

Context Length

32K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

15 Jan 2025

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

48

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Absolute Position Embedding

RoPE Theta

1,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwigLU

Dimensions

Hidden Dimension Size

6,144

Number of Layers

56

FFN Intermediate Size (Dense)

16,384

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

32,768

Architecture Diagram

Input TokensToken EmbeddingPosition: AbsoluteHidden: 6.1k · Context: 32K · Vocab: 32.8kx 56 layersRMSNormPre-AttentionMulti-Head Attention48Q / 8KV headsHead dim: 128+RMSNormPre-FFNFeed-Forward NetworkSwiGLUIntermediate: 16.4k+Final RMSNormOutput Logits

Codestral 25.01

Codestral 25.01 is Mistral AI's specialized coding model with deep understanding of software development. Features enhanced capabilities for code generation, completion, debugging, and refactoring across multiple programming languages. Trained on diverse codebases with focus on modern development practices, design patterns, and code quality. Excels at understanding developer intent and generating idiomatic, well-structured code. January 2025 release brings improved accuracy and expanded language support.

About Codestral

Codestral is a Mistral AI model designed for code generation and comprehension. It supports over 80 programming languages. The model family includes a 22 billion parameter variant.


Other Codestral Models
  • No related models available

Evaluation Benchmarks

Rank

#137

BenchmarkScoreRank

0.11

34

Rankings

Overall Rank

#137

Coding Rank

#129

Model Integrity

Total Score

D+

43 / 100

Codestral 25.01 Model Integrity Report

Total Score

43

/ 100

D+

Audit Note

Codestral 25.01 demonstrates a transparency profile typical of 'open-weights' models that prioritize performance over disclosure. While it provides strong benchmark claims and clear versioning, it suffers from critical gaps in data provenance, compute transparency, and architectural specifics. The restrictive non-commercial license further limits its standing as a truly transparent open-source project.

Upstream

13.0 / 30

Architectural Provenance

5.0 / 10

Mistral AI identifies Codestral 25.01 as an evolution of the previous 22B model with a 'more efficient architecture' and an improved tokenizer. However, specific architectural modifications beyond the context window expansion (to 256k) and the 'dense' nature of the model are not publicly documented in a technical paper or detailed model card. The lack of a formal technical report for this specific version leaves the exact training methodology and architectural changes opaque.

Dataset Composition

2.0 / 10

Information regarding the training data is extremely limited. Mistral AI only provides vague marketing claims stating the model was trained on 'diverse codebases' and supports '80+ programming languages.' There is no public disclosure of specific data sources, dataset proportions, filtering techniques, or cleaning methodologies. The absence of a data provenance report or sample data availability results in a low score.

Tokenizer Integrity

6.0 / 10

Mistral mentions an 'improved tokenizer' that contributes to a 2x speed increase compared to the previous version. While the tokenizer is accessible via the 'mistral-common' library and integrated into platforms like Hugging Face, detailed documentation on the vocabulary size or the specific training data alignment for this version is not explicitly provided in the primary release notes.

Model

15.0 / 40

Parameter Density

3.0 / 10

Despite being a major release, Mistral AI has not officially disclosed the exact parameter count for Codestral 25.01. While third-party sources and previous versions suggest it is in the 'sub-100B' or '22B-24B' range, the official documentation remains silent on the specific density. This lack of clarity on a fundamental model metric is a significant transparency gap.

Training Compute

0.0 / 10

There is zero public information regarding the compute resources used to train Codestral 25.01. No disclosure of GPU/TPU hours, hardware specifications, training duration, or environmental impact (carbon footprint) has been made available by Mistral AI.

Benchmark Reproducibility

4.0 / 10

Mistral provides a range of benchmark results (HumanEval, MBPP, CruxEval, etc.) in their announcement blog post. However, they do not release the exact evaluation code, specific prompts, or few-shot examples required to reproduce these results exactly. While the model is tracked on public leaderboards like LMSYS Copilot Arena, the internal benchmarking methodology remains largely proprietary.

Identity Consistency

8.0 / 10

The model consistently identifies itself as a Mistral AI product and is correctly versioned as '25.01' in API endpoints and documentation. It maintains a clear identity as a specialized coding model and does not exhibit the identity confusion seen in some other models that claim to be competitors.

Downstream

15.0 / 30

License Clarity

4.0 / 10

The licensing for Codestral 25.01 is restrictive. While the weights are available for download, they are governed by the 'Mistral AI Non-Production License,' which prohibits commercial use without a separate agreement. This 'open-weights but not open-source' approach creates ambiguity for developers, as it does not meet the standard definition of Open Source (e.g., Apache 2.0).

Hardware Footprint

6.0 / 10

Basic guidance on running the model is available through community integrations (like Ollama and Unsloth), and VRAM requirements for various quantization levels are documented by third-party tools. However, Mistral's official documentation lacks a comprehensive hardware requirement guide, particularly regarding memory scaling at the maximum 256k context length.

Versioning Drift

5.0 / 10

Mistral uses a clear date-based versioning system (25.01), which is an improvement over vague naming conventions. However, there is no public changelog or detailed documentation of behavioral drift or 'alignment tax' between this version and its predecessor. Updates to the 'codestral-latest' API alias can lead to silent behavior changes for users not pinning specific versions.

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
16k
31k

VRAM Required:

Recommended GPUs

Codestral 25.01: Specifications and GPU VRAM Requirements