Ministral-3B-2410

Closed Source

Closed Weights

Parameters

Context Length

128K

Modality

Text

Architecture

Dense

License

Mistral Commercial License

Release Date

10 Oct 2024

Knowledge Cutoff

Evaluation Benchmarks

No evaluation benchmarks for Ministral-3B-2410 available.

Rankings

Overall Rank

Coding Rank

About Ministral-3B-2410

Ministral-3B-2410 is a foundational language model developed by Mistral AI, specifically optimized for on-device and edge computing applications. This model is part of the 'les Ministraux' family, designed to provide computationally efficient and low-latency solutions for scenarios demanding local, privacy-first inference. Its compact size enables deployment in resource-constrained environments, including smartphones, tablets, and IoT devices. Ministral-3B-2410 can also function as an intermediary in multi-step agentic workflows, handling tasks such as input parsing, task routing, and API calls, thereby reducing latency and cost when integrated with larger models like Mistral Large.

Architecturally, Ministral-3B-2410 is a dense Transformer model. It integrates advanced attention mechanisms, including Grouped Query Attention (GQA), to enhance processing speed and manage memory overhead. The model supports a context length of up to 128,000 tokens, facilitating the processing of extended inputs for complex tasks. Consistent with other models in the Mistral AI family, it employs Rotary Position Embedding (RoPE) and RMS Normalization. The model utilizes a V3-Tekken tokenizer with a vocabulary size of 131,072.

Ministral-3B-2410 is engineered for a variety of use cases requiring local inference, such as on-device translation, internet-less smart assistants, local analytics, and autonomous robotics. It supports native function calling capabilities, making it effective for AI agents and specialized tasks. The model is designed for a balance between power efficiency and performance, leveraging pruning and quantization techniques to minimize computational load for deployment on devices with limited hardware capacity.

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

ROPE

RoPE Theta

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwigLU

Dimensions

Hidden Dimension Size

12,288

Number of Layers

FFN Intermediate Size (Dense)

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

Model Integrity

Total Score

B-

63 / 100

Upstream

19.5 / 30

Model

24.0 / 40

Downstream

19.0 / 30

Ministral-3B-2410 Model Integrity Report

Total Score

/ 100

B-

Audit Note

Ministral-3B-2410 demonstrates strong transparency in its architectural foundations and tokenizer implementation, benefiting from Mistral AI's established technical standards. However, it suffers from significant opacity regarding its specific training data composition and the total compute resources utilized. While hardware requirements are well-defined for edge deployment, the model's initial restrictive licensing and documented adjustments to internal parameters highlight a need for more consistent disclosure practices.

Upstream

19.5 / 30

Architectural Provenance

7.5 / 10

Ministral-3B-2410 is explicitly documented as a dense Transformer model derived from Mistral Small 3.1 through a 'Cascade Distillation' process. Mistral AI provides technical details including the use of Grouped Query Attention (GQA), Rotary Position Embedding (RoPE), and RMS Normalization. The transition from the original 'les Ministraux' (October 2024) to the 'Ministral 3' series (December 2025) is documented, noting the addition of a 410M parameter vision encoder and the use of tied input-output embeddings to maintain the 3B scale. However, specific layer-by-layer architectural modifications for the 2410 variant specifically are less detailed than the later 2512 release.

Dataset Composition

3.5 / 10

Information regarding the training data is limited to high-level descriptions. Official documentation mentions a 'large proportion of multilingual and code data' and the use of 'publicly available and synthetic datasets.' While the 'Cascade Distillation' methodology explains how knowledge is transferred from the parent model (Mistral Small), there is no public breakdown of the specific data mixture, token counts per category, or detailed filtering/cleaning procedures for the 2410 version.

Tokenizer Integrity

8.5 / 10

The model uses the V3-Tekken tokenizer, which is well-documented and publicly available via the 'mistral-common' library. It features a vocabulary size of 131,072 tokens and is based on tiktoken, moving away from the previous sentencepiece-based versions. The tokenizer's behavior, including the absence of leading whitespaces in chat templates, is explicitly detailed in the Mistral AI Cookbook and technical documentation.

Model

24.0 / 40

Parameter Density

7.0 / 10

The model is clearly identified as a dense architecture with approximately 3 billion parameters. For the 2410 variant, the parameter count is consistently reported. In the subsequent Ministral 3 (2512) update, the breakdown is even more transparent, specifying 3.4B for the language decoder and 410M for the vision encoder. The use of tied embeddings is also disclosed as a method to manage parameter density.

Training Compute

3.0 / 10

Mistral AI discloses that the models were trained on NVIDIA Hopper GPUs (H100/H200) through collaborations with NVIDIA. However, specific compute metrics such as total GPU hours, energy consumption, or carbon footprint for the Ministral-3B-2410 training run are not publicly available. The documentation focuses more on the efficiency of the distillation method rather than the absolute resources consumed.

Benchmark Reproducibility

5.0 / 10

Mistral provides standard benchmark results (MMLU, GSM8K, HumanEval) and compares them against competitors like Llama 3.2 3B and Gemma 2 2B. While evaluation code and specific prompt templates are not fully public in a single repository, third-party platforms like OpenRouter and Artificial Analysis have verified performance, though they occasionally report lower scores than official claims. The lack of a comprehensive technical paper for the 2410 version specifically (unlike the 2026 'Ministral 3' paper) limits full reproducibility.

Identity Consistency

9.0 / 10

The model consistently identifies itself as a Mistral-developed product. It maintains a clear versioning string (2410) and does not exhibit identity confusion with models from other providers. Documentation clearly distinguishes between the base, instruct, and later reasoning variants, ensuring users know which model they are interacting with.

Downstream

19.0 / 30

License Clarity

6.5 / 10

The licensing for Ministral-3B-2410 is somewhat complex. While Mistral AI is known for open-source contributions, this specific model was initially released under the 'Mistral Commercial License' for self-deployment, with weights not initially available to researchers (unlike the 8B version). The later 'Ministral 3' (2512) update moved to Apache 2.0, but the 2410 variant remains under more restrictive commercial terms for many users, creating some ambiguity in the 'open' status of the weights.

Hardware Footprint

7.5 / 10

Hardware requirements are well-documented for various deployment scenarios. Mistral and partners like NVIDIA and DigitalOcean provide specific VRAM guidance (e.g., ~8GB for 3B models). The model is optimized for vLLM and llama.cpp, with quantization support (FP8, INT4) and its impact on memory and latency being publicly discussed in technical blogs and community documentation.

Versioning Drift

5.0 / 10

Mistral uses date-based versioning (2410) and maintains a public changelog. However, there have been reports of silent updates, such as the November 2024 adjustment where the temperature parameter was downscaled by a multiplier of 0.43 across the Ministral family to 'unify model behavior.' While documented in the changelog, such changes can cause behavior drift for users relying on previous settings.

Resources

Official Documentation Release Notes

About Ministral

The Ministral model family, developed by Mistral AI, includes 3B and 8B parameter versions for on-device and edge computing. Designed for compute efficiency and low latency, these models support up to 128K context length. The 8B version incorporates an interleaved sliding-window attention pattern for efficient inference.

Other Ministral Models

Ministral-8B-2410