Ministral 3 14B

Parameters

14B

Context Length

256K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

2 Dec 2025

Knowledge Cutoff

Jun 2025

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

32

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

1,000,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

5,120

Number of Layers

40

FFN Intermediate Size (Dense)

16,384

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

131,072

Architecture Diagram

Input Tokens → Token Embedding (hidden: 5.1k · context: 256k · vocab: 131.1k; position: RoPE)
→ 40 × [ RMSNorm (pre-attention) → Grouped-Query Attention (32 Q / 8 KV heads, head dim 128) → residual add → RMSNorm (pre-FFN) → Feed-Forward Network (SwiGLU, intermediate: 16.4k) → residual add ]
→ Final RMSNorm → Output Logits

Ministral 3 14B

Ministral 3 14B is a dense multimodal transformer model from Mistral AI, aimed at deployments that need strong capability on edge-class hardware. As the largest member of the Ministral 3 family, it is trained with a Cascade Distillation strategy, in which knowledge is progressively transferred from larger parent models, such as Mistral Small 3.1, into a 14-billion-parameter footprint. The architecture pairs a 13.5-billion-parameter decoder-only language core with a frozen 410-million-parameter Vision Transformer (ViT) encoder, so the model can process interleaved image and text inputs.

The language core has 40 transformer layers and a hidden dimension of 5,120. It uses Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads, which shrinks the key-value cache by a factor of four relative to full multi-head attention and reduces memory traffic during inference. Other components are RMSNorm for normalization, SwiGLU activations in the feed-forward blocks, and Rotary Positional Embeddings (RoPE) with YaRN scaling. Together these support a context window of 256,000 tokens, enough for large document sets or long multi-turn agentic workflows.
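To make the memory effect of GQA concrete, the sketch below estimates the key-value cache size from the figures quoted above (40 layers, 8 key-value heads, head dimension 128). The function name `kv_cache_bytes` and the 2-byte (BF16) cache element size are assumptions for illustration; real inference engines may use paged or quantized caches with different footprints.

```python
def kv_cache_bytes(seq_len, n_layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Estimate KV cache size for one sequence.

    Each layer stores one K and one V tensor of shape
    (seq_len, n_kv_heads, head_dim). bytes_per_elem=2 assumes BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 2**30

per_token = kv_cache_bytes(1)                        # bytes per cached token
full_ctx = kv_cache_bytes(256_000)                   # full 256,000-token window
full_ctx_mha = kv_cache_bytes(256_000, n_kv_heads=32)  # hypothetical 32 KV heads

print(f"per token:          {per_token / 1024:.0f} KiB")
print(f"256k context (GQA): {full_ctx / GIB:.1f} GiB")
print(f"256k context (MHA): {full_ctx_mha / GIB:.1f} GiB")
```

Under these assumptions the cache costs 160 KiB per token, so a full 256,000-token sequence needs roughly 39 GiB for the cache alone, and four times that if every query head had its own key-value head.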

Ministral 3 14B targets automation and private AI deployments. It natively supports function calling and structured JSON outputs, which makes it suitable for agentic tasks. The model is multilingual, covering more than 40 languages, and performs well in reasoning-heavy domains such as mathematics and coding. Because it is dense and compatible with quantization, it can be deployed on local workstations and enterprise edge hardware as an alternative to much larger cloud-hosted systems.
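As an illustration of how an application might consume a structured function call, the sketch below parses a JSON tool call and dispatches it to a local function. The payload shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical; the exact format emitted by a given inference stack should be taken from its own documentation.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"Weather lookup requested for {city}"


# Registry mapping tool names to (callable, required argument names).
TOOLS = {
    "get_weather": (get_weather, {"city"}),
}


def dispatch_tool_call(raw: str) -> str:
    """Parse a JSON tool call, validate it, and run the matching function."""
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    # Some APIs deliver arguments as a JSON-encoded string; accept both forms.
    if isinstance(args, str):
        args = json.loads(args)
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    fn, required = TOOLS[name]
    missing = required - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return fn(**args)


# Assumed example of what a model's structured output could look like.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(model_output)
print(result)
```

Validating the tool name and required arguments before calling anything keeps a malformed or unexpected model output from reaching application code.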

About Ministral 3

Ministral 3 is a family of efficient edge models with vision capabilities, available in 3B, 8B, and 14B parameter sizes. The models are designed for edge deployment, with multimodal and multilingual support, and target strong performance in resource-constrained environments.


Evaluation Benchmarks

Rank

#80

Category: General Knowledge

Benchmark: MMLU · Score: 0.794 · Rank: 24

Rankings

Overall Rank

#80

Coding Rank

-

Model Integrity

Total Score

B+

73 / 100

Ministral 3 14B Model Integrity Report

Audit Note

Ministral 3 14B exhibits strong transparency in its architectural design and licensing, providing a clear lineage and a permissive open-source foundation. While it offers detailed hardware requirements and a well-integrated tokenizer, it remains opaque regarding the specific sources of its training data and the environmental cost of its compute resources. The model's identity and parameter density are clearly defined, though benchmark reproducibility is hampered by the lack of public evaluation code.

Upstream

21.5 / 30

Architectural Provenance

8.5 / 10

The model's architecture is extensively documented in the 'Ministral 3' technical report (arXiv:2601.08584). It explicitly identifies the base model as a descendant of Mistral Small 3.1, derived through a 'Cascade Distillation' process. Technical specifications are precise: 40 transformer layers, a hidden dimension of 5120, and Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads. It also details the integration of a frozen 410M parameter vision encoder from the Pixtral architecture and the use of YaRN for context extension.

Dataset Composition

4.0 / 10

While the technical report describes the training methodology (Cascade Distillation) and the number of tokens (1-3 trillion), it lacks a detailed breakdown of the dataset composition. It mentions using 'open and proprietary sources' and 'question-answer pairs' for post-training but does not provide specific percentages or named sources for the pretraining data. This follows the industry trend of high-level descriptions without granular transparency.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the 'mistral-common' library (version >= 1.8.6) and is integrated into the Hugging Face transformers ecosystem. The vocabulary size is explicitly stated as 131,072 tokens. Documentation confirms support for over 40 languages and provides clear instructions for implementation in various inference frameworks like vLLM and llama.cpp.

Model

27.5 / 40

Parameter Density

9.5 / 10

The model provides a highly transparent breakdown of its parameters: a total of 14 billion, consisting of a 13.5B language core and a 0.4B vision encoder. As a dense model, all 14B parameters are active during inference, which is clearly stated in official documentation and the technical report, avoiding the ambiguity often found in Mixture-of-Experts (MoE) models.
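The 13.5B figure for the language core can be sanity-checked against the published dimensions. The sketch below counts weights for a standard dense decoder using the values on this page (hidden size 5,120, 40 layers, 32 query and 8 key-value heads of dimension 128, FFN intermediate size 16,384, vocabulary 131,072). It assumes no bias terms, a three-matrix SwiGLU feed-forward block, and untied input and output embeddings; these are assumptions made for the estimate, not details confirmed here.

```python
def dense_decoder_params(hidden=5120, layers=40, q_heads=32, kv_heads=8,
                         head_dim=128, ffn=16384, vocab=131072,
                         tied_embeddings=False):
    """Rough parameter count for a bias-free dense decoder with GQA and SwiGLU."""
    attn = (
        hidden * q_heads * head_dim         # query projection
        + 2 * hidden * kv_heads * head_dim  # key and value projections
        + q_heads * head_dim * hidden       # output projection
    )
    mlp = 3 * hidden * ffn                  # gate, up, and down projections
    norms = 2 * hidden                      # two RMSNorm weight vectors per layer
    embed = vocab * hidden                  # token embedding table
    total = layers * (attn + mlp + norms) + embed + hidden  # + final RMSNorm
    if not tied_embeddings:
        total += embed                      # separate output (LM head) matrix
    return total


untied = dense_decoder_params()
tied = dense_decoder_params(tied_embeddings=True)
print(f"untied embeddings: {untied / 1e9:.2f}B")
print(f"tied embeddings:   {tied / 1e9:.2f}B")
```

With untied embeddings the estimate comes to about 13.51B parameters, consistent with the stated 13.5B language core; with tied embeddings it would be about 12.83B.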

Training Compute

3.0 / 10

Documentation states the model was trained on NVIDIA Hopper GPUs (H100/H200), but it fails to disclose specific compute metrics such as total GPU hours, energy consumption, or the resulting carbon footprint. While it claims the 'Cascade Distillation' method is compute-efficient compared to training from scratch, it provides no verifiable data to quantify this efficiency or the environmental impact.

Benchmark Reproducibility

6.0 / 10

Mistral AI provides comprehensive benchmark results (AIME25, GPQA, Arena Hard, etc.) in the technical report and model cards. However, while they specify the versions and some evaluation settings (e.g., pass@k, temperature), the exact prompts and full evaluation code are not consistently provided in a single, reproducible repository, making independent verification of the exact reported scores difficult for the community.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as part of the Ministral 3 family in official documentation and through its metadata. It uses a clear date-based naming suffix (e.g., '2512' for the December 2025 release). There are no documented cases of the model claiming to be a competitor's product or misrepresenting its 14B parameter scale.

Downstream

23.5 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. The terms are clear, allowing for both commercial and non-commercial use, modification, and distribution. There are no conflicting 'custom' terms or revenue-based restrictions for this specific model variant, providing maximum legal transparency.

Hardware Footprint

8.5 / 10

Hardware requirements are well documented across multiple sources, including the official model card and third-party platforms such as NVIDIA NIM and Ollama. The documentation gives approximate VRAM requirements by precision: ~32GB for BF16 and ~24GB for FP8. It also notes how memory use scales with the 256k context window and gives guidance on using quantization (Q4/Q8) to fit the model on consumer hardware.
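For a rough lower bound on these figures, the sketch below computes the memory needed for the weights alone at several precisions. The bytes-per-parameter values are idealized assumptions (real Q4/Q8 formats also store scales and other metadata), and the estimate excludes activations, KV cache, and runtime overhead, so practical requirements such as those quoted above will be higher.

```python
# Idealized storage cost per parameter, in bytes (assumed values).
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "q8": 1.0,
    "q4": 0.5,
}


def weight_gib(n_params: float, precision: str) -> float:
    """Memory for model weights only, in GiB, at the given precision."""
    return n_params * BYTES_PER_PARAM[precision] / 2**30


N_PARAMS = 14e9  # total parameter count stated for Ministral 3 14B

for precision in BYTES_PER_PARAM:
    print(f"{precision:>5}: {weight_gib(N_PARAMS, precision):.1f} GiB (weights only)")
```

The gap between these weight-only numbers and a published VRAM requirement is the headroom a deployment needs for context and runtime buffers, which grows with the context length in use.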

Versioning Drift

5.0 / 10

Mistral maintains a public changelog for its API and model releases, and the model uses a date-based versioning suffix ('2512'). However, there is limited documentation regarding performance drift or specific 'alignment tax' impacts over time. While the initial release is well-defined, the long-term tracking of behavioral changes for this specific variant is not yet established in a detailed, public-facing version history.
