Parameters: 8B
Context Length: 256K
Modality: Multimodal
Architecture: Dense
License: Apache 2.0
Release Date: 2 Dec 2025
Knowledge Cutoff: -
Attention Structure: Grouped Query Attention (GQA)
Hidden Dimension Size: 4096
Number of Layers: 32
Attention Heads: 32
Key-Value Heads: 8
Activation Function: SwiGLU (SiLU)
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)
VRAM requirements depend on the quantization method and the context size.
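As a rough guide, the footprint can be estimated from the spec table above as quantized weights plus the grouped-query KV cache. The sketch below is a back-of-the-envelope estimate: the bytes-per-weight figures are approximations, and runtime overheads (activations, workspace buffers, vision encoder inputs) are ignored.

```python
# Rough VRAM estimate for Ministral 3 8B: quantized weights + GQA KV cache.
# Bytes-per-weight values are approximations; real quant formats add
# per-block scale overhead, and inference runtimes add their own memory.

PARAMS = 8.8e9          # 8.4B language model + 0.4B vision encoder
N_LAYERS = 32           # hidden layers
N_KV_HEADS = 8          # key-value heads (GQA)
HEAD_DIM = 4096 // 32   # hidden size / attention heads = 128

BYTES_PER_WEIGHT = {    # approximate effective bytes per parameter
    "FP16": 2.0,
    "Q8_0": 1.06,
    "Q4_K_M": 0.59,
}

def kv_cache_bytes(context: int, kv_bytes: float = 2.0) -> float:
    """K and V caches: 2 * layers * kv_heads * head_dim * context tokens."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context * kv_bytes

def vram_gib(quant: str, context: int) -> float:
    weights = PARAMS * BYTES_PER_WEIGHT[quant]
    return (weights + kv_cache_bytes(context)) / 1024**3

for quant in BYTES_PER_WEIGHT:
    for ctx in (1_024, 32_768, 262_144):
        print(f"{quant:>7} @ {ctx:>7} tokens: ~{vram_gib(quant, ctx):.1f} GiB")
```

At the full 256K context the FP16 KV cache alone comes to roughly 32 GiB, which is why GQA's 4x cache reduction (8 KV heads instead of 32) matters on edge hardware.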
The Ministral 3 8B model is a member of the Ministral 3 family, developed by Mistral AI and engineered to provide advanced multimodal and multilingual capabilities for edge and resource-constrained environments. It combines an 8.4-billion-parameter language model with a 0.4-billion-parameter vision encoder, for a total of 8.8 billion parameters, making it a balanced and efficient option for localized AI deployments. It is designed for versatility, supporting applications ranging from real-time chat interfaces to sophisticated agentic workflows.
Architecturally, Ministral 3 8B is a dense transformer with 32 hidden layers and a hidden dimension of 4096. Its attention mechanism uses 32 attention heads with 8 key-value heads, i.e. Grouped Query Attention (GQA), for efficient processing. The model employs Rotary Position Embeddings (RoPE) to encode token positions, a SwiGLU (SiLU) activation function, and RMS Normalization for stable training and inference. The architecture is optimized for scenarios where computational resources are limited, while still supporting a context length of 256K tokens.
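To make the head arithmetic concrete: 4096 / 32 gives a per-head dimension of 128, and the 32 query heads are grouped 4-to-1 over the 8 key-value heads. Below is a minimal, illustrative PyTorch sketch of grouped-query attention under those shapes; the tensor names are hypothetical and this is not Mistral's implementation.

```python
import torch
import torch.nn.functional as F

# Shapes implied by the spec table: 32 query heads, 8 KV heads (4:1),
# head_dim = hidden_size / n_heads = 4096 / 32 = 128. Illustrative only.
n_heads, n_kv_heads, head_dim, seq = 32, 8, 128, 16
group = n_heads // n_kv_heads  # 4 query heads share each KV head

q = torch.randn(1, n_heads, seq, head_dim)
k = torch.randn(1, n_kv_heads, seq, head_dim)
v = torch.randn(1, n_kv_heads, seq, head_dim)

# GQA: replicate each KV head across its group of query heads, then run
# ordinary scaled dot-product attention. Only the 8 original KV heads are
# stored in the cache, cutting its size 4x versus full multi-head attention.
k = k.repeat_interleave(group, dim=1)   # (1, 32, seq, head_dim)
v = v.repeat_interleave(group, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 32, 16, 128])
```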
Ministral 3 8B is equipped with native multimodal understanding, enabling it to process and interpret both text and visual inputs. It offers robust multilingual support across numerous languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, and Korean. The model also integrates native function calling and supports JSON output, easing integration into agentic systems and automated workflows. These characteristics make it suitable for applications such as image and document description, local AI assistants, and specialized problem-solving in embedded systems.
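As an example of what native function calling looks like in practice, here is a hypothetical request to a local OpenAI-compatible server; the endpoint URL, model id, and get_weather tool are illustrative assumptions for this sketch, not official Mistral definitions.

```python
import json
import requests

# Hypothetical local OpenAI-compatible endpoint and tool schema; the URL,
# model id, and get_weather function are assumptions for this sketch.
payload = {
    "model": "ministral-3-8b",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris right now?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
message = resp.json()["choices"][0]["message"]
# With native function calling, the model can reply with a structured tool
# call instead of free text; the arguments arrive as a JSON string.
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```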
Ministral 3 is a family of efficient edge models with vision capabilities, available in 3B, 8B, and 14B parameter sizes. Designed for edge deployment with multimodal and multilingual support, the family offers best-in-class performance for resource-constrained environments.
No evaluation benchmarks for Ministral 3 8B are available yet.