| Specification | Value |
|---|---|
| Parameters | 8B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Dense |
| License | Mistral Research License |
| Release Date | 10 Oct 2024 |
| Knowledge Cutoff | - |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 12288 |
| Number of Layers | 36 |
| Attention Heads | 32 |
| Key-Value Heads | 8 |
| Activation Function | - |
| Normalization | - |
| Position Embedding | RoPE |
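The table lists RoPE as the position embedding but gives no further detail. Below is a minimal NumPy sketch of rotary position embeddings in the common "rotate-half" formulation; it is illustrative, not Mistral's reference code, and the base frequency of 10,000 is the conventional default rather than a value Ministral is confirmed to use. With this model's published numbers, the per-head dimension would be 4096 / 32 = 128.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    Uses the "rotate-half" convention: dimension i is paired with
    dimension i + head_dim // 2, and each pair is rotated by an angle
    that grows with position and shrinks with the pair index.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # Per-pair inverse frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-np.arange(half) * 2.0 / head_dim)
    # Angle for every (position, pair) combination
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```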
VRAM requirements for different quantization methods and context sizes
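The interactive calculator from the original page is not reproduced here. As a rough stand-in, the sketch below estimates VRAM from the published figures (8B parameters, 36 layers, 8 KV heads, head dimension 4096 / 32 = 128). The bits-per-weight value, fp16 KV cache, and 10% runtime overhead are simplifying assumptions, not the calculator's exact method.

```python
def estimate_vram_gib(params_b: float = 8.0,
                      bits_per_weight: float = 4.5,  # e.g. a Q4_K-style quant; assumption
                      context: int = 1024,
                      n_layers: int = 36,
                      n_kv_heads: int = 8,
                      head_dim: int = 128,
                      kv_bytes: int = 2) -> float:   # fp16 KV cache; assumption
    weights = params_b * 1e9 * bits_per_weight / 8       # quantized model weights
    # KV cache: 2 tensors (K and V) per layer, per KV head, per position
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes
    overhead = 1.1                                       # ~10% runtime overhead; assumption
    return (weights + kv_cache) * overhead / 2**30

print(f"~{estimate_vram_gib():.1f} GiB at 1,024 tokens of context")
```

Because GQA keeps only 8 KV heads, the KV cache stays small relative to the weights even as the context grows, which is what makes the 128K window tractable on constrained hardware.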
Ministral-8B-2410 is a state-of-the-art large language model developed by Mistral AI, comprising approximately 8.0 billion parameters. It belongs to the "les Ministraux" model family, introduced alongside Ministral 3B and optimized for local intelligence, on-device, and edge-computing use cases. The family's primary objective is compute-efficient, low-latency inference for applications that run in resource-constrained environments or require privacy-first local data processing. An instruct-tuned variant, Ministral-8B-Instruct-2410, is also available.
The architecture of Ministral-8B-2410 is a dense Transformer with 36 layers, 32 attention heads, an embedding dimension of 4096, and a feed-forward hidden dimension of 12,288. A defining feature is its 128,000-token context window, enabled by an interleaved sliding-window attention mechanism. This is complemented by grouped-query attention (GQA) with 8 key-value heads, which improves inference speed and memory efficiency by letting groups of query heads share key-value projections. The model uses the V3-Tekken tokenizer with a vocabulary of 131,072 tokens.
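To make the attention design concrete, here is a minimal NumPy sketch combining GQA with a sliding-window causal mask. It assumes a single fixed window for simplicity; the actual model interleaves sliding-window layers with other attention layers, and the window size below is illustrative rather than published. With Ministral-8B's numbers, q would have shape (32, seq, 128) and k, v (8, seq, 128).

```python
import numpy as np

def gqa_sliding_attention(q, k, v, window: int = 4096):
    """Grouped-query attention with a sliding-window causal mask.

    q: (n_q_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim).
    With 32 query heads and 8 KV heads, each group of 4 query heads
    shares one K/V head, shrinking the KV cache 4x.
    The window value here is illustrative, not a published figure.
    """
    n_q, seq, d = q.shape
    n_kv = k.shape[0]
    group = n_q // n_kv                        # 32 // 8 = 4 for Ministral-8B
    pos = np.arange(seq)
    # Token i may attend to token j only if i - window < j <= i
    mask = (pos[None, :] <= pos[:, None]) & (pos[None, :] > pos[:, None] - window)
    out = np.empty_like(q)
    for h in range(n_q):
        kh, vh = k[h // group], v[h // group]  # shared KV head for this group
        scores = q[h] @ kh.T / np.sqrt(d)
        scores = np.where(mask, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ vh
    return out
```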
Ministral-8B-2410 handles a range of natural language processing tasks, including content generation, question answering, and code generation and assistance. It performs strongly in multilingual contexts, supporting 10 major languages, and offers built-in function calling for structured API interactions. Its low latency and efficient processing make it well suited to practical applications such as on-device translation, smart assistants that operate without internet connectivity, local data analytics, and autonomous robotics. The model can also act as an efficient intermediary for handling function calls within complex, multi-step agentic workflows.
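Function calling is typically driven by a JSON tool schema. The sketch below shows the usual request shape against an OpenAI-compatible serving endpoint; the URL, model identifier, and get_weather tool are placeholders for illustration, not official Mistral values.

```python
import json, requests

# Hypothetical OpenAI-compatible endpoint serving Ministral-8B-Instruct-2410;
# the URL and model name below are placeholders, not official values.
URL = "http://localhost:8000/v1/chat/completions"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(URL, json={
    "model": "ministral-8b-instruct-2410",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",
}).json()

# When the model decides to call a tool, the call arrives as structured JSON
# rather than free text, which is what makes it usable in agentic pipelines.
for call in resp["choices"][0]["message"].get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```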
The Ministral model family, developed by Mistral AI, includes 3B and 8B parameter versions for on-device and edge computing. Designed for compute efficiency and low latency, these models support up to 128K context length. The 8B version incorporates an interleaved sliding-window attention pattern for efficient inference.
Rankings are relative to other local LLMs.

Overall Rank: #23
Coding Rank: -

| Benchmark | Score | Rank |
|---|---|---|
| General Knowledge (MMLU) | 0.65 | #16 |