| Attribute | Value |
|---|---|
| Parameters | 24B |
| Context Length | 32,768 tokens |
| Modality | Text |
| Architecture | Dense |
| License | Apache 2.0 |
| Release Date | 13 Jan 2025 |
| Knowledge Cutoff | Oct 2023 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 32768 |
| Number of Layers | 40 |
| Attention Heads | 24 |
| Key-Value Heads | 6 |
| Activation Function | SwiGLU |
| Normalization | - |
| Position Embedding | RoPE |
VRAM requirements for different quantization methods and context sizes
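The interactive calculator from the original page is not reproduced here. As a rough stand-in, the sketch below estimates total VRAM from the quantized weight size plus the KV cache, using the layer and key-value-head counts from the table above; the head dimension (128), the FP16 KV cache, and the 10% overhead factor are assumptions for illustration, not published figures.

```python
def estimate_vram_gb(
    params_b=24.0,          # model size in billions of parameters (from the table)
    bits_per_weight=4.0,    # quantization choice, e.g. 16 (FP16), 8 (Q8), 4 (Q4)
    context_tokens=1024,    # context size to budget the KV cache for
    num_layers=40,          # from the table
    num_kv_heads=6,         # from the table (GQA)
    head_dim=128,           # assumption: head dimension is not listed in the table
    kv_bits=16,             # KV cache assumed to stay in FP16
    overhead=1.10,          # ~10% headroom for activations/buffers (rule of thumb)
):
    """Very rough VRAM estimate: quantized weights + KV cache + overhead."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per token, per kv-head, per head_dim
    kv_bytes = 2 * num_layers * num_kv_heads * head_dim * context_tokens * kv_bits / 8
    return (weight_bytes + kv_bytes) * overhead / 1024**3

if __name__ == "__main__":
    for bits, label in [(16, "FP16"), (8, "Q8"), (4, "Q4")]:
        print(f"{label}: ~{estimate_vram_gb(bits_per_weight=bits):.1f} GiB "
              f"at 1,024 tokens of context")
```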
Mistral Small 3, specifically the Mistral-Small-2501 variant, is a 24-billion-parameter language model developed by Mistral AI and engineered for efficient, low-latency generative AI tasks. The model is delivered both as a pre-trained base model and as an instruction-tuned checkpoint, making it suitable for a range of language-centric applications. Its release under the Apache 2.0 license reflects Mistral AI's commitment to an open ecosystem, enabling widespread adoption and modification.
The architectural foundation of Mistral-Small-2501 is a dense transformer network with fewer layers than comparable models, which reduces the time per forward pass. The model uses Grouped-Query Attention (GQA) to improve inference efficiency, Rotary Position Embeddings (RoPE) for positional encoding, and the SwiGLU activation function in its feed-forward layers. A context window of 32,768 tokens allows it to process and generate extended sequences of text, and multilingual support broadens its applicability across global contexts.
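To illustrate the grouped-query attention mentioned above, here is a minimal, self-contained PyTorch sketch in which 24 query heads share 6 key-value heads, matching the table; the head dimension and weight shapes are hypothetical, and this is a didactic approximation rather than the model's actual implementation (it omits RoPE, KV caching, and other details).

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, wo, n_heads=24, n_kv_heads=6):
    """Didactic grouped-query attention: n_heads query heads share n_kv_heads K/V heads."""
    bsz, seqlen, _ = x.shape
    head_dim = wq.shape[1] // n_heads
    group = n_heads // n_kv_heads  # number of query heads per shared K/V head

    q = (x @ wq).view(bsz, seqlen, n_heads, head_dim)
    k = (x @ wk).view(bsz, seqlen, n_kv_heads, head_dim)
    v = (x @ wv).view(bsz, seqlen, n_kv_heads, head_dim)

    # Repeat each K/V head so it is shared by `group` query heads
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)

    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (bsz, heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
    return out @ wo

# Tiny smoke test with random weights (hypothetical sizes; hidden dim kept small)
if __name__ == "__main__":
    hidden, n_heads, n_kv_heads, head_dim = 3072, 24, 6, 128
    x = torch.randn(1, 8, hidden)
    wq = torch.randn(hidden, n_heads * head_dim) * 0.02
    wk = torch.randn(hidden, n_kv_heads * head_dim) * 0.02
    wv = torch.randn(hidden, n_kv_heads * head_dim) * 0.02
    wo = torch.randn(n_heads * head_dim, hidden) * 0.02
    print(grouped_query_attention(x, wq, wk, wv, wo).shape)  # torch.Size([1, 8, 3072])
```

Because the 24 query heads read from only 6 key-value heads, the KV cache is roughly a quarter of the size it would be under standard multi-head attention, which is where the inference-efficiency gain comes from.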
Mistral Small 3 (Mistral-Small-2501) is designed for practical deployment with an emphasis on rapid response times. Its performance characteristics make it a strong fit for scenarios that demand quick, accurate language processing, such as conversational agents, automated function calling, and domain-specific applications built through fine-tuning. The efficient architecture allows deployment on a range of computational platforms, including consumer-grade hardware, which suits it to local inference and latency-sensitive applications.
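As one possible local-deployment path, the snippet below loads the instruction-tuned checkpoint with Hugging Face Transformers; the repository id, dtype, and generation settings are illustrative assumptions, not an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for the instruction-tuned checkpoint
MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory vs. FP32; quantized loading is another option
    device_map="auto",           # spread layers across available GPUs/CPU
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```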
Mistral Small 3, a 24-billion-parameter model, was designed for efficient, low-latency generative AI tasks. Its optimized architecture supports local deployment and provides multilingual capabilities with a 32,768-token context window.
Rank: #37 (ranking is for Local LLMs)
| Benchmark | Score | Rank |
|---|---|---|
| Summarization (ProLLM Summarization) | 0.75 | 6 |
| StackUnseen (ProLLM Stack Unseen) | 0.35 | 8 |
| QA Assistant (ProLLM QA Assistant) | 0.91 | 9 |
| StackEval (ProLLM Stack Eval) | 0.81 | 11 |
| Agentic Coding (LiveBench Agentic) | 0.08 | 12 |
| Refactoring (Aider Refactoring) | 0.38 | 12 |
| General Knowledge (MMLU) | 0.68 | 13 |
| Coding (Aider Coding) | 0.38 | 15 |
| Professional Knowledge (MMLU Pro) | 0.66 | 16 |
| Coding (LiveBench Coding) | 0.50 | 17 |
| Reasoning (LiveBench Reasoning) | 0.37 | 18 |
| Data Analysis (LiveBench Data Analysis) | 0.52 | 18 |
| Graduate-Level QA (GPQA) | 0.45 | 20 |
| Mathematics (LiveBench Mathematics) | 0.38 | 25 |
Overall Rank: #37
Coding Rank: #34