Mistral-Large-2407: Specifications and GPU VRAM Requirements

Mistral-Large-2407

Closed Source

Open Weights

Parameters: 123B
Context Length: 128K
Modality: Text
Architecture: Dense
License: Mistral Research License
Release Date: 24 Jul 2024
Knowledge Cutoff: Oct 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size:
Number of Layers:
Attention Heads:
Key-Value Heads: 8
Activation Function:
Normalization: RMS Normalization
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes


Mistral Large 2 (Mistral-Large-2407) is the second generation of Mistral AI's flagship large language model, aimed at natural language understanding and generation. It is a decoder-only Transformer with 123 billion parameters. Mistral AI designed it for single-node inference, so that long-context workloads can run with high throughput without splitting the model across multiple machines.

The model supports a 128,000-token context window, which lets it stay coherent across long documents and extended interactions. It uses Grouped-Query Attention (GQA) with 96 attention heads sharing 8 key-value heads; because keys and values are cached only for the shared heads, attending over long sequences costs far less memory and bandwidth than standard multi-head attention. Positions are encoded with Rotary Position Embeddings (RoPE), and the attention computation can use Flash Attention kernels for faster processing. Together, these choices balance output quality against compute and memory requirements.
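To make the GQA memory saving concrete, the sketch below estimates the key-value (KV) cache, which grows linearly with both context length and the number of KV heads. The 8 KV heads come from the description above. The 88 layers and head dimension of 128 are assumptions taken from the config published alongside the weights and should be verified; the cache is also assumed to be stored in 16-bit precision.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens.

    Every layer stores one key vector and one value vector per KV head
    per token, hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Architecture values (assumptions except N_KV_HEADS; verify against the release).
N_LAYERS = 88      # assumption
HEAD_DIM = 128     # assumption
N_KV_HEADS = 8     # from the model description

per_token = kv_cache_bytes(1, N_LAYERS, N_KV_HEADS, HEAD_DIM)
full_ctx = kv_cache_bytes(128 * 1024, N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at 128K tokens: {full_ctx / 2**30:.1f} GiB")
```

Under these assumptions the 16-bit cache takes about 352 KiB per token, or 44 GiB at the full 131,072-token context. If every query head kept its own keys and values, the cache would grow by the ratio of query heads to KV heads; avoiding that growth is the point of GQA.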

Mistral Large 2 improves on earlier versions in code generation, mathematical problem-solving, and multi-step reasoning. It supports more than 80 programming languages, including Python, Java, C, C++, and JavaScript, and works in dozens of natural languages, including Russian, Chinese, Japanese, Korean, Spanish, Italian, Portuguese, Arabic, and Hindi. It also supports function calling and native JSON output, which makes it easier to integrate into automated workflows and agentic systems. During development, Mistral AI focused on reducing hallucinations (plausible but incorrect or irrelevant output) and on improving instruction following, to make the model's outputs more reliable.

About Mistral Large 2

Mistral Large 2 is a 123-billion-parameter dense Transformer built for advanced language and code generation, with support for more than 80 programming languages. Its 128,000-token context window supports complex reasoning and long-context applications on a single node, and it offers improved function calling.


Evaluation Benchmarks

Rankings are relative to other local LLMs.

Overall Rank: #21

Category	Benchmark	Score	Rank
General Knowledge	MMLU	0.84	1
QA Assistant	ProLLM QA Assistant	0.96	3
Refactoring	Aider Refactoring	0.60	5
StackEval	ProLLM Stack Eval	0.88	8
Coding	LiveBench Coding	0.63	9
Coding	Aider Coding	0.60	10
Summarization	ProLLM Summarization	0.73	12
Agentic Coding	LiveBench Agentic	0.02	18
Data Analysis	LiveBench Data Analysis	0.54	18
Reasoning	LiveBench Reasoning	0.34	24
Mathematics	LiveBench Mathematics	0.42	24

Rankings

Coding Rank: #11

GPU Requirements

The VRAM required depends on the quantization method chosen for the model weights and on the context size, which ranges from 1,024 tokens up to roughly 125K tokens.
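As a rough sketch of how quantization and context size drive VRAM use, the estimate below adds the memory for the quantized weights to a 16-bit KV cache. It assumes weight memory equals parameter count times bits per weight and ignores activations, quantization scales, and runtime overhead, so real deployments need extra headroom. The layer count (88) and head dimension (128) are assumptions taken from the config published with the weights, and the format names and bit widths are illustrative rather than any specific toolkit's options.

```python
# Rough VRAM estimate: quantized weights + 16-bit KV cache.
# Assumptions (not from the spec above): 88 layers, head_dim 128,
# 16-bit KV cache, no activation or runtime overhead.
PARAMS = 123e9
N_LAYERS = 88      # assumption
N_KV_HEADS = 8     # from the model description
HEAD_DIM = 128     # assumption

# Bits per weight for a few common formats (illustrative).
BITS_PER_WEIGHT = {"FP16/BF16": 16, "INT8/FP8": 8, "INT4": 4}


def weights_gib(bits: float) -> float:
    """Memory for all weights at the given bit width, in GiB."""
    return PARAMS * bits / 8 / 2**30


def kv_cache_gib(n_tokens: int, bytes_per_elem: int = 2) -> float:
    """Keys + values for every layer and KV head, in GiB."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_tokens * bytes_per_elem / 2**30


def estimate_vram_gib(quant: str, n_tokens: int) -> float:
    """Lower-bound estimate: weights plus KV cache only."""
    return weights_gib(BITS_PER_WEIGHT[quant]) + kv_cache_gib(n_tokens)


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        for ctx in (1_024, 32_768, 131_072):
            print(f"{quant:>9} @ {ctx:>7} tokens: ~{estimate_vram_gib(quant, ctx):.0f} GiB")
```

Under these assumptions, 16-bit weights alone take about 229 GiB, which already calls for several 80 GB GPUs in one node, while 4-bit weights take about 57 GiB. The full 128K-token cache adds about 44 GiB regardless of the weight format, so long contexts can matter as much as the choice of quantization.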

Resources

Official Documentation Release Notes Download Weights