Mistral-Large-2407: Specifications and GPU VRAM Requirements

Mistral-Large-2407

Closed source

Open weights

Parameters

123B

Context length

128K

Modality

Text

Architecture

Dense

License

Mistral Research License

Release date

24 Jul 2024

Knowledge cutoff

Oct 2023

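Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

Number of layers

Attention heads

Key-value heads

Activation function

Normalization

RMS Normalization

Position embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes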

Mistral-Large-2407

Mistral Large 2 (Mistral-Large-2407) is the second generation of Mistral AI's flagship large language model, released in July 2024. It uses a decoder-only Transformer architecture, the standard design for scalable language models. The model has 123 billion parameters, which lets it process and generate complex linguistic structures with high fidelity. It is sized to run inference on a single node, which supports high throughput in long-context applications.

The model has a 128,000-token context window, which lets it stay coherent over long documents and interactions. It uses Grouped Query Attention (GQA) with 96 attention heads sharing 8 key-value heads, which shrinks the key-value cache and keeps long sequences computationally manageable. It also uses Rotary Position Embeddings (RoPE) for positional encoding and Flash Attention for faster attention computation. Together, these choices balance model quality against compute and memory cost.
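The effect of the 8 key-value heads on long-context memory can be sketched with a little arithmetic. This is a rough estimate, not a measurement: the key-value head count and the 128,000-token window come from the text, while the layer count (88), head dimension (128), and 16-bit cache values are assumptions that do not appear on this page.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor 2: one key vector and one value vector per layer, KV head, and token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens

# n_kv_heads=8 is stated in the text; n_layers and head_dim are ASSUMED values.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 88, 8, 128

per_token = kv_cache_bytes(1, N_LAYERS, N_KV_HEADS, HEAD_DIM)
full_window = kv_cache_bytes(128_000, N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(f"per token: {per_token / 1024:.0f} KiB")            # per token: 352 KiB
print(f"128,000-token window: {full_window / 2**30:.0f} GiB")  # 128,000-token window: 43 GiB
```

Under these assumptions the cache grows linearly with context length and with the number of key-value heads, so the cache size is set by the 8 key-value heads rather than by the number of query heads.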

Mistral Large 2 performs strongly on code generation, mathematical problem-solving, and multi-step reasoning. It supports over 80 programming languages, including Python, Java, C, C++, and JavaScript, and works across dozens of human languages, including Russian, Chinese, Japanese, Korean, Spanish, Italian, Portuguese, Arabic, and Hindi. The model also supports function calling and native JSON output, which simplifies integration into automated workflows and agentic systems. Development placed particular emphasis on reducing erroneous or irrelevant output, which improves both reliability and instruction following.
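As an illustration of how function calling and JSON output fit into an automated workflow, the sketch below parses a JSON tool call and routes it to a local Python function. Everything in it is hypothetical: the `get_weather` tool, the `name`/`arguments` layout, and the sample string are stand-ins chosen for the example, not the format of any particular Mistral API, and no model is actually called.

```python
import json

# Hypothetical tool: the name, parameters, and return value are illustrative.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "unit": unit, "forecast": "placeholder"}

# Registry mapping tool names to the local functions that implement them.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a JSON tool call and run the matching local function."""
    call = json.loads(model_output)  # raises json.JSONDecodeError on malformed JSON
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Stand-in for text a model might return (assumed shape, not a real response).
example_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(example_output)
print(result)
```

Because the output is structured JSON, the calling code can validate the tool name and arguments before executing anything, which is what makes this pattern usable in agentic pipelines.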

About Mistral Large 2

Mistral Large 2 is a 123 billion parameter, dense transformer model engineered for advanced language and code generation, supporting over 80 programming languages. Its 128,000 token context window facilitates complex reasoning and long-context applications on a single node. Enhanced function calling capabilities are integrated.

Other Mistral Large 2 models

No related models

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#19

Category	Benchmark	Score	Rank
QA Assistant	ProLLM QA Assistant	0.96	🥈 2
General Knowledge	MMLU	0.84	🥈 2
Refactoring	Aider Refactoring	0.60	5
Coding	Aider Coding	0.65	7
Coding	LiveBench Coding	0.63	8
StackEval	ProLLM Stack Eval	0.88	8
Summarization	ProLLM Summarization	0.73	9
Data Analysis	LiveBench Data Analysis	0.54	16
Agentic Coding	LiveBench Agentic	0.02	19
Reasoning	LiveBench Reasoning	0.34	22
Mathematics	LiveBench Mathematics	0.42	22

Coding rank

GPU requirements

[Interactive VRAM calculator: choose a quantization method for the model weights and a context size (1,024 to 125k tokens) to see the required VRAM.]
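In place of the interactive calculator, a back-of-the-envelope estimate of weight memory at different quantization levels can be sketched as follows. The 123B parameter count comes from the spec sheet. The format labels, the bits-per-weight values, and the 80 GB GPU size are illustrative assumptions. The estimate covers weights only and ignores the key-value cache, activations, and quantization metadata, so real requirements will be higher.

```python
import math

PARAMS = 123e9  # parameter count stated in the spec sheet

# ASSUMED bits per weight for a few common formats (labels are illustrative).
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}

def weight_gb(params, bits):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

def min_gpus(required_gb, gpu_gb=80):
    """Smallest GPU count whose combined memory holds the weights (hypothetical 80 GB cards)."""
    return math.ceil(required_gb / gpu_gb)

for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_gb(PARAMS, bits)
    print(f"{name}: {gb:.1f} GB weights, at least {min_gpus(gb)} x 80 GB GPUs")
# FP16: 246.0 GB weights, at least 4 x 80 GB GPUs
# INT8: 123.0 GB weights, at least 2 x 80 GB GPUs
# INT4: 61.5 GB weights, at least 1 x 80 GB GPUs
```

The GPU counts are lower bounds for holding the weights alone. Long contexts add cache memory on top, which is why the required VRAM depends on both the quantization method and the context size.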

Resources

Official documentation, release notes, download weights
