| Attribute | Value |
|---|---|
| Parameters | 123B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Dense |
| License | Mistral Research License |
| Release Date | 24 Jul 2024 |
| Knowledge Cutoff | Oct 2023 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | - |
| Number of Layers | 64 |
| Attention Heads | 48 |
| Key-Value Heads | 8 |
| Activation Function | - |
| Normalization | RMS Normalization |
| Position Embedding | RoPE |
VRAM requirements for different quantization methods and context sizes
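As a rough guide, weight memory scales with the parameter count times the bytes per parameter under each quantization width. The sketch below is a back-of-envelope estimate, not a vendor-published figure; actual usage is higher once the KV cache (which grows with context length), activations, and framework overhead are added.

```python
# Back-of-envelope sketch: weight memory for a 123B-parameter dense model
# under common quantization widths. These are estimates only; real VRAM use
# also includes the KV cache, activations, and runtime overhead.
PARAMS = 123e9  # parameter count from the table above

BYTES_PER_PARAM = {
    "fp16 / bf16": 2.0,
    "int8 (8-bit)": 1.0,
    "int4 (4-bit)": 0.5,
}

for name, bytes_per_param in BYTES_PER_PARAM.items():
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name:>12}: ~{gib:,.0f} GiB for weights alone")
```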
Mistral Large 2 (Mistral-Large-2407) is the latest generation of Mistral AI's flagship large language models, designed to advance capabilities in natural language understanding and generation. It is built on a decoder-only Transformer architecture, a widely adopted design for efficient and scalable language models. The model comprises 123 billion parameters, enabling it to process and generate complex linguistic structures with high fidelity. A key architectural characteristic is its sizing for single-node inference, which facilitates high throughput in long-context applications.
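As a concrete starting point, the sketch below loads the instruct checkpoint with the Hugging Face `transformers` library. The repository id `mistralai/Mistral-Large-Instruct-2407` is assumed here to be the released checkpoint name; the dtype and device-map choices are illustrative, and in bf16 the 123B weights alone require multiple large GPUs on a single node, in line with the single-node design mentioned above.

```python
# Minimal sketch (not official usage docs): load the instruct checkpoint and
# run a short chat-style generation. Requires `transformers` and `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Large-Instruct-2407"  # assumed released checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 weights ~= 2 bytes per parameter
    device_map="auto",           # shard across the GPUs of one node
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```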
This model is distinguished by its extensive 128,000-token context window, allowing it to maintain coherence over extended documents and interactions. It incorporates Grouped Query Attention (GQA) with 48 attention heads and 8 key-value heads, which contributes to its computational efficiency while managing long sequences. The model also leverages Rotary Position Embeddings (RoPE) for effective positional encoding and integrates Flash Attention for optimized processing speed. These architectural choices aim to balance performance with computational requirements.
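To make the attention layout concrete, here is a minimal sketch of grouped-query attention in PyTorch: the 48 query heads are served by only 8 key-value heads, so each KV head is shared by a group of 6 query heads and the KV cache shrinks accordingly. The head dimension of 128 is an illustrative assumption rather than a published figure for this model, and the toy omits RoPE and KV caching.

```python
# Minimal sketch of grouped-query attention (GQA): many query heads share a
# smaller set of key/value heads, reducing KV-cache memory.
import torch
import torch.nn.functional as F

n_q_heads, n_kv_heads, head_dim = 48, 8, 128  # head_dim is an assumption
group = n_q_heads // n_kv_heads               # 6 query heads per KV head

batch, seq = 1, 16
q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Broadcast each KV head to its group of query heads.
k = k.repeat_interleave(group, dim=1)  # (batch, 48, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

# Standard scaled dot-product attention with a causal mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 48, 16, 128])
```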
Mistral Large 2 exhibits enhanced performance across a range of linguistic tasks, including advanced code generation, complex mathematical problem-solving, and sophisticated reasoning. It supports over 80 programming languages, such as Python, Java, C, C++, and JavaScript, and operates proficiently across dozens of human languages, including Russian, Chinese, Japanese, Korean, Spanish, Italian, Portuguese, Arabic, and Hindi, indicating broad multilingual capabilities. Furthermore, the model is equipped with robust function calling abilities and supports native JSON output, facilitating its integration into complex automated workflows and agentic systems. A significant focus during its development was placed on minimizing the generation of erroneous or irrelevant information, thereby enhancing the reliability of its outputs and improving instruction following.
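For illustration, the sketch below calls an OpenAI-style chat-completions endpoint with a tool schema, the way function calling is typically exercised. The endpoint URL and the `mistral-large-2407` model alias follow Mistral's public API conventions but should be checked against current documentation; the `get_weather` tool is a made-up example schema.

```python
# Hedged sketch of function calling against an OpenAI-style chat endpoint.
# The URL and model alias are assumptions to verify against current docs.
import json
import os
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-2407",
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
        "tool_choice": "auto",
    },
    timeout=60,
)
# The assistant message should contain a tool_calls entry with JSON arguments.
print(json.dumps(response.json()["choices"][0]["message"], indent=2))
```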
Mistral Large 2 is a 123-billion-parameter dense Transformer model engineered for advanced language and code generation, supporting over 80 programming languages. Its 128,000-token context window facilitates complex reasoning and long-context applications on a single node, and it integrates enhanced function-calling capabilities.
Rankings apply to local LLMs.
Rank: #19
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| QA Assistant | ProLLM QA Assistant | 0.96 | 🥈 2 |
| General Knowledge | MMLU | 0.84 | 🥈 2 |
| Refactoring | Aider Refactoring | 0.60 | 5 |
| Coding | Aider Coding | 0.65 | 7 |
| Coding | LiveBench Coding | 0.63 | 8 |
| StackEval | ProLLM Stack Eval | 0.88 | 8 |
| Summarization | ProLLM Summarization | 0.73 | 9 |
| Data Analysis | LiveBench Data Analysis | 0.54 | 16 |
| Agentic Coding | LiveBench Agentic | 0.02 | 19 |
| Reasoning | LiveBench Reasoning | 0.34 | 22 |
| Mathematics | LiveBench Mathematics | 0.42 | 22 |