
Mistral-Large-2407

Parameters: 123B
Context Length: 128K
Modality: Text
Architecture: Dense
License: Mistral Research License
Release Date: 24 Jul 2024
Knowledge Cutoff: Oct 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Layers: 64
Attention Heads: 48
Key-Value Heads: 8
Activation Function: -
Normalization: RMS Normalization
Position Embedding: RoPE


Mistral Large 2 (Mistral-Large-2407) is the newest generation of Mistral AI's flagship large language model, designed to advance capabilities in natural language understanding and generation. It is built on a decoder-only Transformer architecture, a widely adopted design for efficient and scalable language models. The model comprises 123 billion parameters, enabling it to process and generate complex linguistic structures with a high degree of fidelity. A key architectural characteristic is its design for single-node inference, which facilitates high throughput in long-context applications.

This model is distinguished by its extensive 128,000-token context window, allowing it to maintain coherence over extended documents and interactions. It incorporates Grouped Query Attention (GQA) with 48 attention heads and 8 key-value heads, which contributes to its computational efficiency while managing long sequences. The model also leverages Rotary Position Embeddings (RoPE) for effective positional encoding and integrates Flash Attention for optimized processing speed. These architectural choices aim to balance performance with computational requirements.
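To make the grouped-query attention arrangement concrete, here is a minimal sketch in PyTorch using the head counts listed above: 48 query heads share 8 key-value heads, so each KV head serves 48 / 8 = 6 query heads. The head dimension and sequence length are illustrative assumptions, not the model's published configuration.

```python
import torch

# Grouped-Query Attention sketch: 48 query heads, 8 shared KV heads.
# head_dim and seq_len are illustrative, not the model's actual sizes.
n_q_heads, n_kv_heads, head_dim, seq_len = 48, 8, 64, 16
group_size = n_q_heads // n_kv_heads  # 6 query heads per KV head

q = torch.randn(1, n_q_heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, seq_len, head_dim)
v = torch.randn(1, n_kv_heads, seq_len, head_dim)

# Expand each KV head across its group of query heads. Relative to full
# multi-head attention, the KV cache shrinks by group_size, which is the
# main memory saving when serving long contexts.
k = k.repeat_interleave(group_size, dim=1)  # -> (1, 48, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim**0.5
weights = torch.softmax(scores, dim=-1)
out = weights @ v
print(out.shape)  # torch.Size([1, 48, 16, 64])
```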

Mistral Large 2 exhibits enhanced performance across a range of linguistic tasks, including advanced code generation, complex mathematical problem-solving, and sophisticated reasoning. It supports over 80 programming languages, such as Python, Java, C, C++, and JavaScript, and operates proficiently across dozens of human languages, including Russian, Chinese, Japanese, Korean, Spanish, Italian, Portuguese, Arabic, and Hindi, indicating broad multilingual capabilities. Furthermore, the model is equipped with robust function calling abilities and supports native JSON output, facilitating its integration into complex automated workflows and agentic systems. A significant focus during its development was placed on minimizing the generation of erroneous or irrelevant information, thereby enhancing the reliability of its outputs and improving instruction following.
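To illustrate the function-calling and JSON-output workflow, the snippet below defines a tool in the OpenAI-style JSON schema commonly used for function calling and parses the structured arguments a tool call returns. The weather tool, its parameters, and the sample output are hypothetical examples, not part of the model or its API.

```python
import json

# Hypothetical tool definition in the JSON-schema format used for
# function calling; get_weather is an illustrative example, not a real API.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

# The model returns tool-call arguments as a JSON string (sample shown here),
# which downstream code can parse directly into a structured object.
raw_arguments = '{"city": "Paris", "unit": "celsius"}'
args = json.loads(raw_arguments)
print(args["city"], args["unit"])  # Paris celsius
```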

About Mistral Large 2

Mistral Large 2 is a 123-billion-parameter dense Transformer model engineered for advanced language and code generation, supporting over 80 programming languages. Its 128,000-token context window facilitates complex reasoning and long-context applications on a single node. Enhanced function-calling capabilities are also integrated.



Evaluation Benchmarks

Rankings apply to local LLMs.

Benchmark | Category | Score | Rank
- | - | 0.96 | #2 🥈
MMLU | General Knowledge | 0.84 | #2 🥈
- | - | 0.60 | #5
- | - | 0.65 | #7
- | - | 0.63 | #8
- | - | 0.88 | #8
- | - | 0.73 | #9
- | - | 0.54 | #16
LiveBench Agentic | Agentic Coding | 0.02 | #19
- | - | 0.34 | #22
- | - | 0.42 | #22

Rankings

Overall Rank: #19
Coding Rank: #8

GPU Requirements

VRAM requirements depend on the quantization method applied to the model weights and on the context size (1K to 125K tokens).
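For a rough sense of scale, the sketch below estimates the VRAM needed just to hold the 123B weights at a few common precisions. This is a weights-only lower bound under assumed bytes-per-parameter values; the KV cache and activations add further overhead that grows with context length.

```python
# Back-of-the-envelope VRAM estimate for the 123B weights alone.
# KV cache and activations add more on top (growing with context length),
# so treat these figures as lower bounds, not full requirements.
PARAMS = 123e9

bytes_per_param = {
    "fp16/bf16": 2.0,  # 16-bit weights
    "int8": 1.0,       # 8-bit quantization
    "int4": 0.5,       # 4-bit quantization
}

for name, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{name:>10}: ~{gib:,.0f} GiB")
# fp16/bf16: ~229 GiB, int8: ~115 GiB, int4: ~57 GiB (weights only)
```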