
Mistral Large 3

Active Parameters

41B

Context Length

256K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

2 Dec 2025

Training Data Cutoff

-

Technical Specifications

Total Expert Parameters

675.0B

Number of Experts

-

Active Experts

-

Attention Structure

Multi-Head Attention

Hidden Dimension Size

-

Number of Layers

-

Attention Heads

-

Key-Value Heads

-

Activation Function

-

Normalization

-

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements by quantization method and context size
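The spec table above does not list concrete VRAM figures, but a first-order estimate follows directly from the parameter count and the quantization bit width. The sketch below is an assumption-laden approximation, not an official requirement: it counts only weight storage plus a nominal 20% overhead for activations and KV cache at small context sizes.

```python
def vram_gib(total_params_b: float, bits_per_weight: float,
             overhead: float = 1.2) -> float:
    """Rough VRAM (GiB) to hold the weights, with ~20% overhead for
    activations/KV cache at small context (an assumption, not a spec)."""
    weight_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# A 675B-parameter MoE must keep *all* expert weights resident,
# even though only ~41B parameters are active per token.
for name, bits in [("FP16", 16), ("INT8", 8), ("Q4", 4)]:
    print(f"{name}: ~{vram_gib(675, bits):,.0f} GiB")
```

Even at 4-bit quantization this lands in multi-GPU territory, which is why MoE deployments are typically sharded across a node.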

Mistral Large 3

Mistral Large 3 is a state-of-the-art, general-purpose multimodal model built on a granular Mixture-of-Experts (MoE) architecture: of a total parameter pool of 675 billion, 41 billion parameters are active per token. It was trained from scratch on a cluster of 3,000 NVIDIA H200 GPUs.
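The active/total split is the defining property of an MoE layer: a router scores all experts per token, but only a small top-k subset is executed, so compute scales with the 41B active parameters rather than the full 675B. A minimal top-k routing sketch (illustrative only; the real router configuration is not published in this card):

```python
import numpy as np

def route_tokens(logits: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (n_tokens, n_experts) router scores.
    Returns (indices, weights), each of shape (n_tokens, k).
    """
    idx = np.argsort(logits, axis=-1)[:, -k:]           # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)
    w = np.exp(top - top.max(axis=-1, keepdims=True))   # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

# Only the selected experts run: roughly 41/675 ≈ 6% of expert
# weights participate in any single token's forward pass.
rng = np.random.default_rng(0)
ids, gates = route_tokens(rng.normal(size=(4, 8)), k=2)
print(ids.shape, gates.sum(axis=-1))  # gate weights sum to 1 per token
```

The expert outputs are then combined as a weighted sum using `gates`, which is what makes the "granular" expert pool behave like a single dense layer from the caller's perspective.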

This variant is specifically an instruct-post-trained version, fine-tuned for instruction-following tasks. It is designed to excel in conversational AI, agentic functions, and other instruction-based use cases, making it suitable for deployment in production-grade assistants, retrieval-augmented generation (RAG) systems, scientific applications, and complex enterprise workflows. The model is engineered for reliability and robust long-context comprehension, supporting a context window of 256,000 tokens.

A key architectural component of Mistral Large 3 is its integrated 2.5 billion parameter Vision Encoder, enabling multimodal capabilities that allow the model to analyze images and derive insights from visual content alongside text. It offers strong multilingual support across dozens of languages, including major European and Asian languages. Furthermore, Mistral Large 3 demonstrates strong adherence to system prompts and provides best-in-class agentic capabilities, including native function calling and JSON output generation.
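Native function calling is exposed through the OpenAI-compatible tools schema that Mistral's API follows. The payload below is a hedged sketch of that request shape; the tool name, its parameters, and the model id string are illustrative assumptions, not values from this card, and no request is actually sent.

```python
import json

# Hypothetical tool definition: name, description, and JSON Schema
# for the arguments the model may emit in a tool call.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",                     # illustrative tool name
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "mistral-large-3",                    # assumed model id
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [tool],
    "tool_choice": "auto",                         # let the model decide
}
print(json.dumps(payload, indent=2)[:120])
```

When the model opts to call the tool, the response carries the function name and a JSON-encoded arguments object conforming to the declared schema, which is what makes agentic loops and structured JSON output reliable.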

About Mistral Large 3

Mistral Large 3 is a state-of-the-art general-purpose multimodal model with a granular Mixture-of-Experts architecture. With 675B total parameters and 41B active parameters, it delivers frontier performance for production-grade assistants, retrieval-augmented systems, and complex enterprise workflows.


Other Mistral Large 3 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Mistral Large 3.

Rankings

Overall Ranking

-

Coding Ranking

-

GPU Requirements

Interactive VRAM calculator: choose a quantization method for the model weights and a context size (1k–250k tokens) to see the required VRAM and recommended GPUs.