Active Parameters: 41B
Context Length: 256K
Modality: Multimodal
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 2 Dec 2025
Training Data Cutoff: -
Total Expert Parameters: 675.0B
Number of Experts: -
Active Experts: -
Attention Structure: Multi-Head Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
VRAM Requirements by Quantization Method and Context Size
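As a rough rule of thumb, weight memory scales with the total parameter count times the bytes per parameter for the chosen quantization, and the KV cache grows linearly with the resident context length. Below is a minimal estimation sketch, assuming all 675B parameters must be loaded (MoE expert weights cannot be skipped at load time) and using placeholder values for the layer count, key-value head count, and head dimension, none of which are published on this card:

```python
# Rough VRAM estimate for a 675B-total-parameter MoE model.
# Assumptions (not from the model card): fp16 KV cache, and placeholder
# shape values for layers, KV heads, and head dimension.

TOTAL_PARAMS = 675e9          # total expert parameter pool
BYTES_PER_PARAM = {           # common quantization formats
    "fp16": 2.0,
    "int8": 1.0,
    "q4":   0.5,
}

# Hypothetical shape values, chosen only to make the arithmetic concrete.
NUM_LAYERS = 88
NUM_KV_HEADS = 8
HEAD_DIM = 128
KV_BYTES = 2                  # fp16 key/value entries

def weight_gib(quant: str) -> float:
    """Memory for the model weights alone, in GiB."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[quant] / 2**30

def kv_cache_gib(context_tokens: int) -> float:
    """KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes, per token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * KV_BYTES
    return context_tokens * per_token / 2**30

if __name__ == "__main__":
    for quant in BYTES_PER_PARAM:
        for ctx in (8_192, 131_072, 262_144):   # up to the 256K window
            total = weight_gib(quant) + kv_cache_gib(ctx)
            print(f"{quant:>4} @ {ctx:>7} tokens: ~{total:,.0f} GiB")
```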
Mistral Large 3 is a state-of-the-art, general-purpose multimodal model built on a granular Mixture-of-Experts (MoE) architecture: of its 675 billion total parameters, only 41 billion are active for any given token. It was trained from scratch on a cluster of 3,000 NVIDIA H200 GPUs.
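To make the active-versus-total distinction concrete, a granular MoE layer routes each token to a small subset of many experts, so only that subset's weights participate in the forward pass. The following is a minimal top-k routing sketch in PyTorch; the expert count and top-k value are illustrative placeholders, since the card does not publish Mistral Large 3's actual routing configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularMoELayer(nn.Module):
    """Toy MoE feed-forward layer: many small experts, few active per token.

    num_experts and top_k are illustrative placeholders, not values from
    the Mistral Large 3 card.
    """
    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the top-k experts run per token
            for e in chosen[:, slot].unique():
                mask = chosen[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = GranularMoELayer()
tokens = torch.randn(16, 1024)
print(layer(tokens).shape)   # torch.Size([16, 1024]); 4 of 64 experts used per token
```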
This variant is the instruct post-trained version, fine-tuned for instruction-following tasks. It is designed to excel at conversational AI, agentic functions, and other instruction-based use cases, making it suitable for production-grade assistants, retrieval-augmented generation (RAG) systems, scientific applications, and complex enterprise workflows. The model is engineered for reliability and robust long-context comprehension, supporting a context window of 256,000 tokens.
A key architectural component of Mistral Large 3 is its integrated 2.5-billion-parameter vision encoder, which enables the model to analyze images and derive insights from visual content alongside text. It offers strong multilingual support across dozens of languages, including major European and Asian languages. Mistral Large 3 also demonstrates strong adherence to system prompts and provides best-in-class agentic capabilities, including native function calling and JSON output generation.
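Since the card highlights native function calling and JSON output, here is a hedged sketch of how such capabilities are typically exercised through an OpenAI-compatible chat completions endpoint; the endpoint URL, model identifier, and the get_weather tool are placeholders for illustration, not values from this card:

```python
from openai import OpenAI

# Placeholder endpoint and API key: point this at wherever the model is
# served (e.g. a local vLLM or other OpenAI-compatible server).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Function calling: the model may respond with a structured tool call.
response = client.chat.completions.create(
    model="mistral-large-3",                    # placeholder model identifier
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)

# JSON-only output via response_format, supported by many OpenAI-compatible servers.
structured = client.chat.completions.create(
    model="mistral-large-3",
    messages=[{"role": "user", "content": "Return the capital of France as JSON."}],
    response_format={"type": "json_object"},
)
print(structured.choices[0].message.content)
```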
Rankings apply to local LLMs.
No evaluation benchmarks are available for Mistral Large 3.