Mistral-Small-2501: Specifications and GPU VRAM Requirements

Mistral-Small-2501

开源

开放权重

参数

24B

上下文长度

32.768K

模态

Text

架构

Dense

许可证

Apache 2.0

发布日期

13 Jan 2025

训练数据截止日期

Oct 2023

技术规格

注意力结构

Grouped-Query Attention

隐藏维度大小

32768

层数

注意力头

键值头

激活函数

SwigLU

归一化

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Mistral-Small-2501

Mistral Small 3, specifically the Mistral-Small-2501 variant, is a 24-billion-parameter language model developed by Mistral AI, engineered for optimal efficiency and low-latency performance in generative AI tasks. This model is delivered as both a pre-trained base model and an instruction-tuned checkpoint, making it suitable for a range of language-centric applications. Its release under the Apache 2.0 license underscores its commitment to an open ecosystem, enabling widespread adoption and modification.

The architectural foundation of Mistral-Small-2501 is a dense transformer network, distinguished by a design that incorporates fewer layers compared to larger models, thereby minimizing time per forward pass. The model utilizes Grouped-Query Attention (GQA) to enhance inference efficiency and integrates Rotary Position Embeddings (RoPE) for effective positional encoding. The SwiGLU activation function is employed within its layers. With a substantial context window of 32,768 tokens, the model is capable of processing and generating extended sequences of text. It supports multiple languages, reinforcing its applicability in diverse global contexts.

Mistral Small 3 (Mistral-Small-2501) is designed for practical deployment, emphasizing rapid response times. It exhibits performance characteristics that position it as a proficient solution for scenarios demanding quick and accurate language processing, such as conversational agents, automated function calling, and specialized domain-specific applications through fine-tuning. Its efficient architecture allows for deployment on various computational platforms, including consumer-grade hardware, making it suitable for localized inference and applications with strict latency requirements.

关于 Mistral Small 3

Mistral Small 3, a 24 billion parameter model, was designed for efficient, low-latency generative AI tasks. Its optimized architecture supports local deployment and includes multimodal understanding, multilingual capabilities, and a 128,000-token context window.

其他 Mistral Small 3 模型

没有相关模型

评估基准

排名适用于本地LLM。

排名

#47

基准	分数	排名
Summarization ProLLM Summarization	0.75	8
QA Assistant ProLLM QA Assistant	0.91	10
Agentic Coding LiveBench Agentic	0.08	11
StackEval ProLLM Stack Eval	0.84	11
Refactoring Aider Refactoring	0.38	12
StackUnseen ProLLM Stack Unseen	0.12	13
Coding Aider Coding	0.38	17
Coding LiveBench Coding	0.50	18
Graduate-Level QA GPQA	0.45	19
Reasoning LiveBench Reasoning	0.37	20
Data Analysis LiveBench Data Analysis	0.52	20
Mathematics LiveBench Mathematics	0.38	27
Professional Knowledge MMLU Pro	0.66	27
General Knowledge MMLU	0.45	30

排名

#47

编程排名

#40

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

16k

32k

所需显存:

资源

官方文档发布说明下载权重

Mistral-Small-2501

技术规格

系统要求

Mistral-Small-2501

关于 Mistral Small 3

其他 Mistral Small 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源