Parameters: 7.3B
Context Length: 8K (8,192 tokens)
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 27 Sept 2023
Knowledge Cutoff: Aug 2021
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 4096
Number of Layers: 32
Attention Heads: 32
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE (Rotary Position Embedding)
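The hyperparameters above can be collected into a single configuration object. The sketch below simply restates the spec table as a Python dataclass; the class and field names are illustrative, not taken from any particular library.

```python
from dataclasses import dataclass

@dataclass
class Mistral7BConfig:
    # Values copied from the spec table above; names are hypothetical.
    n_params: float = 7.3e9        # total parameters
    hidden_size: int = 4096        # hidden dimension
    n_layers: int = 32             # transformer blocks
    n_heads: int = 32              # query (attention) heads
    n_kv_heads: int = 8            # key/value heads (GQA: 4 query heads per KV head)
    context_length: int = 8192     # maximum context in tokens
    head_dim: int = 4096 // 32     # 128 dimensions per head
```

Note the 32:8 ratio of query heads to key-value heads: this is what makes the attention grouped-query rather than full multi-head, and it shrinks the KV cache by 4x.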
VRAM Requirements by Quantization Method and Context Size
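As a rough rule of thumb, VRAM is dominated by the weights (parameter count × bytes per parameter) plus the key-value cache. The sketch below is a back-of-the-envelope estimate under the assumption that activations and framework overhead are ignored, so real usage runs a few GB higher; the function name and defaults are mine, derived from the spec table.

```python
def vram_estimate_gib(n_params=7.3e9, bytes_per_param=2,
                      n_layers=32, n_kv_heads=8, head_dim=128,
                      seq_len=8192, kv_bytes=2):
    """Rough VRAM estimate: weights + KV cache only (no activations,
    no framework overhead), expressed in GiB."""
    weights = n_params * bytes_per_param
    # Factor of 2 covers keys and values; cache grows per layer,
    # per KV head, per cached position.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * seq_len * kv_bytes
    return (weights + kv_cache) / 1024**3

# fp16 weights (~13.6 GiB) + 1 GiB KV cache at the full 8K context:
print(round(vram_estimate_gib(), 1))                    # ~14.6
# 4-bit quantized weights (0.5 bytes/param) + the same KV cache:
print(round(vram_estimate_gib(bytes_per_param=0.5), 1)) # ~4.4
```

With 8 KV heads instead of 32, the KV cache at full context is 1 GiB per sequence in fp16, a quarter of what full multi-head attention would require.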
Mistral-7B-v0.1 is a 7.3 billion parameter large language model developed by Mistral AI, engineered for strong performance and computational efficiency in natural language processing tasks. It is built on a decoder-only transformer architecture and combines Sliding Window Attention with Grouped-Query Attention for efficient processing of long sequences, while a Rolling Buffer Cache bounds key-value memory during inference. This focus on efficient inference makes the model practical to deploy across a wide range of applications.
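To make the two attention mechanisms concrete, here is a minimal NumPy sketch: a sliding-window causal mask restricts each query to the most recent `window` keys, and grouped-query attention lets several query heads share one key-value head. This is an illustrative toy, not Mistral's reference implementation; the 4096-token window is the value published in the Mistral 7B paper, and all function names are mine.

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Query position i may attend to keys j with i - window < j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def gqa_attention(q, k, v, window: int):
    """q: (n_heads, T, d); k, v: (n_kv_heads, T, d), n_heads % n_kv_heads == 0.
    Each group of n_heads // n_kv_heads query heads shares one KV head."""
    n_heads, T, d = q.shape
    group = n_heads // k.shape[0]
    k = np.repeat(k, group, axis=0)   # broadcast KV heads to query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores = np.where(sliding_window_causal_mask(T, window), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                # (n_heads, T, d)

# Toy shapes matching the spec table: 32 query heads, 8 KV heads, head_dim 128.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 16, 128))
k = rng.standard_normal((8, 16, 128))
v = rng.standard_normal((8, 16, 128))
out = gqa_attention(q, k, v, window=4096)  # -> (32, 16, 128)
```

The Rolling Buffer Cache follows directly from the windowed mask: since keys older than `window` positions can never be attended to, the cache only needs `window` slots, with position i written to slot i % window, which caps KV memory regardless of sequence length.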
No evaluation benchmarks are available for Mistral-7B-v0.1.