Parameters
7.3B
Context Length
32,768
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
15 Jan 2024
Knowledge Cutoff
Dec 2023
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
-
Normalization
-
Position Embedding
RoPE
VRAM Requirements for Different Quantization Methods and Context Sizes
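The detailed table did not survive here, but a rough estimate can be derived from the specs above. A minimal sketch (the function name and simplifications are illustrative): quantized weights take roughly bits/8 bytes per parameter, and the fp16 KV cache is sized by the GQA geometry (32 layers, 8 KV heads, head dim 4096/32 = 128):

```python
def vram_estimate_gib(params=7.3e9, bits=4, ctx=32768,
                      layers=32, kv_heads=8, head_dim=128):
    """Rough VRAM estimate in GiB: quantized weights + fp16 KV cache.

    Ignores activations and runtime overhead, so real usage is higher.
    """
    weights = params * bits / 8                            # bytes for weights
    kv_cache = 2 * layers * kv_heads * head_dim * ctx * 2  # K and V, fp16
    return (weights + kv_cache) / 2**30

# 4-bit weights with the full 32,768-token context:
print(round(vram_estimate_gib(), 1))  # -> 7.4
```

Note how GQA helps here: with 8 KV heads instead of 32, the KV cache at full context is 4 GiB rather than 16 GiB.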
Mistral-7B-Instruct-v0.2 is an instruction-tuned large language model comprising 7.3 billion parameters. It is engineered to interpret and execute specific instructions, making it suitable for applications such as conversational AI, automated dialogue systems, and content-generation tasks like question answering and summarization. It is a fine-tuned version of the Mistral-7B-v0.2 base model, distinguished by its instruction-following capabilities.
The architectural foundation of Mistral-7B-Instruct-v0.2 is the transformer, which integrates Grouped-Query Attention (GQA) to optimize inference efficiency. A key architectural distinction in this instruct variant, compared to earlier base models, is the deliberate exclusion of Sliding-Window Attention. Instead, the model supports an expanded context window of 32,768 tokens, enabling the processing of extended text sequences while maintaining semantic coherence. It incorporates Rotary Position Embeddings (RoPE) with a theta value of 1e6 and employs a byte-fallback BPE tokenizer to handle a diverse range of textual inputs.
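The GQA pattern described above can be illustrated in a few lines (a toy sketch, not any library's implementation; masking is omitted): with 32 query heads and 8 KV heads, each K/V head is shared by a group of 4 query heads, shrinking the KV cache fourfold.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy GQA: q has more heads than k/v; each KV head serves a group.

    q: (n_q_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim).
    Causal masking is omitted for brevity.
    """
    n_q, n_kv = q.shape[0], k.shape[0]
    group = n_q // n_kv                    # 32 // 8 = 4 query heads per KV head
    k = np.repeat(k, group, axis=0)        # expand KV heads to match query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over key positions
    return weights @ v

# Mistral-like head layout with a short toy sequence
q = np.random.randn(32, 4, 128)
k = np.random.randn(8, 4, 128)
v = np.random.randn(8, 4, 128)
print(grouped_query_attention(q, k, v).shape)  # (32, 4, 128)
```

Only the 8 KV heads need to be cached during generation; the repeat happens on the fly at attention time.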
Mistral-7B-Instruct-v0.2 is designed for flexible deployment across various computing environments, including local systems and cloud-based platforms. Its operational design focuses on reliable performance in instruction-following scenarios. The model is distributed under the Apache 2.0 License, which permits open access, use, and integration into diverse research and development projects with minimal restrictions.
For comparison, the original Mistral 7B base model (v0.1), also 7.3 billion parameters, uses a decoder-only transformer architecture featuring Sliding-Window Attention alongside Grouped-Query Attention for efficient long-sequence processing, with a Rolling Buffer Cache bounding memory use during inference; as noted above, the v0.2 instruct variant drops the sliding window in favor of full attention over the larger context.
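The rolling-buffer idea can be sketched briefly (a toy illustration under simplified assumptions, not Mistral's actual implementation): with a window of W positions, the cache entry for token i lives at slot i mod W, so memory stays fixed while the oldest entries are overwritten.

```python
class RollingKVCache:
    """Fixed-size KV cache: the slot for position i is i % window."""

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window        # memory never grows past `window`

    def store(self, pos, kv):
        self.slots[pos % self.window] = (pos, kv)   # overwrites the oldest entry

    def visible(self, pos):
        """Positions still attendable from `pos` under the sliding window."""
        return sorted(p for p, _ in filter(None, self.slots)
                      if pos - self.window < p <= pos)

cache = RollingKVCache(window=4)
for i in range(6):                  # stream 6 tokens through a 4-slot buffer
    cache.store(i, f"kv{i}")
print(cache.visible(5))             # [2, 3, 4, 5] -- positions 0 and 1 evicted
```

With the real model the window is 4096 tokens, so the cache holds at most 4096 entries per layer regardless of sequence length.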
Rankings apply to local LLMs.
No evaluation benchmarks are available for Mistral-7B-Instruct-v0.2.