Parameters: 7.3B
Context Length: 32,768 tokens (32K)
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 15 Jan 2024
Knowledge Cutoff: Dec 2023
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 4096
Number of Layers: 32
Attention Heads: 32
Key-Value Heads: 8
Activation Function: -
Normalization: -
Position Embedding: RoPE
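For readers loading the model with the Hugging Face transformers library, the hyperparameters above map roughly onto the following configuration. This is a minimal sketch, assuming transformers' MistralConfig parameter names; values not given in the table (vocabulary size, feed-forward width, activation) follow the published Mistral 7B configuration.

```python
from transformers import MistralConfig

# Sketch of the architecture described in the table above. Values not listed
# there (vocab_size, intermediate_size, hidden_act) follow the published
# Mistral 7B configuration.
config = MistralConfig(
    vocab_size=32000,              # byte-fallback BPE vocabulary
    hidden_size=4096,              # hidden dimension size
    intermediate_size=14336,       # feed-forward inner dimension
    num_hidden_layers=32,          # number of layers
    num_attention_heads=32,        # query heads
    num_key_value_heads=8,         # key-value heads (grouped-query attention)
    hidden_act="silu",             # SwiGLU-style gated activation
    max_position_embeddings=32768, # 32,768-token context window
    rope_theta=1e6,                # RoPE theta used by v0.2
    sliding_window=None,           # v0.2 drops sliding-window attention
)
print(config)
```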
Mistral-7B-Instruct-v0.2 is an instruction-tuned large language model with 7.3 billion parameters. It is designed to interpret and follow explicit instructions, making it suitable for conversational AI, automated dialogue systems, and content-generation tasks such as question answering and summarization. It is a fine-tuned derivative of the Mistral-7B-v0.2 base model, distinguished by its instruction-following capabilities.
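As an illustration of the instruction-following use case, the sketch below loads the checkpoint with the Hugging Face transformers library and applies the tokenizer's chat template, which wraps the prompt in the [INST] ... [/INST] format the model was tuned on. It assumes a GPU with enough memory for the fp16 weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The chat template wraps the user message in [INST] ... [/INST],
# matching the instruction format used during fine-tuning.
messages = [{"role": "user",
             "content": "Summarize the benefits of grouped-query attention."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```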
Mistral-7B-Instruct-v0.2 is built on a transformer architecture that uses Grouped-Query Attention (GQA) to improve inference efficiency. A key architectural difference from the earlier base models is that this variant drops Sliding-Window Attention; instead it supports an expanded context window of 32,768 tokens, allowing it to process long text sequences while maintaining coherence. The model uses Rotary Position Embeddings (RoPE) with theta set to 1e6 and a byte-fallback BPE tokenizer to handle a diverse range of textual inputs.
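With 32 query heads sharing 8 key-value heads, GQA shrinks the key-value cache fourfold relative to full multi-head attention. The toy sketch below (a standalone illustration, not the model's actual implementation) shows the sharing pattern with the dimensions listed above.

```python
import torch

# Dimensions from the specification above: 32 query heads, 8 KV heads,
# head_dim = 4096 / 32 = 128.
n_heads, n_kv_heads, head_dim = 32, 8, 128
group = n_heads // n_kv_heads            # each KV head serves 4 query heads

batch, seq = 1, 16
q = torch.randn(batch, n_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Grouped-query attention: broadcast each KV head across its query group,
# then run ordinary scaled dot-product attention.
k = k.repeat_interleave(group, dim=1)    # (batch, 32, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn_out = torch.softmax(scores, dim=-1) @ v
print(attn_out.shape)                    # torch.Size([1, 32, 16, 128])
```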
Mistral-7B-Instruct-v0.2 is designed for flexible deployment across computing environments, from local systems to cloud platforms, with a focus on reliable instruction-following performance. The model is distributed under the Apache 2.0 License, which permits free use, modification, and integration into research and commercial projects.
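For local deployment planning, a rough memory estimate can be derived from the parameter count and the GQA key-value cache layout. The helper below is a hypothetical back-of-envelope sketch: it ignores activation buffers, framework overhead, and quantization block metadata, so real requirements will be somewhat higher.

```python
def estimate_vram_gib(params_b=7.3, weight_bits=4, context=32768,
                      n_layers=32, n_kv_heads=8, head_dim=128, kv_bytes=2):
    """Rough estimate: quantized weights plus an fp16 key-value cache."""
    weight_bytes = params_b * 1e9 * weight_bits / 8
    # KV cache: keys + values, per layer, per KV head, per token.
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024 ** 3

# Example: 4-bit weights with the full 32,768-token context.
print(f"{estimate_vram_gib():.1f} GiB")   # ~7.4 GiB
```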
The original Mistral 7B base model, also a 7.3-billion-parameter decoder-only transformer, introduced Sliding-Window Attention alongside Grouped-Query Attention for efficient long-sequence processing, together with a rolling buffer cache that bounds key-value memory use.
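The rolling buffer cache bounds key-value memory to the sliding-window size by writing position i into slot i mod W, overwriting entries that fall outside the window. The toy sketch below illustrates the indexing only; it is not Mistral's reference implementation.

```python
import torch

window = 4                               # toy window size (Mistral 7B v0.1 uses 4096)
head_dim = 8
cache = torch.zeros(window, head_dim)    # fixed-size rolling buffer

for pos in range(10):                    # stream of incoming token positions
    kv = torch.full((head_dim,), float(pos))
    cache[pos % window] = kv             # overwrite the slot of the oldest entry

# After 10 tokens only positions 6-9 remain, stored in rotated order.
print(cache[:, 0])                       # tensor([8., 9., 6., 7.])
```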
No evaluation benchmarks are currently available for Mistral-7B-Instruct-v0.2, so it has no Overall Rank or Coding Rank on the Local LLMs leaderboard.