Mistral-7B-Instruct-v0.2: Specifications and GPU VRAM Requirements

Mistral-7B-Instruct-v0.2

Open Source

Open Weights

Parameters

7.3B

Context Length

32.768K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

15 Jan 2024

Knowledge Cutoff

Dec 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

4096

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

ROPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Mistral-7B-Instruct-v0.2

Mistral-7B-Instruct-v0.2 is an instruction-tuned large language model comprising 7.3 billion parameters. This model is engineered to interpret and execute specific instructions, rendering it suitable for applications such as conversational AI, automated dialogue systems, and content generation tasks like question answering and summarization. It is an enhanced iteration derived from the Mistral-7B-v0.2 base model, distinguishing itself through its fine-tuned instruction-following capabilities.

The architectural foundation of Mistral-7B-Instruct-v0.2 is the transformer, which integrates Grouped-Query Attention (GQA) to optimize inference efficiency. A key architectural distinction in this instruct variant, compared to earlier base models, is the deliberate exclusion of Sliding-Window Attention. Instead, the model supports an expanded context window of 32,000 tokens, facilitating the processing of extended text sequences while maintaining semantic coherence. It incorporates Rotary Position Embeddings (RoPE) with a theta value set at 1e6 and employs a Byte-fallback BPE tokenizer to handle a diverse range of textual inputs.

Mistral-7B-Instruct-v0.2 is designed for flexible deployment across various computing environments, including local systems and cloud-based platforms. Its operational design focuses on precise performance in instruction-following scenarios. The model is distributed under the Apache 2.0 License, which enables open access, use, and integration into diverse research and development projects without restriction.

About Mistral 7B

Mistral 7B, a 7.3 billion parameter model, utilizes a decoder-only transformer architecture. It features Sliding Window Attention and Grouped Query Attention for efficient long sequence processing. A Rolling Buffer Cache optimizes memory use, contributing to its design for efficient language processing.

Other Mistral 7B Models

Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#19

Benchmark	Score	Rank
General Knowledge MMLU	0.68	13

Rankings

Overall Rank

#19

Coding Rank

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

16k

32k

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Read the Paper Download Weights Source Code