Mixtral-8x22B-v0.1: Specifications and GPU VRAM Requirements

Mixtral-8x22B-v0.1

Open Source

Open Weights

Active Parameters

39B

Context Length

65,536 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

10 Apr 2024

Knowledge Cutoff

Technical Specifications

Total Expert Parameters

22.0B

Number of Experts

8

Active Experts

2

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

6144

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

RoPE (Rotary Position Embedding)

System Requirements

VRAM requirements for different quantization methods and context sizes
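As a rough guide to how quantization and context size drive memory use, the sketch below estimates VRAM as weight storage plus key-value cache. The figures it relies on (about 141B total parameters, 56 layers, 8 key-value heads, head dimension 128, a 16-bit cache) are assumptions taken from the publicly released model configuration rather than from this page, and the estimate ignores activations, quantization metadata, and framework overhead, so real usage will be higher.

```python
# Back-of-the-envelope VRAM estimate: weights + KV cache.
# All constants are assumptions, not measurements.
TOTAL_PARAMS = 141e9   # every expert must be resident, not just the active ones
LAYERS = 56
KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def weights_gib(bits_per_weight: float) -> float:
    """Memory needed to hold all weights at the given precision."""
    return TOTAL_PARAMS * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Memory for cached keys and values of a single sequence."""
    # Factor 2 covers keys and values; one entry per layer, KV head and head dimension.
    per_token_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token_bytes * context_tokens / GIB


def estimate_vram_gib(bits_per_weight: float, context_tokens: int) -> float:
    return weights_gib(bits_per_weight) + kv_cache_gib(context_tokens)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        for ctx in (1024, 32768, 65536):
            print(f"{bits:>2}-bit weights, {ctx:>6} tokens: "
                  f"~{estimate_vram_gib(bits, ctx):.0f} GiB")
```

Under these assumptions the weights dominate: the key-value cache for a full 65,536-token sequence is about 14 GiB, while 16-bit weights alone need more than 260 GiB.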

Mixtral-8x22B-v0.1

Mixtral-8x22B-v0.1 is a large language model developed by Mistral AI and built on a Sparse Mixture-of-Experts (SMoE) architecture. It handles general natural language processing tasks such as text generation and comprehension. Because only part of the network runs for each token, it delivers the quality of a large model at a lower per-token compute cost than a dense model of the same size.

Each transformer layer of Mixtral-8x22B-v0.1 contains eight feed-forward expert networks. The model has about 141 billion parameters in total, which is less than the 8 x 22B = 176B its name suggests, because the attention layers and embeddings are shared across experts rather than replicated. For each token, a router in every layer activates only two of the eight experts, so roughly 39 billion parameters are used per token. Inference is therefore much cheaper in compute than for a dense model of the same total size, although all experts must still be held in memory. The model is a decoder-only transformer.
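The relationship between total and active parameters can be checked with simple arithmetic. The sketch below is an approximation under assumed dimensions from the publicly released model configuration (hidden size 6144, per-expert feed-forward width 16384, 56 layers, 48 attention heads, 8 key-value heads, vocabulary 32000); none of these values come from this page, and small terms such as normalization weights are ignored.

```python
# Approximate parameter count for a Mixtral-style sparse MoE transformer.
# All dimensions below are assumptions, not values stated on this page.
HIDDEN = 6144          # model (hidden) dimension
FFN = 16384            # feed-forward width of a single expert
LAYERS = 56
HEADS = 48
KV_HEADS = 8
VOCAB = 32000
EXPERTS = 8
ACTIVE_EXPERTS = 2

HEAD_DIM = HIDDEN // HEADS  # 128


def count_params(experts_counted: int) -> int:
    """Parameters when `experts_counted` experts per layer are included."""
    # One expert is a gated feed-forward block with three weight matrices.
    expert = 3 * HIDDEN * FFN
    # Grouped-query attention: full-width Q and output projections,
    # narrower K and V projections shared across groups of heads.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * (KV_HEADS * HEAD_DIM)
    # The router is a single linear layer producing one logit per expert.
    router = HIDDEN * EXPERTS
    per_layer = experts_counted * expert + attention + router
    # Input embedding plus a separate output projection.
    embeddings = 2 * VOCAB * HIDDEN
    return LAYERS * per_layer + embeddings


total = count_params(EXPERTS)
active = count_params(ACTIVE_EXPERTS)

if __name__ == "__main__":
    print(f"total  ~ {total / 1e9:.1f}B parameters")
    print(f"active ~ {active / 1e9:.1f}B parameters")
```

With these assumed dimensions the totals come out near 141B overall and 39B active. The expert feed-forward blocks account for almost all of the difference, since attention, router, and embedding weights are used for every token.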

Mixtral-8x22B-v0.1 performs well at multilingual understanding, mathematical problem-solving, and code generation. It is fluent in English, French, Italian, German, and Spanish, and it supports native function calling, which helps when integrating it into applications. Its long context window suits use cases such as chatbots, content creation, document summarization, and question answering over long documents.

About Mixtral

The Mixtral model family, developed by Mistral AI, employs a sparse Mixture-of-Experts (SMoE) architecture. This design utilizes multiple expert networks per layer, where a router selects a subset to process each token. This enables large total parameter counts while maintaining computational efficiency by activating only a fraction of parameters per forward pass.
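The routing step can be illustrated with a small, dependency-free sketch. It assumes top-2 selection with a softmax over the selected router logits, which is one common weighting scheme; the function names, toy experts, and logits are hypothetical and are not taken from Mistral AI's implementation. In a real model the router logits come from a learned linear projection of each token's hidden state.

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


def moe_layer(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for idx, w in weights.items():
        y = experts[idx](x)  # the other experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy usage: eight hypothetical "experts" that scale the input by 1..8.
toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
toy_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

weights = top_k_route(toy_logits)
output = moe_layer([1.0, -1.0], toy_logits, toy_experts)

if __name__ == "__main__":
    print(weights)
    print(output)
```

Only the two selected experts are called for the token, which is why compute per token tracks the active parameter count rather than the total.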

Other Mixtral Models

Mixtral-8x7B-v0.1

Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#41

Category	Benchmark	Score	Rank
Summarization	ProLLM Summarization	0.59	17

Rankings

Overall Rank

#41

Coding Rank

