Mixtral-8x22B-v0.1: Specifications and GPU VRAM Requirements

Mixtral-8x22B-v0.1

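Open source, open weights

Total parameters: 176B nominal (8 × 22B)
Active parameters: about 39B per token
Context length: 65,536 tokens (64K)
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release date: 10 Apr 2024
Knowledge cutoff: not listed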

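Technical specifications

Total expert parameters: 22.0B
Number of experts: 8
Active experts: 2 per token
Attention structure: Grouped-Query Attention
Hidden dimension size: 1024
Number of layers: not listed
Attention heads: not listed
Key-value heads: not listed
Activation function: not listed
Normalization: not listed
Position embedding: RoPE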

System requirements

VRAM requirements for different quantization methods and context sizes

Mixtral-8x22B-v0.1

Mixtral-8x22B-v0.1 is a large language model developed by Mistral AI and built on a Sparse Mixture-of-Experts (SMoE) architecture. Because only a subset of its parameters is used for each token, it handles text generation and comprehension tasks at a lower inference cost than a dense model of the same total size.

Each layer of Mixtral-8x22B-v0.1 contains eight expert networks, and a router engages only two of them for any given input token. Counting eight 22B experts gives a nominal total of 176 billion parameters; because the experts share the attention layers, the total reported by Mistral AI is closer to 141 billion. With two experts active, roughly 39 billion parameters are used per token, which significantly reduces the compute per token compared with a dense model of equivalent total size. The model is a decoder-only transformer.
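The parameter arithmetic above matters for memory planning: sparse activation lowers compute per token, but every expert still has to be resident in memory. The following is a minimal sketch of that estimate for the weights alone. It assumes the roughly 141B total reported by Mistral AI (the nominal 8 × 22B count is 176B), ignores the KV cache and runtime overhead, and uses a hypothetical helper name, `weight_memory_gib`.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


TOTAL_PARAMS = 141e9   # assumption: total reported by Mistral AI
ACTIVE_PARAMS = 39e9   # active parameters per token, as stated in the text

# All experts must be loaded, so memory scales with TOTAL_PARAMS,
# while compute per token scales with ACTIVE_PARAMS.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.0f} GiB of weights")

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Fraction of parameters used per token: {active_fraction:.2f}")
```

Passing `176e9` instead of `141e9` gives the upper bound implied by the nominal count.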

Mixtral-8x22B-v0.1 performs well across multiple domains, including multilingual understanding, mathematical problem-solving, and code generation. It is fluent in English, French, Italian, German, and Spanish, and it supports native function calling, which makes it easier to integrate into applications. Typical use cases include chatbot development, content creation, document summarization, and complex question answering over long inputs that benefit from its large context window.

About Mixtral

The Mixtral model family, developed by Mistral AI, employs a sparse Mixture-of-Experts (SMoE) architecture. This design utilizes multiple expert networks per layer, where a router selects a subset to process each token. This enables large total parameter counts while maintaining computational efficiency by activating only a fraction of parameters per forward pass.
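The routing step described above can be illustrated with a small sketch in plain Python. It is not Mistral AI's implementation: the function names, the toy experts, and the choice to compute mixing weights as a softmax over the selected logits are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: Vector, router_logits: Sequence[float],
              experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Run only the selected experts and mix their outputs by router weight."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # the other experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

routes = top_k_route(logits)
y = moe_layer([1.0, -2.0], logits, experts)
print(routes, y)
```

In a real model the router logits come from a learned linear layer applied to each token's hidden state, and each expert is a feed-forward network rather than a scaling function.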

Other Mixtral models

Mixtral-8x7B-v0.1

Evaluation benchmarks

Rankings apply to local LLMs.

Rank: #38

Category	Benchmark	Score	Rank
Summarization	ProLLM Summarization	0.59	14

Coding rank: not listed

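GPU requirements

Required VRAM depends on the quantization method selected for the model weights and on the context size, which ranges from 1,024 tokens up to 64K tokens.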
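As a rough guide to how required VRAM grows with context size, the sketch below estimates the key-value cache, which is the part of memory that scales with the number of tokens. The function name `kv_cache_gib` is hypothetical, and the layer count, key-value head count, and head dimension in the example are placeholder values, not figures from this page; substitute the values from the model's configuration before relying on the result.

```python
def kv_cache_gib(context_tokens: int, num_layers: int, num_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2,
                 batch_size: int = 1) -> float:
    """Approximate KV-cache size in GiB.

    Each layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2.
    """
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * context_tokens * bytes_per_value * batch_size)
    return total_bytes / 2**30


# Placeholder architecture values for illustration only (assumptions).
EXAMPLE_ARCH = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for tokens in (1024, 32 * 1024, 64 * 1024):
    size = kv_cache_gib(tokens, **EXAMPLE_ARCH)
    print(f"{tokens:>6} tokens: ~{size:.3f} GiB of KV cache at 16-bit")
```

The cache grows linearly with context length. With Grouped-Query Attention the number of key-value heads is smaller than the number of attention heads, which keeps this term lower than it would be with full multi-head attention. Total VRAM is approximately the weight memory plus this cache plus runtime overhead.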

Resources

Official documentation, release notes, weight downloads, source code
