
Mixtral-8x7B-v0.1

Total Parameters: 46.7B
Context Length: 32,768 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 9 Dec 2023
Knowledge Cutoff: Nov 2022

Technical Specifications

Parameters per Expert: 7.0B
Number of Experts: 8
Active Experts per Token: 2
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 4096
Number of Layers: 32
Attention Heads: 32
Key-Value Heads: 8
Activation Function: -
Normalization: -
Positional Embedding: RoPE
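
For convenience, the hyperparameters listed above can be collected into a single configuration object. The sketch below is illustrative only; the field names are assumptions loosely modeled on common Hugging Face configuration conventions, not an official schema.

```python
# Illustrative only: field names are assumptions loosely modeled on common
# Hugging Face configuration conventions, not an official schema.
from dataclasses import dataclass

@dataclass
class MixtralSpec:
    hidden_size: int = 4096               # hidden dimension
    num_hidden_layers: int = 32           # decoder layers
    num_attention_heads: int = 32         # query heads
    num_key_value_heads: int = 8          # shared key/value heads (GQA)
    num_local_experts: int = 8            # experts per MoE layer
    num_experts_per_tok: int = 2          # experts activated per token
    max_position_embeddings: int = 32768  # context length
    positional_embedding: str = "rope"    # rotary position embeddings

spec = MixtralSpec()
print(spec)
```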

System Requirements

VRAM requirements for different quantization methods and context sizes

Mixtral-8x7B-v0.1

Mixtral-8x7B-v0.1 is a generative large language model developed by Mistral AI, distinguished by its Sparse Mixture of Experts (SMoE) architecture. This design lets the model process information efficiently by conditionally activating only a subset of its parameters for each input. It is intended for text generation and language understanding across a broad range of applications.

The model is built on a decoder-only transformer architecture. Each layer contains a Mixture-of-Experts feedforward block made up of eight distinct experts; a router network dynamically selects two of these experts to process each token and combines their outputs additively. This mechanism lets the model draw on a total parameter count of 46.7 billion while activating only 12.9 billion parameters per token during inference, balancing model capacity against computational cost. The architecture also incorporates Grouped Query Attention (GQA) and supports Flash Attention for enhanced performance.
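
The routing step described above can be illustrated with a short, self-contained sketch. This is not Mistral AI's implementation: the experts are simplified to plain two-layer MLPs (Mixtral's actual experts are gated SwiGLU feedforward blocks), the loop-based dispatch is unoptimized, and the dimensions are only indicative.

```python
# Minimal top-2 routing sketch (illustrative; not Mistral AI's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, hidden=4096, ffn=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (num_tokens, hidden)
        logits = self.gate(x)                  # (num_tokens, n_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out  # additive, weighted combination of the selected experts' outputs

moe = Top2MoELayer(hidden=64, ffn=128)         # toy sizes for a quick check
tokens = torch.randn(5, 64)
print(moe(tokens).shape)                       # torch.Size([5, 64])
```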

Mixtral-8x7B-v0.1 supports a context length of 32,768 tokens, allowing it to process and generate responses based on extensive textual inputs. The model is proficient in multilingual tasks, supporting English, French, Italian, German, and Spanish, and it performs strongly on code generation. It can also be fine-tuned for instruction following, making it a suitable foundation for interactive applications that require precise adherence to user commands.
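
For reference, a minimal text-generation sketch using the Hugging Face transformers library is shown below; it assumes the mistralai/Mixtral-8x7B-v0.1 checkpoint, the accelerate package for device_map="auto", and enough GPU memory (or CPU/disk offload) to hold the weights.

```python
# Usage sketch with the Hugging Face transformers library (assumptions noted above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread layers across available devices
)

prompt = "Mixtral 8x7B is a sparse mixture-of-experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```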

About Mixtral

The Mixtral model family, developed by Mistral AI, employs a sparse Mixture-of-Experts (SMoE) architecture. This design utilizes multiple expert networks per layer, where a router selects a subset to process each token. This enables large total parameter counts while maintaining computational efficiency by activating only a fraction of parameters per forward pass.
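
As a rough sanity check on the total and active parameter counts quoted above (46.7B and 12.9B), the following back-of-the-envelope sketch assumes commonly reported Mixtral dimensions that are not all listed on this page (gated SwiGLU experts with an intermediate size of 14336, a 32,000-token vocabulary, untied input and output embeddings). It is an approximation, not an official breakdown.

```python
# Back-of-the-envelope parameter count (approximate). Values marked "assumed"
# are commonly reported Mixtral dimensions that do not appear on this page.
hidden, layers = 4096, 32
n_heads, n_kv_heads, head_dim = 32, 8, 128
n_experts, top_k = 8, 2
ffn = 14336      # assumed: FFN intermediate size
vocab = 32000    # assumed: vocabulary size

expert_ffn = 3 * hidden * ffn                      # gated SwiGLU expert: w1, w2, w3
attention = 2 * hidden * n_heads * head_dim        # Q and O projections
attention += 2 * hidden * n_kv_heads * head_dim    # K and V projections (GQA)
router = hidden * n_experts                        # per-layer routing gate
shared_per_layer = attention + router
embeddings = 2 * vocab * hidden                    # input embeddings + LM head (untied)

total = layers * (n_experts * expert_ffn + shared_per_layer) + embeddings
active = layers * (top_k * expert_ffn + shared_per_layer) + embeddings

print(f"total  ~ {total / 1e9:.1f}B parameters")   # ~46.7B
print(f"active ~ {active / 1e9:.1f}B parameters")  # ~12.9B
```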



Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Mixtral-8x7B-v0.1.

Overall Rank: -
Coding Rank: -

GPU Requirements

The full calculator estimates the required VRAM and recommended GPUs for a chosen quantization method of the model weights and context size (1k to 32k tokens).
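
As an offline substitute for the interactive calculator, the heuristic sketch below estimates VRAM from the quantized weight size plus a simple KV-cache term derived from the attention dimensions above. It is a rough rule of thumb, not the calculator's exact method, and it ignores activation memory and framework overhead beyond a flat margin.

```python
# Heuristic VRAM estimate (not the page calculator's exact method).
def estimate_vram_gib(total_params=46.7e9, bits_per_weight=4, context=32768,
                      layers=32, n_kv_heads=8, head_dim=128,
                      kv_bytes=2, overhead=1.2):
    weight_bytes = total_params * bits_per_weight / 8                         # quantized weights
    kv_cache_bytes = context * layers * n_kv_heads * head_dim * 2 * kv_bytes  # K and V, fp16
    return (weight_bytes + kv_cache_bytes) * overhead / 1024**3               # GiB, ~20% margin

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, 32k context: ~{estimate_vram_gib(bits_per_weight=bits):.0f} GiB")
```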
