Yi-6B: Specifications and GPU VRAM Requirements

Yi-6B

Open Source

Open Weights

Parameters

6B

Context Length

4,096 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

2 Nov 2023

Training Data Cutoff

Jun 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

4096

Layers

32

Attention Heads

32

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Yi-6B

Yi-6B is a 6-billion-parameter large language model developed by 01.AI. It is a core member of the Yi model family and is designed to deliver strong performance at moderate resource cost, which makes it suitable for both personal and academic use. The model is bilingual: it was trained on a 3-trillion-token multilingual corpus and handles understanding and generation in both English and Chinese.

Architecturally, Yi-6B is a dense transformer. Its attention mechanism uses Grouped-Query Attention (GQA), which is applied to both the 6B and 34B Yi models. GQA reduces training and inference costs compared with standard Multi-Head Attention, reportedly without compromising performance even on the smaller model. The model uses SwiGLU as its activation function and RMSNorm for normalization, an architecture similar to Llama's. Positional information is encoded with Rotary Positional Embedding (RoPE). Yi-6B has a hidden dimension size of 4096, 32 layers, 32 attention query heads, and 4 key-value heads.
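To make the head counts concrete, the following minimal sketch derives the per-head dimension, the number of query heads sharing each key-value head, and the resulting projection shapes. The `AttentionShape` class and its method names are hypothetical, written only for illustration. The contiguous grouping of query heads and the `(input, output)` shape convention are assumptions borrowed from common GQA implementations, not details confirmed by this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical helper holding the attention figures quoted in the text."""
    hidden_size: int = 4096   # hidden dimension size
    num_q_heads: int = 32     # attention query heads
    num_kv_heads: int = 4     # key-value heads

    @property
    def head_dim(self) -> int:
        # Width of a single attention head.
        return self.hidden_size // self.num_q_heads

    @property
    def group_size(self) -> int:
        # Number of query heads that share one key-value head.
        return self.num_q_heads // self.num_kv_heads

    def kv_head_for(self, q_head: int) -> int:
        # Assumption: query heads are grouped contiguously (0-7 -> 0, 8-15 -> 1, ...).
        return q_head // self.group_size

    def projection_shapes(self) -> dict:
        # (input width, output width) of each projection, ignoring any bias terms.
        kv_width = self.num_kv_heads * self.head_dim
        return {
            "q_proj": (self.hidden_size, self.num_q_heads * self.head_dim),
            "k_proj": (self.hidden_size, kv_width),
            "v_proj": (self.hidden_size, kv_width),
        }


shape = AttentionShape()
print(shape.head_dim, shape.group_size)  # 128 8
print(shape.projection_shapes())
# {'q_proj': (4096, 4096), 'k_proj': (4096, 512), 'v_proj': (4096, 512)}
```

With 4 key-value heads instead of 32, the key and value projections (and the cached keys and values at inference time) are 8 times narrower than they would be under Multi-Head Attention with the same head width.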

Yi-6B targets a range of natural language processing tasks, including language understanding, commonsense reasoning, and reading comprehension. Its efficient design and open-weight release under the Apache 2.0 license make it usable in scenarios ranging from rapid prototyping of real-time applications to fine-tuning for specific domains. The default context window is 4,096 tokens; variants offer extended context lengths of up to 200,000 tokens for longer inputs.
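Because the prompt and the generated text share the same 4,096-token window, it can help to budget tokens before calling the model. The sketch below is one simple way to do that; the function names are hypothetical, and token counts are assumed to come from whatever tokenizer is in use. Keeping only the most recent tokens when truncating is one design choice among several.

```python
CONTEXT_WINDOW = 4096  # default context window stated in the text


def max_new_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


def truncate_prompt(token_ids: list, reserve_for_output: int,
                    context_window: int = CONTEXT_WINDOW) -> list:
    """Keep only the most recent tokens so that the output budget still fits."""
    keep = context_window - reserve_for_output
    if keep <= 0:
        raise ValueError("reserve_for_output leaves no room for the prompt")
    return token_ids[-keep:]


print(max_new_tokens(4000))                             # 96
print(len(truncate_prompt(list(range(5000)), 512)))     # 3584
```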

About Yi

The Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English and Chinese) and perform strongly in language understanding, reasoning, and code generation.

Evaluation Benchmarks

No evaluation benchmarks are available for Yi-6B.

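GPU Requirements

Required VRAM depends on the quantization method selected for the model weights and on the context size (default shown: 1,024 tokens).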
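A rough VRAM estimate for a given quantization and context size can be sketched as model weights plus key-value cache. The code below is an approximation under stated assumptions, not an authoritative calculator. It assumes a nominal 6 billion parameters, the layer and head counts quoted earlier on this page (32 layers, 4 key-value heads, head width 4096 / 32 = 128), a 16-bit key-value cache, and illustrative bytes-per-weight values for each quantization level. Activations and framework overhead are ignored, so real usage will be higher.

```python
# Assumed storage cost per weight for each quantization level (illustrative values).
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(num_params: float, quant: str) -> float:
    """Memory taken by the model weights alone."""
    return num_params * BYTES_PER_WEIGHT[quant]


def kv_cache_bytes(context_tokens: int, layers: int = 32, kv_heads: int = 4,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Memory taken by cached keys and values for one sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens


def estimate_vram_gib(num_params: float = 6e9, quant: str = "fp16",
                      context_tokens: int = 1024) -> float:
    """Weights plus KV cache, in GiB; excludes activations and runtime overhead."""
    total = weight_bytes(num_params, quant) + kv_cache_bytes(context_tokens)
    return total / 2**30


for quant in ("fp16", "int8", "int4"):
    print(f"{quant}: ~{estimate_vram_gib(quant=quant):.2f} GiB at 1,024 tokens")
```

Under these assumptions the key-value cache costs 64 KiB per token (64 MiB at 1,024 tokens), so at short context lengths the weights dominate the total and the choice of quantization matters far more than the context size.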

Resources

Official documentation | Release notes | Paper | Model weights | Source code
