
Yi-9B

Parameters: 9B

Context Length: 4K (4,096 tokens)

Modality: Text

Architecture: Dense

License: Apache 2.0

Release Date: 6 Mar 2024

Training Data Cutoff: Jun 2023

Technical Specifications

Attention Structure: Grouped-Query Attention (GQA)

Hidden Dimension Size: -

Number of Layers: 48

Attention Heads: -

Key-Value Heads: -

Activation Function: SwiGLU

Normalization: -

Positional Embedding: Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes

Yi-9B

The Yi-9B model, developed by 01.AI, is an advanced member of the Yi family of open-source large language models. It is engineered for strong performance across a range of technical domains, including coding, mathematics, and complex reasoning, while maintaining solid bilingual proficiency in both English and Chinese, making it suitable for a global user base. Yi-9B builds on the foundational Yi-6B model through architectural refinements and multi-stage incremental training on an additional 0.8 trillion tokens, on top of the 3.1 trillion tokens used to train Yi-6B.

Architecturally, Yi-9B is a dense transformer. Although it draws on the same established Transformer design as models such as Llama, it is trained independently rather than derived from another model's weights. Several architectural choices improve its performance and efficiency: Grouped-Query Attention (GQA) reduces the memory cost of attention by sharing key/value heads across groups of query heads, positional information is encoded with Rotary Position Embedding (RoPE), and the feed-forward layers use the SwiGLU activation function.
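The key/value sharing behind GQA can be sketched in a few lines: each small set of KV heads is repeated across its group of query heads before ordinary scaled dot-product scoring. This is an illustrative numpy sketch with toy head counts and dimensions, not Yi-9B's actual configuration.

```python
import numpy as np

def gqa_scores(q, k):
    """Grouped-Query Attention score sketch.

    q: (n_q_heads, seq, d) query heads
    k: (n_kv_heads, seq, d) shared key heads, with n_kv_heads < n_q_heads
    Each KV head serves a contiguous group of query heads, which is what
    shrinks the KV cache relative to full multi-head attention.
    """
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads               # query heads per KV head
    k_expanded = np.repeat(k, group, axis=0)      # (n_q_heads, seq, d)
    return q @ k_expanded.transpose(0, 2, 1) / np.sqrt(d)

# Toy sizes for illustration only.
rng = np.random.default_rng(1)
q = rng.standard_normal((8, 4, 16))   # 8 query heads
k = rng.standard_normal((2, 4, 16))   # 2 shared KV heads
scores = gqa_scores(q, k)
print(scores.shape)                   # (8, 4, 4)
```

Query heads within the same group attend against identical keys, so the KV cache stores only `n_kv_heads` head states instead of `n_q_heads`.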

Yi-9B exhibits strong capabilities in code generation, mathematical problem-solving, common-sense reasoning, and reading comprehension. Its training regimen, focused on enriching understanding and generation in technical domains, positions the model for diverse applications, and its computational efficiency makes it suitable for a range of deployment scenarios, including consumer-grade hardware.
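The SwiGLU activation listed in the specifications gates one linear projection of the input with a SiLU-activated second projection. A minimal numpy sketch with toy dimensions (not Yi-9B's actual hidden sizes):

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU feed-forward unit: SiLU(x @ W) elementwise-gated by (x @ V).
    # In a transformer layer this is followed by a down-projection back
    # to the model width (omitted here for brevity).
    return silu(x @ W) * (x @ V)

# Toy dimensions for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))      # (batch, hidden)
W = rng.standard_normal((8, 16))     # up-projection
V = rng.standard_normal((8, 16))     # gate projection
out = swiglu(x, W, V)
print(out.shape)                     # (2, 16)
```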

About Yi

Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English/Chinese) and deliver strong performance in language understanding, reasoning, and code generation.


Other Yi Models

Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #30

Benchmark Score: 0.54

Coding Rank: #29

GPU Requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size (1k, 2k, or 4k tokens); see the full calculator for exact figures and recommended GPUs.
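As a rough rule of thumb, weight memory scales with parameter count times bits per weight, plus overhead for activations and the KV cache. The sketch below is a back-of-envelope estimate, not the site calculator's method; the ~4.5 bits for Q4_K_M and the 1.2x overhead factor are assumptions.

```python
def estimate_vram_gb(n_params_b, bits_per_weight, overhead=1.2):
    """Rough inference VRAM estimate in GB.

    n_params_b: parameter count in billions (1e9 params ~ 1 GB per byte/param)
    bits_per_weight: quantization width of the stored weights
    overhead: assumed multiplier for activations / KV cache (a guess)
    """
    weight_gb = n_params_b * bits_per_weight / 8
    return weight_gb * overhead

# Yi-9B at a few common quantization levels (approximate bit widths):
for name, bits in [("FP16", 16), ("Q8_0", 8), ("Q4_K_M", 4.5)]:
    print(f"{name}: ~{estimate_vram_gb(9, bits):.1f} GB")
```

At FP16 this lands around 21-22 GB, which is why quantized variants are the usual choice for consumer GPUs.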

Yi-9B: Specifications and GPU VRAM Requirements