
Yi-9B

Parameters: 9B
Context Length: 4K (4,096 tokens)
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 6 Mar 2024
Knowledge Cutoff: Jun 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 44
Attention Heads: -
Key-Value Heads: -
Activation Function: SwiGLU
Normalization: -
Position Embedding: Rotary Position Embedding (RoPE)


Yi-9B

Yi-9B is a large language model developed by 01.AI that targets stronger performance on coding, mathematics, and reasoning tasks while retaining robust bilingual capabilities in English and Chinese. It belongs to the Yi model family, a set of open-source language models trained from scratch by 01.AI on a large multilingual corpus. Yi-9B builds on the foundation of the Yi-6B model, combining architectural refinements with extensive multi-stage incremental training.

Architecturally, Yi-9B employs a dense transformer structure. It follows the same Llama-style transformer conventions as many open models, but it is not a derivative of Llama; its weights were trained independently from scratch. Key architectural choices include Grouped-Query Attention (GQA), which shrinks the key-value cache and improves inference efficiency for a model of this size, Rotary Position Embedding (RoPE) for positional encoding, and the SwiGLU activation function in its feed-forward layers.
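To make the feed-forward design concrete, the sketch below shows a minimal SwiGLU block in PyTorch in the style used by Llama-family dense transformers; the dimension values are illustrative placeholders, not Yi-9B's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    # Minimal SwiGLU feed-forward block; hidden/intermediate sizes are placeholders.
    def __init__(self, hidden_size: int = 4096, intermediate_size: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU-gated projection, then a projection back to the hidden size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Shape check on a dummy batch of 8 tokens.
block = SwiGLUFeedForward()
print(block(torch.randn(1, 8, 4096)).shape)  # torch.Size([1, 8, 4096])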

The model's training started from Yi-6B, which was expanded by increasing its depth (adding transformer layers) and then given multi-stage incremental training on an additional 0.8 trillion tokens, on top of the 3.1 trillion tokens used for Yi-6B. This continued training focused on strengthening understanding and generation in technical domains, and Yi-9B performs well at code generation, mathematical problem-solving, common-sense reasoning, and reading comprehension. Its design also emphasizes computational efficiency, making it suitable for a variety of deployment scenarios, including consumer-grade hardware.
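As an illustration of running the model on consumer-grade hardware, the sketch below loads it through Hugging Face Transformers with 4-bit bitsandbytes quantization; the repository id 01-ai/Yi-9B and the generation settings reflect a typical setup, not an official recipe.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed Hugging Face repository id for the model weights.
model_id = "01-ai/Yi-9B"

# 4-bit NF4 quantization keeps the 9B weights within a single consumer GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))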

About Yi

The Yi series comprises large language models trained from scratch by 01.AI. The models are bilingual (English/Chinese) and show strong performance in language understanding, reasoning, and code generation.



Evaluation Benchmarks

No evaluation benchmarks for Yi-9B are available, so no overall or coding rank is assigned (rankings apply to local LLMs).

GPU Requirements

VRAM requirements depend on the weight quantization method and the context size (1k, 2k, or 4k tokens).
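As a rough, back-of-envelope guide (a sketch under simple assumptions, not the site's calculator), the weights of a 9B-parameter model alone occupy roughly the parameter count times the bytes per parameter; the KV cache, activations, and runtime overhead come on top of that.

# Weight-only VRAM estimate for a 9B-parameter model.
# Real usage is higher: KV cache, activations, and framework overhead add to this.
PARAMS = 9e9

bytes_per_param = {
    "FP16/BF16": 2.0,
    "INT8 (Q8)": 1.0,
    "INT4 (Q4)": 0.5,
}

for name, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / (1024 ** 3)
    print(f"{name:>9}: ~{gib:.1f} GiB for weights alone")

# Expected output: FP16/BF16 ~16.8 GiB, INT8 ~8.4 GiB, INT4 ~4.2 GiB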