| Attribute | Value |
|---|---|
| Parameters | 9B |
| Context Length | 4,096 tokens (4K) |
| Modality | Text |
| Architecture | Dense |
| License | Apache 2.0 |
| Release Date | 6 Mar 2024 |
| Knowledge Cutoff | Jun 2023 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | - |
| Number of Layers | 44 |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | SwiGLU |
| Normalization | - |
| Position Embedding | Rotary Position Embedding (RoPE) |
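Several of the fields above are unspecified; they can be read directly from the model's published configuration. A minimal sketch, assuming the Hugging Face hub ID `01-ai/Yi-9B` and a recent `transformers` release:

```python
# Sketch: cross-check the spec table against the model's own config.
# Attribute names follow the Llama-style config used by Yi; they may
# differ slightly across transformers versions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("01-ai/Yi-9B")

print("hidden size:        ", config.hidden_size)
print("layers:             ", config.num_hidden_layers)
print("attention heads:    ", config.num_attention_heads)
print("key-value heads:    ", getattr(config, "num_key_value_heads", None))
print("max context length: ", config.max_position_embeddings)
```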
VRAM requirements for different quantization methods and context sizes
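As a rough guide, weight memory scales linearly with bytes per parameter. The sketch below is a back-of-the-envelope estimate for a ~9B-parameter model; it ignores the KV cache, activations, and runtime overhead, so real usage at any given context size will be higher.

```python
# Weights-only VRAM estimate for a ~9B-parameter model at common precisions.
PARAMS = 9e9  # ~9 billion parameters

BYTES_PER_WEIGHT = {
    "fp16/bf16": 2.0,
    "int8 (Q8)": 1.0,
    "int4 (Q4)": 0.5,
}

for name, bytes_per_weight in BYTES_PER_WEIGHT.items():
    gib = PARAMS * bytes_per_weight / 1024**3
    print(f"{name:>10}: ~{gib:.1f} GiB for weights alone")
# fp16/bf16: ~16.8 GiB, int8: ~8.4 GiB, int4: ~4.2 GiB (weights only)
```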
The Yi-9B model, developed by 01.AI, is part of the Yi family of open-source large language models. It is built to perform well in technical domains, including coding, mathematics, and complex reasoning, while retaining strong bilingual proficiency in English and Chinese. Yi-9B extends the Yi-6B base model through architectural refinements and multi-stage incremental training on an additional 0.8 trillion tokens, on top of the 3.1 trillion tokens used to train Yi-6B.
Architecturally, Yi-9B is a dense transformer. It follows the same Llama-style design conventions as the rest of the Yi family but is trained independently rather than derived from Llama weights. To balance quality and efficiency, it uses several established techniques: Grouped-Query Attention (GQA), which shares a small set of key/value heads across groups of query heads to reduce the memory and compute cost of attention; Rotary Position Embedding (RoPE) for positional encoding; and the SwiGLU activation in its feed-forward layers. A minimal sketch of the GQA and SwiGLU building blocks follows below.
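The following PyTorch sketch makes the attention and activation choices concrete. The dimensions are toy values, not Yi-9B's actual configuration, and rotary position embedding and normalization are omitted for brevity.

```python
# Minimal sketch of a SwiGLU feed-forward block and grouped-query attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Gated feed-forward block: silu(W_gate x) * (W_up x), then W_down."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class GroupedQueryAttention(nn.Module):
    """Attention with fewer key/value heads than query heads (GQA)."""

    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every group of query heads shares it.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))


x = torch.randn(1, 8, 64)                        # (batch, seq, dim) toy input
print(GroupedQueryAttention(64, 8, 2)(x).shape)  # torch.Size([1, 8, 64])
print(SwiGLU(64, 172)(x).shape)                  # torch.Size([1, 8, 64])
```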
Yi-9B shows strong capabilities in code generation, mathematical problem-solving, common-sense reasoning, and reading comprehension. Because its continued training concentrates on technical domains, it is well suited to coding and analytical workloads, and its moderate size makes it practical to deploy on consumer-grade hardware, for example with 4-bit quantization as sketched below.
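As an illustration of a consumer-hardware setup, this sketch loads the model in 4-bit precision with `transformers` and `bitsandbytes`. The hub ID `01-ai/Yi-9B` and the generation settings are illustrative defaults, not official recommendations from 01.AI.

```python
# Sketch: run Yi-9B (a base, non-chat model) with 4-bit quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-9B"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available GPU(s)/CPU automatically
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```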
Ranking is for Local LLMs.
| Benchmark | Score | Rank |
|---|---|---|
| Aider Coding | 0.54 | 14 |
Overall Rank
#30
Coding Rank
#29