Parameters
9B
Context Length
4K (4,096 tokens)
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
6 Mar 2024
Knowledge Cutoff
Jun 2023
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
4096
Number of Layers
44
Attention Heads
32
Key-Value Heads
4
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
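Since the description below notes that the architecture mirrors the Llama layout, the specifications above map onto familiar Llama-style config fields roughly as follows (values taken from this card; the dict is an illustrative sketch, not the published configuration file, and `rms_norm_eps` is an assumed typical value):

```python
# Yi-9B hyperparameters from the card, expressed as a
# Llama-style config dict (sketch; not the official config)
yi_9b_config = {
    "hidden_size": 4096,
    "num_hidden_layers": 44,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,      # grouped-query attention
    "hidden_act": "silu",          # SwiGLU FFN uses SiLU gating
    "rms_norm_eps": 1e-5,          # assumption: typical value
    "max_position_embeddings": 4096,
}

# Each key/value head serves a group of query heads
group_size = (yi_9b_config["num_attention_heads"]
              // yi_9b_config["num_key_value_heads"])   # 8
```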
The Yi-9B model is a sophisticated dense transformer-based large language model developed by 01.AI, designed to optimize the trade-off between parameter count and reasoning depth. It serves as a performance-oriented extension of the foundational Yi-6B model, engineered through a process of architectural expansion and multi-stage incremental training. By increasing the model's depth and continuing pre-training on an additional 0.8 trillion high-quality tokens, the developers have produced a model that excels in technical domains such as mathematics and code generation while maintaining robust bilingual fluency in English and Chinese.
Technically, Yi-9B utilizes a decoder-only architecture that mirrors the established Llama framework, enabling immediate compatibility with the broader ecosystem of LLM tools and libraries. Key architectural features include Grouped-Query Attention (GQA) to improve inference throughput and reduce memory overhead, and SwiGLU activation functions within the feed-forward layers for enhanced representational capacity. The model employs Rotary Position Embedding (RoPE) to encode token positions and Root Mean Square Layer Normalization (RMSNorm) to stabilize training dynamics across its 44 layers.
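The head configuration on this card (32 attention heads, 4 key-value heads) means each KV head is shared by a group of 8 query heads. A minimal NumPy sketch of that grouping, with illustrative shapes only (hypothetical sequence length, no causal mask), not the actual implementation:

```python
import numpy as np

n_q_heads, n_kv_heads = 32, 4        # from the model card
head_dim = 4096 // n_q_heads         # hidden size / query heads = 128
group = n_q_heads // n_kv_heads      # 8 query heads per KV head
seq = 16                             # hypothetical sequence length

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# GQA: broadcast each KV head to its group of query heads
k_exp = np.repeat(k, group, axis=0)  # (32, seq, head_dim)
v_exp = np.repeat(v, group, axis=0)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_exp                # same shape as q
```

The memory saving is in the KV cache: only 4 heads of keys and values are stored per layer instead of 32, an 8x reduction.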
Designed for computational efficiency, Yi-9B is particularly suited for deployment in resource-constrained environments, including consumer-grade hardware. Its extensive training on a total of 3.9 trillion tokens provides the model with a strong knowledge base for complex reasoning, reading comprehension, and common-sense logic. This makes it an effective choice for developers building AI-native applications that require a balance of high-performance technical reasoning and efficient local execution.
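As a rough sizing exercise for such resource-constrained deployment (back-of-the-envelope arithmetic, not measured figures), weight memory scales linearly with bytes per parameter; KV cache and activations are extra:

```python
# Approximate weight-only memory footprint of a 9B-parameter
# model at common precisions (illustrative estimate)
params = 9e9
footprints = {
    name: params * bytes_per_param / 2**30   # GiB
    for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]
}
for name, gib in footprints.items():
    print(f"{name}: ~{gib:.1f} GiB")
```

At 4-bit quantization the weights fit comfortably in the memory of a single consumer GPU.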
No evaluation benchmarks for Yi-9B are available.