
Yi-9B

Parameters

9B

Context Length

4K (4,096 tokens)

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

6 Mar 2024

Knowledge Cutoff

Jun 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

4096

Number of Layers

48

Attention Heads

32

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Yi-9B

The Yi-9B model is a sophisticated dense transformer-based large language model developed by 01.AI, designed to optimize the trade-off between parameter count and reasoning depth. It serves as a performance-oriented extension of the foundational Yi-6B model, engineered through a process of architectural expansion and multi-stage incremental training. By increasing the model's depth and continuing pre-training on an additional 0.8 trillion high-quality tokens, the developers have produced a model that excels in technical domains such as mathematics and code generation while maintaining robust bilingual fluency in English and Chinese.

Technically, Yi-9B utilizes a decoder-only architecture that mirrors the established Llama framework, enabling immediate compatibility with the broader ecosystem of LLM tools and libraries. Key architectural features include Grouped-Query Attention (GQA) to improve inference throughput and reduce memory overhead, and SwiGLU activation functions within the feed-forward layers for enhanced representational capacity. The model employs Rotary Position Embedding (RoPE) to encode token positions and uses Root Mean Square Layer Normalization (RMSNorm) to stabilize training dynamics across its 48 layers.
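
Because the architecture is Llama-compatible, the model should load with stock Hugging Face Transformers, as in the minimal sketch below. The repo id ("01-ai/Yi-9B") is the expected Hugging Face location, and the dtype and device settings are illustrative choices rather than official usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads with standard Transformers thanks to the Llama-compatible layout;
# no custom modeling code should be required.
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-9B")

prompt = "Write a Python function that returns the nth Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```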

Designed for computational efficiency, Yi-9B is particularly suited for deployment in resource-constrained environments, including consumer-grade hardware. Its extensive training on a total of 3.9 trillion tokens provides the model with a strong knowledge base for complex reasoning, reading comprehension, and common-sense logic. This makes it an effective choice for developers building AI-native applications that require a balance of high-performance technical reasoning and efficient local execution.

About Yi

The Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English/Chinese) and deliver strong performance in language understanding, reasoning, and code generation.



Evaluation Benchmarks

No evaluation benchmarks are available for Yi-9B.

Rankings

Overall Rank

-

Coding Rank

-

Model Transparency

Yi-9B Transparency Report

Total Score

65 / 100 (Grade: B)

Audit Note

Yi-9B demonstrates strong transparency regarding its architectural origins and technical specifications, particularly through its detailed technical report. However, it remains opaque concerning training compute resources and specific dataset compositions. While the model is accessible, the use of a custom community license for weights and limited evaluation reproducibility documentation are notable weaknesses.

Upstream

22.0 / 30

Architectural Provenance

8.0 / 10

The Yi-9B model is explicitly documented as a depth-upscaled version of the Yi-6B base model. The technical report and Hugging Face documentation detail its decoder-only Transformer architecture, which is compatible with the Llama framework. It specifies the use of Grouped-Query Attention (GQA), SwiGLU activation functions, Rotary Position Embedding (RoPE), and RMSNorm. The training methodology, involving architectural expansion followed by 0.8 trillion tokens of incremental pre-training, is clearly described in the official 'Yi: Open Foundation Models' paper.
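
A minimal sketch of the depth-upscaling recipe described above: duplicate a block of middle decoder layers and splice the copies back into the stack. The layer range used here (blocks 12 through 27, growing a 32-block stack into 48) is illustrative; the exact indices 01.AI duplicated are an assumption based on the general description, not a published config.

```python
import copy
import torch.nn as nn

def depth_upscale(layers: nn.ModuleList, start: int, end: int) -> nn.ModuleList:
    """Duplicate the decoder blocks in [start, end) and splice the copies in
    directly after the originals, preserving layer order."""
    blocks = list(layers)
    duplicated = [copy.deepcopy(block) for block in blocks[start:end]]
    return nn.ModuleList(blocks[:end] + duplicated + blocks[end:])

# Illustrative only: grow a 32-block stack into a 48-block stack by
# duplicating 16 middle blocks, mirroring the Yi-6B -> Yi-9B expansion.
stack = nn.ModuleList([nn.Linear(8, 8) for _ in range(32)])  # stand-in blocks
grown = depth_upscale(stack, start=12, end=28)
assert len(grown) == 48
```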

Dataset Composition

5.0 / 10

01.AI discloses that the model was trained on a total of 3.9 trillion tokens (3.1T for the base Yi-6B plus 0.8T for the 9B expansion). While they provide a high-level breakdown of the 3.1T corpus (English and Chinese web data, books, and code) and describe a cascaded filtering pipeline (MinHash deduplication, heuristic rules, and learned filters), they do not provide specific percentage breakdowns or public access to the raw datasets. The 0.8T incremental data is described generally as 'high-quality' with a focus on math and code.
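
To make the MinHash deduplication step concrete, here is a small self-contained sketch of the underlying idea: compute a per-document MinHash signature over word shingles and treat documents whose estimated Jaccard similarity exceeds a threshold as near-duplicates. The shingle size, hash count, and threshold below are placeholders, since the actual pipeline parameters are not disclosed.

```python
import hashlib
import re

def minhash_signature(text: str, num_hashes: int = 64, shingle_size: int = 5):
    """Compute a MinHash signature over word shingles of a document."""
    words = re.findall(r"\w+", text.lower())
    shingles = {" ".join(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))}
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.sha1(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles))
    return signature

def jaccard_estimate(sig_a, sig_b) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = "Yi models are trained on English and Chinese web data, books, and code."
doc_b = "Yi models are trained on English and Chinese web data, books and code!"
print(jaccard_estimate(minhash_signature(doc_a), minhash_signature(doc_b)))
# Near-duplicates score close to 1.0 and would be dropped by the filter.
```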

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available via Hugging Face and the official GitHub repository. It uses Byte-Pair Encoding (BPE) via the SentencePiece framework with a documented vocabulary size of 64,000 tokens. Technical details, such as splitting numbers into individual digits and handling rare characters via unicode-byte fallback, are explicitly stated in the technical report. The tokenizer is fully compatible with standard libraries like Hugging Face Transformers.
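
The snippet below shows how the public tokenizer can be inspected with Hugging Face Transformers. The repo id is the expected location of the release, and the digit-splitting check reflects the behavior described in the technical report rather than an exhaustive specification.

```python
from transformers import AutoTokenizer

# "01-ai/Yi-9B" is the expected Hugging Face repo id for the public tokenizer.
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-9B")

print(tokenizer.vocab_size)           # documented vocabulary size: 64,000
print(tokenizer.tokenize("In 2024"))  # digits should be split into single tokens
```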

Model

24.5 / 40

Parameter Density

8.5 / 10

The model's parameter count is precisely stated as 8.83 billion (commonly rounded to 9B). As a dense model, all parameters are active during inference. The technical report provides an architectural breakdown, including the number of layers (48) and the specific attention mechanism (GQA). The impact of depth-upscaling on parameter density is well-documented in the context of its evolution from the 6B variant.
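
The stated 8.83 billion figure can be sanity-checked with a back-of-the-envelope calculation from the architecture above. The MLP intermediate size and the untied input/output embeddings used below are assumptions carried over from Yi-6B's published configuration.

```python
# Architecture values from this page; intermediate size and untied embeddings
# are assumptions carried over from Yi-6B's published config.
hidden, layers, vocab = 4096, 48, 64000
n_heads, n_kv_heads = 32, 4
head_dim = hidden // n_heads             # 128
intermediate = 11008                     # assumed MLP width

attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)  # Q,O + K,V
mlp = 3 * hidden * intermediate          # gate, up, and down projections
embeddings = 2 * vocab * hidden          # untied input and output embeddings

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~8.83B, matching the stated count
```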

Training Compute

3.0 / 10

While the technical report mentions the use of a 'scalable super-computing infrastructure' and cross-cloud elastic task scheduling, it lacks specific details on the total GPU/TPU hours consumed, the exact hardware models used for the 9B training phase, or the associated carbon footprint. There is no detailed breakdown of the compute costs or environmental impact, which are key requirements for a high score in this category.

Benchmark Reproducibility

4.0 / 10

01.AI provides results for standard benchmarks like MMLU, C-Eval, and GSM8K in their technical report. However, the exact prompts and few-shot examples used for these evaluations are not fully disclosed in a reproducible format. While they mention using zero-shot and few-shot methods, the lack of a public evaluation harness or specific versioning for all benchmarks limits third-party verification. (Score adjusted due to undisclosed contamination risks).

Identity Consistency

9.0 / 10

The model consistently identifies itself as part of the Yi series developed by 01.AI. It maintains clear versioning (e.g., Yi-9B vs Yi-9B-200K) and is transparent about its bilingual focus and technical reasoning capabilities. There are no documented instances of the model claiming to be a competitor's product or misrepresenting its fundamental nature as an AI.

Downstream

18.5 / 30

License Clarity

6.0 / 10

The model weights are released under the 'Yi Series Models Community License Agreement,' which is a custom license. While it allows for free commercial use for many users, it includes restrictions based on user scale (requiring explicit permission for large-scale commercial use) and specific regional legal compliance (PRC laws). This complexity and the distinction between the Apache 2.0 code license and the restrictive weight license create moderate ambiguity for users.

Hardware Footprint

7.5 / 10

01.AI provides clear hardware requirements on their Hugging Face and GitHub pages, including minimum VRAM estimates for different model sizes. They explicitly support and document quantization methods like GPTQ and AWQ, providing scripts for users to perform these optimizations. VRAM requirements for 4-bit and 8-bit versions are generally well-communicated, though detailed context-length memory scaling curves are less prominent.
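
As a concrete example of a low-VRAM deployment, the sketch below loads the model in 4-bit via bitsandbytes through Transformers. The repo id and the rough memory figure in the comments are estimates, not official requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit weights bring the 8.83B parameters down to roughly 5-6 GB, leaving
# headroom for activations and the KV cache on a single consumer GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-9B")
```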

Versioning Drift

5.0 / 10

The model follows a basic versioning scheme (e.g., Yi-9B, Yi-1.5-9B), and 01.AI maintains a GitHub repository with some update history. However, there is no formal, detailed changelog or semantic versioning system that tracks minor weight updates or behavioral drift over time. Users must rely on date-stamped announcements rather than a structured versioning protocol.
