Parameters
6B
Context Length
4,096
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
2 Nov 2023
Knowledge Cutoff
Jun 2023
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
4
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
The Yi-6B model, developed by 01.AI, is a 6-billion-parameter large language model engineered for efficient, accessible language processing. As a core member of the Yi model family, it is designed to deliver strong performance at moderate resource requirements, making it suitable for both personal and academic use. The model is distinguished by its bilingual capability: it was trained on a 3.1-trillion-token multilingual corpus, giving it proficiency in both English and Chinese language understanding and generation.
Architecturally, Yi-6B is built on a dense transformer framework. Its attention mechanism uses Grouped-Query Attention (GQA), applied to both the 6B and 34B Yi models; compared with traditional Multi-Head Attention, GQA reduces training and inference costs without compromising performance, even at the 6B scale. The model employs SwiGLU as its activation function and RMSNorm for normalization, drawing architectural parallels with models such as Llama, and uses the Rotary Positional Embedding (RoPE) scheme for position encoding. Yi-6B has a hidden dimension of 4096, comprises 32 layers, and pairs 32 attention query heads with 4 key-value heads.
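The practical benefit of GQA can be seen in the per-token KV-cache footprint implied by the dimensions above. A minimal sketch, assuming FP16 (2 bytes per element) storage for the cache:

```python
# KV-cache size per token: GQA (4 KV heads) vs. full MHA (32 heads),
# using the Yi-6B dimensions quoted above. FP16 storage (2 bytes per
# element) is an assumption for illustration.
HIDDEN = 4096
N_LAYERS = 32
N_QUERY_HEADS = 32
N_KV_HEADS = 4
HEAD_DIM = HIDDEN // N_QUERY_HEADS  # 128
BYTES_FP16 = 2

def kv_cache_bytes_per_token(n_kv_heads: int) -> int:
    # One key vector and one value vector per KV head, per layer.
    return 2 * n_kv_heads * HEAD_DIM * BYTES_FP16 * N_LAYERS

mha = kv_cache_bytes_per_token(N_QUERY_HEADS)  # 2*32*128*2*32 = 524,288 B
gqa = kv_cache_bytes_per_token(N_KV_HEADS)     # 2*4*128*2*32  =  65,536 B
print(f"MHA: {mha} B/token, GQA: {gqa} B/token, reduction: {mha // gqa}x")
```

With 4 KV heads instead of 32, the cache shrinks 8x, which is where most of GQA's inference savings come from at long context lengths.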
The Yi-6B model is engineered for robust performance across a spectrum of natural language processing tasks, including language understanding, commonsense reasoning, and reading comprehension. Its efficient design and open-weight release under the Apache 2.0 license contribute to its applicability in various scenarios, from rapid prototyping in real-time applications to fine-tuning for specific domains. The model features a default context window of 4,096 tokens, with variants offering extended context lengths up to 200,000 tokens for handling more extensive textual inputs.
No evaluation benchmarks are available for Yi-6B.
Overall Rank
-
Coding Rank
-
Total Score
60
/ 100
The Yi-6B model exhibits strong transparency in its architectural specifications and hardware requirements, supported by a formal technical report. However, it suffers from significant opacity regarding its training compute resources and the granular composition of its 3.1T token dataset. While the use of an Apache 2.0 license is a positive step, conflicting commercial application requirements and early-release naming inconsistencies have historically clouded its transparency profile.
Architectural Provenance
The Yi-6B model is well-documented in an official technical report ('Yi: Open Foundation Models') which details its 'modified' Transformer architecture. It explicitly identifies the use of Grouped-Query Attention (GQA), SwiGLU activation, and Rotary Positional Embeddings (RoPE). While it acknowledges being based on the Llama implementation, it clarifies that it was trained from scratch. However, the 'proprietary' nature of the specific training infrastructure and some methodology details are not fully public, preventing a higher score.
Dataset Composition
01.AI discloses that the model was trained on a 3.1 trillion token multilingual corpus (primarily English and Chinese). While the technical report describes a 'cascaded data deduplication and quality filtering pipeline' involving heuristic and learned filters, it lacks a detailed percentage breakdown of data sources (e.g., specific web crawls, books, or code proportions). The data itself is not public, and the description remains at a high level of 'highly-engineered' data without granular source transparency.
Tokenizer Integrity
The tokenizer is publicly accessible via the Hugging Face repository and the official GitHub. It uses a SentencePiece BPE implementation with a vocabulary size of 64,000 tokens. Documentation explains the choice to avoid dummy prefixes for better bilingual (English/Chinese) performance and the use of an 'identity tokenizer' for punctuation. The vocabulary is well-aligned with the claimed bilingual support.
Parameter Density
The model's parameter count is clearly stated as 6 billion. As a dense model, all parameters are active during inference. The technical report provides a specific architectural breakdown: 32 layers, a hidden size of 4096, 32 query heads, and 4 KV heads. This level of detail is exemplary for a dense architecture.
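The stated 6-billion figure can be sanity-checked against the architectural breakdown. The sketch below assumes a SwiGLU intermediate size of 11,008 and untied input/output embeddings, neither of which is stated in this card:

```python
# Back-of-the-envelope parameter count for Yi-6B from the figures above.
# Assumptions (not stated in this card): SwiGLU intermediate size of
# 11,008 and untied input/output embedding matrices.
VOCAB = 64_000
HIDDEN = 4096
LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 4
HEAD_DIM = HIDDEN // N_HEADS   # 128
INTERMEDIATE = 11_008          # assumed

embed = 2 * VOCAB * HIDDEN               # input + output embeddings
attn = (HIDDEN * HIDDEN                  # Q projection
        + 2 * HIDDEN * N_KV_HEADS * HEAD_DIM  # K and V (grouped)
        + HIDDEN * HIDDEN)               # output projection
mlp = 3 * HIDDEN * INTERMEDIATE          # gate, up, down (SwiGLU)
total = embed + LAYERS * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")  # roughly 6B
```

Under these assumptions the total lands close to the advertised 6B, which supports the card's figures.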
Training Compute
Information regarding the specific compute resources used for training Yi-6B is extremely limited. While the report mentions a 'scalable super-computing infrastructure,' it does not disclose the total GPU/TPU hours, the specific hardware count used for the 6B variant, the training duration, or the carbon footprint. This is a significant transparency gap.
Benchmark Reproducibility
01.AI provides benchmark results on standard sets like MMLU, C-Eval, and AlpacaEval in their technical report. They mention following Llama 2's evaluation methodology and using greedy decoding. However, the exact evaluation code and full prompt sets used for all internal benchmarks are not fully public in a single reproducible repository, and independent verification has noted sensitivity to prompt formatting.
Identity Consistency
The model generally identifies as an AI developed by 01.AI in its chat variants. However, research indicates that the base model (Yi-6B) lacks inherent self-identity without fine-tuning, and there have been documented instances of identity confusion in the broader Yi family where models might misidentify their origin or version when prompted in specific languages or contexts.
License Clarity
The model weights and code are released under the Apache 2.0 license, which is highly transparent. However, 01.AI's official website and some documentation include a requirement to 'apply for a commercial license for free' for certain use cases, creating a conflict with the standard 'unrestricted' nature of Apache 2.0. This ambiguity in the commercial terms reduces the score.
Hardware Footprint
Hardware requirements are well-documented on the Hugging Face model card and in the GitHub README. It provides specific VRAM estimates for inference (approx. 12GB for FP16) and training (approx. 45GB with Adam). Furthermore, it details requirements for 4-bit and 8-bit quantized versions (AWQ/GPTQ), making it highly accessible for users to plan deployment.
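The card's ~12 GB FP16 inference figure is straightforward weights-only arithmetic; a sketch, noting that real usage is higher once activations, the KV cache, and framework overhead are included:

```python
# Arithmetic behind the VRAM figures above: weights-only memory for
# 6B parameters at several precisions. Real usage is higher
# (activations, KV cache, framework overhead), so treat these as
# lower bounds.
PARAMS = 6e9

def weight_gb(bits_per_param: float) -> float:
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4 (AWQ/GPTQ)", 4)]:
    print(f"{name:>16}: {weight_gb(bits):.1f} GB")
```

At 16 bits per parameter this gives 12 GB of weights, matching the documented FP16 estimate, with 8-bit and 4-bit quantization halving and quartering that.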
Versioning Drift
The model follows a basic versioning scheme (e.g., Yi-6B, Yi-1.5-6B), and 01.AI maintains a changelog on GitHub and Hugging Face. However, the versioning does not strictly follow semantic versioning for weights, and some updates (like the 200K context extension) were released as separate variants rather than versioned iterations of the base, making tracking of 'drift' in the original model difficult.