
Yi-6B

Parameters

6B

Context Length

4K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

2 Nov 2023

Knowledge Cutoff

Jun 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

-

Number of Layers

-

Attention Heads

-

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
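
As a rough guide, the sketch below gives a back-of-envelope VRAM estimate for a 6-billion-parameter dense model under different weight precisions and context sizes. The layer count, KV-head count, and head dimension are illustrative assumptions (the specification table above does not list them), and real usage also depends on the runtime, activation memory, and batch size.

```python
# Back-of-envelope VRAM estimate for a 6B-parameter dense model.
# All structural values below are illustrative assumptions, not confirmed
# Yi-6B configuration; treat the output as an approximation only.
def estimate_vram_gb(params_b=6.0, bits_per_weight=16, context=4096,
                     n_layers=32, n_kv_heads=4, head_dim=128, kv_bits=16,
                     overhead=1.2):
    # Model weights: parameters * bytes per weight.
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context tokens.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context * kv_bits / 8 / 1e9
    # Multiply by a fudge factor for activations and runtime buffers.
    return (weights_gb + kv_gb) * overhead

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, 4K context: ~{estimate_vram_gb(bits_per_weight=bits):.1f} GB")
```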

Yi-6B

The Yi-6B model, developed by 01.AI, is a 6-billion parameter large language model designed for efficient and accessible language processing tasks. It is part of the Yi model family, engineered to offer substantial performance while maintaining moderate resource requirements, making it suitable for both personal and academic applications. The model is distinguished by its bilingual capabilities, having been trained on an expansive 3-trillion token multilingual corpus, enabling proficiency in both English and Chinese language understanding and generation.

Architecturally, Yi-6B is built on a dense transformer framework rather than a Mixture-of-Experts (MoE) design. Its attention mechanism uses Grouped-Query Attention (GQA), applied to both the 6B and 34B Yi models; GQA reduces training and inference costs relative to traditional Multi-Head Attention (MHA) without compromising quality, even at the smaller 6B scale. The model employs SwiGLU as its activation function and RMSNorm for normalization, mirroring the architectural choices popularized by Llama, whose foundational structure the Yi series closely follows. Positional information is encoded with Rotary Position Embedding (RoPE), which supports effective context handling and extension to longer windows.
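
The sketch below illustrates the grouped-query attention pattern described above: queries keep the full number of heads, while keys and values use a smaller set of heads that are shared across groups of query heads, shrinking the KV cache. All dimensions here are illustrative placeholders, not confirmed Yi-6B values.

```python
# Minimal sketch of grouped-query attention (GQA); dimensions are toy values.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, wo, n_heads, n_kv_heads):
    """x: (batch, seq, d_model). Queries use n_heads; keys/values use the
    smaller n_kv_heads and are shared across query groups."""
    bsz, seq, d_model = x.shape
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads  # query heads served by each KV head

    q = (x @ wq).view(bsz, seq, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat each KV head so it is shared by its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return attn.transpose(1, 2).reshape(bsz, seq, d_model) @ wo

# Toy example with made-up sizes (not Yi-6B's real configuration).
d_model, n_heads, n_kv_heads = 256, 8, 2
head_dim = d_model // n_heads
x = torch.randn(1, 16, d_model)
wq = torch.randn(d_model, d_model) * 0.02
wk = torch.randn(d_model, head_dim * n_kv_heads) * 0.02
wv = torch.randn(d_model, head_dim * n_kv_heads) * 0.02
wo = torch.randn(d_model, d_model) * 0.02
out = grouped_query_attention(x, wq, wk, wv, wo, n_heads, n_kv_heads)
print(out.shape)  # torch.Size([1, 16, 256])
```

Because keys and values are projected to only n_kv_heads, the KV cache held during generation is a fraction of the MHA size, which is where the inference-cost savings come from.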

The Yi-6B model is engineered for robust performance across a spectrum of natural language processing tasks, including language understanding, commonsense reasoning, and reading comprehension. Its efficient design and open-weight release under the Apache 2.0 license contribute to its applicability in various scenarios, from rapid prototyping in real-time applications to fine-tuning for specific domains. The model features a default context window of 4,096 tokens, with variants offering extended context lengths up to 200,000 tokens for handling more extensive textual inputs.
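
For reference, a minimal sketch of loading the model for plain text completion with the Hugging Face transformers library is shown below. The repository name 01-ai/Yi-6B is assumed here, and since this is the base (non-chat) model, it is prompted as a completion model rather than with a chat template.

```python
# Minimal sketch: load Yi-6B with transformers and generate a completion.
# Assumes the weights are published on the Hugging Face Hub as "01-ai/Yi-6B".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B"  # assumed Hub repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place weights on the available GPU(s)
)

inputs = tokenizer("The Yi series of language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```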

About Yi

Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English/Chinese) and deliver strong performance in language understanding, reasoning, and code generation.



Evaluation Benchmarks

Rankings are computed across local LLMs only.

No evaluation benchmarks for Yi-6B available.

Rankings

Overall Rank

-

Coding Rank

-
