Yi-34B: Specifications and GPU VRAM Requirements

Yi-34B

Open Source

Open Weights

Parameters

34B

Context Length

4,096 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

2 Nov 2023

Knowledge Cutoff

Jun 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

7168

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Yi-34B

The Yi-34B model, developed by 01.AI, is a 34-billion-parameter large language model trained from scratch on a 3-trillion-token multilingual corpus. This base model demonstrates strong capabilities in language understanding, commonsense reasoning, and reading comprehension. It is built to support both English and Chinese and is proficient in both languages across a range of tasks. The design aims to balance performance against inference cost, making the model usable in a range of computational environments.

Architecturally, Yi-34B is built upon a modified decoder-only Transformer framework, drawing inspiration from the LLaMA implementation without being a direct derivative. A key technical feature is the incorporation of Grouped-Query Attention (GQA), which contributes to reduced training and inference costs compared to traditional Multi-Head Attention while maintaining performance. The model utilizes the SwiGLU activation function and RMS Normalization layers. Positional encoding is handled through a Rotary Position Embedding (RoPE) mechanism. These architectural choices aim to optimize model stability, convergence, and compatibility within the AI ecosystem.
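To make the inference saving from GQA concrete, the following is a minimal sketch that compares the key-value cache size of a Multi-Head Attention layout with a grouped-query layout at the base context length. Only the hidden size (7168) and the context length (4,096) come from this page; the layer count, attention head count, and key-value head count are assumptions taken to be typical of the published configuration, and the function name is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence (FP16 by default)."""
    # The factor 2 covers the separate key and value tensors in each layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


HIDDEN_SIZE = 7168   # from the specification table
SEQ_LEN = 4096       # base context length from this page
NUM_LAYERS = 60      # assumption: not stated on this page
NUM_HEADS = 56       # assumption: not stated on this page
NUM_KV_HEADS = 8     # assumption: not stated on this page
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS

# With MHA every attention head keeps its own keys and values;
# with GQA only the key-value heads do.
mha = kv_cache_bytes(NUM_LAYERS, NUM_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction: {mha // gqa}x")
```

Under these assumptions the cache shrinks by the ratio of attention heads to key-value heads, which is why GQA matters most at long context lengths and large batch sizes.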

Yi-34B is applicable to tasks requiring extensive language processing, such as long-form document summarization, detailed legal and technical document analysis, and complex multilingual question-answering systems. It also excels in the generation of multilingual content and instruction following. The base model supports a context length of 4,096 tokens, with specialized variants like Yi-34B-200K extending this capacity to 200,000 tokens, enabling processing of exceptionally long text sequences. Its design considerations allow for deployment on various hardware configurations, including consumer-grade GPUs, especially when employing quantization techniques.

About Yi

Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English/Chinese) and perform strongly in language understanding, reasoning, and code generation.

Evaluation Benchmarks

Rank

#88

Benchmark	Score	Rank
Web Development WebDev Arena	1184	57

Rankings

Overall Rank

#88

Coding Rank

#84

GPU Requirements
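As a rough guide to the memory taken by the weights alone, here is a minimal sketch that multiplies the 34B parameter count by the storage width of each weight. The precision labels and nominal bit widths are illustrative assumptions rather than a list of officially supported formats, and the function name is hypothetical.

```python
PARAMS = 34e9  # parameter count from the specification above

# Assumption: nominal bit widths; real quantization formats also store
# scales and zero points, so effective bits per weight are somewhat higher.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB for weights alone")
```

Actual VRAM use is higher than these figures, since the key-value cache, activations, and framework overhead all add to the weight footprint, and the key-value cache grows with context size.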
