
Yi-34B

Parameters

34B

Context Length

4K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

2 Nov 2023

Knowledge Cutoff

Jun 2023

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

56

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

5,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

7,168

Number of Layers

60

FFN Intermediate Size (Dense)

20,480

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

64,000

Architecture Diagram

[Diagram: Input Tokens → Token Embedding (hidden: 7,168; context: 4K; vocab: 64,000) → 60 layers of (Pre-Attention RMSNorm → Grouped-Query Attention with 56 Q / 8 KV heads, head dim 128, rotary position embedding → residual add → Pre-FFN RMSNorm → SwiGLU Feed-Forward Network, intermediate: 20,480 → residual add) → Final RMSNorm → Output Logits]

Yi-34B

The Yi-34B model, developed by 01.AI, is a 34-billion-parameter large language model trained from scratch on a multilingual corpus of roughly 3 trillion tokens. This base model shows strong language understanding, commonsense reasoning, and reading comprehension. It is built for both English and Chinese and handles a range of tasks in either language. Its design aims to balance performance against inference cost, so that it can run in a range of computational environments.

Architecturally, Yi-34B is built upon a modified decoder-only Transformer framework, drawing inspiration from the LLaMA implementation without being a direct derivative. A key technical feature is the incorporation of Grouped-Query Attention (GQA), which contributes to reduced training and inference costs compared to traditional Multi-Head Attention while maintaining performance. The model utilizes the SwiGLU activation function and RMS Normalization layers. Positional encoding is handled through a Rotary Position Embedding (RoPE) mechanism. These architectural choices aim to optimize model stability, convergence, and compatibility within the AI ecosystem.
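To make the head grouping concrete, here is a minimal sketch of how Grouped-Query Attention shares key-value heads among query heads, using the head counts and hidden size listed in the specifications. The contiguous grouping (query head `i` reads key-value head `i // GROUP_SIZE`) is an assumption that follows the common convention in open implementations; the source does not state the mapping, and the helper name `kv_head_for` is hypothetical.

```python
# Sketch of GQA head grouping with Yi-34B's published head counts.
# Assumption: query heads are grouped contiguously onto KV heads.
HIDDEN = 7168
NUM_Q_HEADS = 56
NUM_KV_HEADS = 8

HEAD_DIM = HIDDEN // NUM_Q_HEADS          # per-head width
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # query heads sharing one KV head


def kv_head_for(q_head: int) -> int:
    """Return the index of the KV head that a given query head attends with."""
    if not 0 <= q_head < NUM_Q_HEADS:
        raise ValueError("query head index out of range")
    return q_head // GROUP_SIZE


# Output widths of the projections: queries keep the full width,
# keys and values shrink by the group size.
Q_PROJ_OUT = NUM_Q_HEADS * HEAD_DIM
KV_PROJ_OUT = NUM_KV_HEADS * HEAD_DIM

if __name__ == "__main__":
    print("head dim:", HEAD_DIM)
    print("query heads per KV head:", GROUP_SIZE)
    print("q_proj width:", Q_PROJ_OUT, "| k_proj / v_proj width:", KV_PROJ_OUT)
```

Under these numbers, each key-value head serves seven query heads, so the key and value projections (and the cached keys and values) are one seventh the size they would be with a separate key-value head per query head.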

Yi-34B is applicable to tasks requiring extensive language processing, such as long-form document summarization, detailed legal and technical document analysis, and complex multilingual question-answering systems. It also excels in the generation of multilingual content and instruction following. The base model supports a context length of 4,096 tokens, with specialized variants like Yi-34B-200K extending this capacity to 200,000 tokens, enabling processing of exceptionally long text sequences. Its design considerations allow for deployment on various hardware configurations, including consumer-grade GPUs, especially when employing quantization techniques.
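The gap between the 4,096-token base context and the 200,000-token variant shows up most directly in the key-value cache. The sketch below estimates that cache size from the layer count, key-value head count, and head dimension given in the specifications. It assumes 16-bit cache entries and a batch size of 1, and it assumes Yi-34B-200K keeps the same layer and head layout as the base model; none of these is stated in the source, and the function name is hypothetical.

```python
# Rough KV-cache estimate. Assumptions: 2 bytes per cached value,
# batch size 1, same layout for the 200K variant as for the base model.
LAYERS = 60
KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    keys_and_values = 2  # one K tensor and one V tensor per layer
    per_token = keys_and_values * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * tokens * batch


if __name__ == "__main__":
    for context in (4_096, 200_000):
        gib = kv_cache_bytes(context) / 2**30
        print(f"{context:>7,} tokens: {gib:.2f} GiB")
```

By this estimate the cache stays under 1 GiB at the base context length but grows to tens of GiB at 200,000 tokens, which is why long-context use is much more demanding than the weight size alone suggests.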

About Yi

Yi series models are large language models trained from scratch by 01.AI. They are bilingual (English/Chinese) and show strong performance in language understanding, reasoning, and code generation.


Evaluation Benchmarks

Rank

#154

Category | Benchmark | Score | Rank
Web Development | WebDev Arena | 1183 | 101
General Text | Text Arena | 1183 | 102

Rankings

Overall Rank

#154

Coding Rank

#119

Model Integrity

Total Score

C+

57 / 100

Yi-34B Model Integrity Report

Audit Note

Yi-34B demonstrates strong transparency in its technical architecture and hardware requirements, providing clear guidance for local deployment and quantization. However, it suffers from significant opacity regarding its training data sources and compute resources. The model's transparency profile is further complicated by early controversies regarding its architectural naming and potential benchmark contamination, which remain only partially addressed.

Upstream

18.5 / 30

Architectural Provenance

6.0 / 10

The model is documented as a modified decoder-only Transformer. While the technical report claims it was 'trained from scratch,' it acknowledges using the Llama architecture as a base for its implementation. Specific modifications like Grouped-Query Attention (GQA), SwiGLU activation, and Rotary Position Embedding (RoPE) are disclosed. However, the initial release faced significant criticism for renaming Llama's internal tensor names without attribution, which was later corrected to improve compatibility. The 'from scratch' claim is partially undermined by the heavy reliance on Llama's structural design and code logic.

Dataset Composition

4.0 / 10

01.AI discloses that the model was trained on a 3.1 trillion token bilingual (English/Chinese) corpus. While they mention a 'rigorous pipeline' involving heuristic and learned filters, they provide no specific breakdown of data sources (e.g., percentages of web, code, or books). The methodology for data cleaning is described in general terms in the technical report, but the lack of source-level transparency or sample data availability limits verification.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly available via the SentencePiece framework using Byte-Pair Encoding (BPE). The vocabulary size is explicitly stated as 64,000 tokens. Documentation details specific handling of numeric data (splitting into digits) and rare characters (unicode-byte fallback). The tokenizer is accessible for inspection in the official Hugging Face and GitHub repositories, allowing for direct verification of its alignment with the claimed bilingual support.
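The two tokenizer behaviors named above, digit splitting and byte fallback, can be illustrated without loading the real tokenizer. The sketch below is only an approximation of the idea: the function names are hypothetical, the regular expression is an assumption about how digit splitting behaves, and the `<0xNN>` piece format follows the SentencePiece convention for byte-fallback tokens. It does not reproduce Yi's actual vocabulary or merges.

```python
import re

# Illustration only: mimics two pre-tokenization behaviors,
# not the real Yi tokenizer or its BPE merges.


def split_digits(text: str) -> list[str]:
    """Split every digit into its own piece; keep non-digit runs together."""
    return re.findall(r"\d|\D+", text)


def byte_fallback(char: str) -> list[str]:
    """Represent an out-of-vocabulary character as UTF-8 byte pieces."""
    return [f"<0x{byte:02X}>" for byte in char.encode("utf-8")]


if __name__ == "__main__":
    print(split_digits("Yi 34B"))
    print(byte_fallback("€"))
```

Splitting numbers into single digits keeps numeric tokens consistent regardless of magnitude, and byte fallback guarantees that any Unicode character can be encoded even if it never appeared in the training vocabulary.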

Model

19.0 / 40

Parameter Density

7.0 / 10

The model's total parameter count is clearly stated as 34.4 billion. As a dense model, all parameters are active during inference. Detailed architectural specifications are provided, including 60 layers and a hidden size of 7168. While the breakdown between attention and FFN parameters isn't explicitly tabulated, the structural constants are sufficient for independent calculation.
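As a check on that last point, the parameter count can be derived from the published structural constants. The sketch below assumes a LLaMA-style layout: no bias terms, three FFN matrices (gate, up, down), two RMSNorm weight vectors per layer plus a final one, and an output projection that is not tied to the input embedding. These layout details are assumptions, not figures from the audit.

```python
# Independent parameter count from the published constants.
# Assumptions: no biases, untied input/output embeddings,
# LLaMA-style gate/up/down FFN, RMSNorm with one weight vector each.
HIDDEN = 7168
LAYERS = 60
FFN = 20480
VOCAB = 64000
Q_HEADS = 56
KV_HEADS = 8
HEAD_DIM = HIDDEN // Q_HEADS


def attention_params() -> int:
    q_and_o = 2 * HIDDEN * (Q_HEADS * HEAD_DIM)    # query and output projections
    k_and_v = 2 * HIDDEN * (KV_HEADS * HEAD_DIM)   # smaller under GQA
    return q_and_o + k_and_v


def ffn_params() -> int:
    return 3 * HIDDEN * FFN  # gate, up, and down matrices


def layer_params() -> int:
    return attention_params() + ffn_params() + 2 * HIDDEN  # plus two norms


def total_params() -> int:
    embeddings = 2 * VOCAB * HIDDEN  # input embedding and output head
    return LAYERS * layer_params() + embeddings + HIDDEN  # plus final norm


if __name__ == "__main__":
    total = total_params()
    print(f"attention: {LAYERS * attention_params() / total:.1%}")
    print(f"ffn:       {LAYERS * ffn_params() / total:.1%}")
    print(f"total:     {total:,}")
```

Under these assumptions the total comes to about 34.39 billion, which agrees with the stated 34.4 billion, with roughly three quarters of the parameters in the feed-forward layers.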

Training Compute

2.0 / 10

Information regarding training compute is extremely limited. While the technical report mentions the use of 'robust training infrastructure' and overtraining beyond Chinchilla optimality to 3.1T tokens, it fails to disclose the specific hardware (e.g., number of H100/A100 GPUs), total GPU hours, or the environmental impact/carbon footprint. This lack of detail makes the training cost and resource intensity unverifiable.

Benchmark Reproducibility

4.0 / 10

The technical report lists performance on standard benchmarks like MMLU, C-Eval, and GSM8K. However, it lacks comprehensive reproduction instructions, exact evaluation prompts, or public evaluation code. Third-party audits have raised concerns about the 'suspiciously high' MMLU scores compared to real-world performance, and independent researchers have noted potential data leakage issues in benchmarks like GSM8K, which 01.AI has not fully addressed with public decontamination logs.

Identity Consistency

6.0 / 10

The model generally identifies as an AI developed by 01.AI. However, early versions exhibited identity confusion due to the inherited Llama architecture and naming conventions, leading to instances where it was perceived as a Llama derivative rather than an independent model. While versioning (e.g., Yi-1.5) has improved this, the initial lack of clear identity boundaries and the 'oversight' in tensor naming significantly impacted its consistency score.

Downstream

19.0 / 30

License Clarity

6.0 / 10

The model weights are released under the 'Yi Series Models Community License Agreement,' which is a custom license. While it allows for free commercial use, it requires an explicit application for companies with more than 200 million monthly active users. This 'open weights' but not 'open source' (per OSI definitions) approach creates some ambiguity for commercial users, although the terms are generally better documented than proprietary models.

Hardware Footprint

8.0 / 10

01.AI provides excellent documentation for hardware requirements. They explicitly list VRAM needs for different batch sizes and quantization levels (e.g., 16 GB for 4-bit quantization) and provide guidance for running on consumer-grade hardware like the RTX 4090. Quantization impact is documented, and they offer official 4-bit (AWQ) and 8-bit (GPTQ) versions to facilitate deployment.
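A back-of-the-envelope calculation shows where a figure like 16 GB for 4-bit weights comes from. The sketch below multiplies the stated 34.4 billion parameters by the bits per weight and reports the result in GiB. It covers weights only: quantization metadata (scales and zero points), the key-value cache, and activations add to this and are not modeled here. The function name is hypothetical.

```python
# Weight-only memory estimate at different precisions.
# Assumption: every parameter is stored at the given bit width;
# quantization scales, KV cache, and activations are excluded.
PARAMS = 34.4e9


def weight_gib(bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_gib(bits):.0f} GiB")
```

The 4-bit estimate lands at about 16 GiB, which fits within a 24 GB consumer card such as the RTX 4090 with some room left for the cache, while 16-bit weights need several data-center GPUs or offloading.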

Versioning Drift

5.0 / 10

The model family has seen updates (e.g., the transition to Yi-1.5 and the 200K context variants), but a formal, detailed changelog or semantic versioning system is not consistently maintained across all repositories. Users have reported behavioral drift and repetition issues in newer fine-tunes without clear documentation from the provider on what changed in the underlying weights or training mixture.
