Parameters
34B
Context Length
4K
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
2 Nov 2023
Knowledge Cutoff
Jun 2023
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
56
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
5,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
7,168
Number of Layers
60
FFN Intermediate Size (Dense)
20,480
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
64,000
Yi-34B, developed by 01.AI, is a 34-billion-parameter large language model trained from scratch on an English and Chinese corpus of roughly 3 trillion tokens. It is a base (foundation) model with strong results in language understanding, commonsense reasoning, and reading comprehension, and it is designed to be proficient in both English and Chinese. The design aims to balance performance against inference cost so that the model can run in a range of computational environments.
Architecturally, Yi-34B is built upon a modified decoder-only Transformer framework, drawing inspiration from the LLaMA implementation without being a direct derivative. A key technical feature is the incorporation of Grouped-Query Attention (GQA), which contributes to reduced training and inference costs compared to traditional Multi-Head Attention while maintaining performance. The model utilizes the SwiGLU activation function and RMS Normalization layers. Positional encoding is handled through a Rotary Position Embedding (RoPE) mechanism. These architectural choices aim to optimize model stability, convergence, and compatibility within the AI ecosystem.
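As a rough illustration of what Grouped-Query Attention means for the head counts listed in the specification above, the sketch below maps each of the 56 query heads to one of the 8 key-value heads. The contiguous grouping (query head index divided by group size) is an assumption that mirrors common GQA implementations, not something taken from 01.AI's code, and the function name is hypothetical.

```python
# Illustrative sketch only: contiguous head grouping is an assumption,
# not a detail confirmed by the source.
NUM_QUERY_HEADS = 56  # "Attention Heads" in the spec
NUM_KV_HEADS = 8      # "Key-Value Heads" in the spec


def kv_head_for(query_head: int) -> int:
    """Return the index of the KV head shared by the given query head."""
    if not 0 <= query_head < NUM_QUERY_HEADS:
        raise ValueError("query head index out of range")
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 7 query heads per KV head
    return query_head // group_size


# Collect the query heads that share each KV head.
groups = {}
for q in range(NUM_QUERY_HEADS):
    groups.setdefault(kv_head_for(q), []).append(q)

for kv, qs in groups.items():
    print(f"KV head {kv}: query heads {qs[0]}-{qs[-1]}")
```

Under this grouping, each layer stores 8 key and value heads instead of 56, so the key-value cache is one seventh of what it would be under Multi-Head Attention with the same head dimension.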
Yi-34B is applicable to tasks requiring extensive language processing, such as long-form document summarization, detailed legal and technical document analysis, and complex multilingual question-answering systems. It also excels in the generation of multilingual content and instruction following. The base model supports a context length of 4,096 tokens, with specialized variants like Yi-34B-200K extending this capacity to 200,000 tokens, enabling processing of exceptionally long text sequences. Its design considerations allow for deployment on various hardware configurations, including consumer-grade GPUs, especially when employing quantization techniques.
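To give a sense of how context length affects memory, the sketch below estimates the key-value cache size for a single sequence at 4,096 and 200,000 tokens. It assumes a head dimension of 128 (hidden size divided by attention heads) and a 16-bit cache; both are assumptions for illustration, and real serving stacks may use different precisions or cache layouts.

```python
# Back-of-the-envelope KV cache estimate; not a measurement.
LAYERS = 60
KV_HEADS = 8
HEAD_DIM = 7168 // 56   # assumed: hidden size / attention heads = 128
BYTES_PER_VALUE = 2     # assumed 16-bit (fp16/bf16) cache


def kv_cache_gib(context_tokens: int) -> float:
    """Estimate the KV cache size in GiB for one sequence."""
    # Factor 2 covers keys and values.
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens / 2**30


for tokens in (4_096, 200_000):
    print(f"{tokens:>7} tokens: {kv_cache_gib(tokens):.2f} GiB")
```

Under these assumptions the cache is just under 1 GiB at 4,096 tokens and roughly 46 GiB at 200,000 tokens, which suggests that the long-context variant is considerably more demanding to serve than the base model.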
Rank
#154
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Web Development | WebDev Arena | 1183 | 101 |
| General Text | Text Arena | 1183 | 102 |
Overall Rank
#154
Coding Rank
#119
Total Score
57 / 100
Yi-34B demonstrates strong transparency in its technical architecture and hardware requirements, providing clear guidance for local deployment and quantization. However, it suffers from significant opacity regarding its training data sources and compute resources. The model's transparency profile is further complicated by early controversies regarding its architectural naming and potential benchmark contamination, which remain only partially addressed.
Architectural Provenance
The model is documented as a modified decoder-only Transformer. While the technical report claims it was 'trained from scratch,' it acknowledges using the Llama architecture as a base for its implementation. Specific modifications like Grouped-Query Attention (GQA), SwiGLU activation, and Rotary Position Embedding (RoPE) are disclosed. However, the initial release faced significant criticism for renaming Llama's internal tensor names without attribution, which was later corrected to improve compatibility. The 'from scratch' claim is partially undermined by the heavy reliance on Llama's structural design and code logic.
Dataset Composition
01.AI discloses that the model was trained on a 3.1 trillion token bilingual (English/Chinese) corpus. While they mention a 'rigorous pipeline' involving heuristic and learned filters, they provide no specific breakdown of data sources (e.g., percentages of web, code, or books). The methodology for data cleaning is described in general terms in the technical report, but the lack of source-level transparency or sample data availability limits verification.
Tokenizer Integrity
The tokenizer is publicly available via the SentencePiece framework using Byte-Pair Encoding (BPE). The vocabulary size is explicitly stated as 64,000 tokens. Documentation details specific handling of numeric data (splitting into digits) and rare characters (unicode-byte fallback). The tokenizer is accessible for inspection in the official Hugging Face and GitHub repositories, allowing for direct verification of its alignment with the claimed bilingual support.
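The two behaviors mentioned above, digit splitting and byte fallback, can be illustrated with a toy sketch. This is not the Yi tokenizer: the vocabulary below is a hypothetical handful of characters, and the functions only mimic the general idea, using the `<0xNN>` piece naming that SentencePiece uses for byte-fallback tokens.

```python
import re

# Toy vocabulary for illustration only; the real 64,000-entry
# vocabulary is not reproduced here.
TOY_VOCAB = {"Y", "i", "-", "B", " "} | set("0123456789")


def split_digits(text: str) -> list[str]:
    """Split text so that every digit becomes its own piece."""
    return re.findall(r"\d|\D+", text)


def byte_fallback(ch: str) -> list[str]:
    """Keep a known character; otherwise emit one <0xNN> piece per UTF-8 byte."""
    if ch in TOY_VOCAB:
        return [ch]
    return [f"<0x{b:02X}>" for b in ch.encode("utf-8")]


print(split_digits("Yi-34B"))
print(byte_fallback("\u00e9"))  # a character missing from the toy vocabulary
```

To inspect the actual behavior, load the published tokenizer from the official repositories and encode sample strings containing numbers and rare characters.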
Parameter Density
The model's total parameter count is clearly stated as 34.4 billion. As a dense model, all parameters are active during inference. Detailed architectural specifications are provided, including 60 layers and a hidden size of 7168. While the breakdown between attention and FFN parameters isn't explicitly tabulated, the structural constants are sufficient for independent calculation.
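The independent calculation mentioned above can be sketched from the published constants. The breakdown below assumes bias-free projections, three SwiGLU weight matrices per feed-forward block, two RMSNorm weight vectors per layer plus a final one, untied input and output embeddings, and a head dimension of hidden size divided by attention heads. None of these assumptions is confirmed by this page, so treat the result as an estimate.

```python
# Parameter-count estimate from the listed structural constants.
HIDDEN = 7_168
LAYERS = 60
FFN = 20_480
VOCAB = 64_000
Q_HEADS = 56
KV_HEADS = 8
HEAD_DIM = HIDDEN // Q_HEADS  # assumed: 128


def count_parameters() -> dict:
    kv_width = KV_HEADS * HEAD_DIM
    # Query and output projections are HIDDEN x HIDDEN; key and value
    # projections are HIDDEN x kv_width under grouped-query attention.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_width
    ffn = 3 * HIDDEN * FFN      # gate, up, and down projections (SwiGLU)
    norms = 2 * HIDDEN          # two RMSNorm weight vectors per layer
    embeddings = 2 * VOCAB * HIDDEN  # assumed untied input/output embeddings
    total = LAYERS * (attention + ffn + norms) + embeddings + HIDDEN  # + final norm
    return {
        "attention": LAYERS * attention,
        "ffn": LAYERS * ffn,
        "embeddings": embeddings,
        "total": total,
    }


result = count_parameters()
for name, value in result.items():
    print(f"{name:>10}: {value / 1e9:.2f}B")
```

Under these assumptions the total comes to about 34.39 billion, consistent with the stated 34.4 billion, with the feed-forward blocks accounting for roughly three quarters of the parameters.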
Training Compute
Information regarding training compute is extremely limited. While the technical report mentions the use of 'robust training infrastructure' and overtraining beyond Chinchilla optimality to 3.1T tokens, it fails to disclose the specific hardware (e.g., number of H100/A100 GPUs), total GPU hours, or the environmental impact/carbon footprint. This lack of detail makes the training cost and resource intensity unverifiable.
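Although the hardware is undisclosed, the scale of training can be approximated from the two figures that are public. The sketch below uses the common 6 x parameters x tokens approximation for dense Transformer training FLOPs and a rule of thumb of about 20 tokens per parameter for Chinchilla-optimal training. Both are general heuristics assumed here for illustration, not numbers reported by 01.AI.

```python
# Rough training-scale estimate from public figures; heuristics only.
PARAMS = 34.4e9                    # stated parameter count
TOKENS = 3.1e12                    # stated training tokens
CHINCHILLA_TOKENS_PER_PARAM = 20   # rule-of-thumb assumption

tokens_per_param = TOKENS / PARAMS
overtraining_factor = tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM
approx_flops = 6 * PARAMS * TOKENS  # 6ND approximation

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"multiple of the 20 tokens/param heuristic: {overtraining_factor:.1f}x")
print(f"approximate training FLOPs: {approx_flops:.2e}")
```

This gives roughly 90 tokens per parameter, about 4.5 times the heuristic, and on the order of 6.4e23 FLOPs. Converting that to GPU hours would require hardware and utilization figures that are not disclosed.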
Benchmark Reproducibility
The technical report lists performance on standard benchmarks like MMLU, C-Eval, and GSM8K. However, it lacks comprehensive reproduction instructions, exact evaluation prompts, or public evaluation code. Third-party audits have raised concerns about the 'suspiciously high' MMLU scores compared to real-world performance, and independent researchers have noted potential data leakage issues in benchmarks like GSM8K, which 01.AI has not fully addressed with public decontamination logs.
Identity Consistency
The model generally identifies as an AI developed by 01.AI. However, early versions exhibited identity confusion due to the inherited Llama architecture and naming conventions, leading to instances where it was perceived as a Llama derivative rather than an independent model. While versioning (e.g., Yi-1.5) has improved this, the initial lack of clear identity boundaries and the 'oversight' in tensor naming significantly impacted its consistency score.
License Clarity
The model weights are released under the 'Yi Series Models Community License Agreement,' which is a custom license. While it allows for free commercial use, it requires an explicit application for companies with more than 200 million monthly active users. This 'open weights' but not 'open source' (per OSI definitions) approach creates some ambiguity for commercial users, although the terms are generally better documented than proprietary models.
Hardware Footprint
01.AI provides excellent documentation for hardware requirements. They explicitly list VRAM needs for different batch sizes and quantization levels (e.g., 16 GB for 4-bit quantization) and provide guidance for running on consumer-grade hardware like the RTX 4090. Quantization impact is documented, and they offer official 4-bit (AWQ) and 8-bit (GPTQ) versions to facilitate deployment.
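As a sanity check on figures like these, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per weight. The sketch below is a lower bound under simplifying assumptions: it ignores the key-value cache, activations, and any per-group scales or zero points that quantization formats add. The function name is hypothetical.

```python
def weight_memory_gib(params: float, bits_per_weight: int) -> float:
    """Estimate the memory for model weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


PARAMS = 34.4e9  # stated parameter count

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(PARAMS, bits):.2f} GiB")
```

At 4 bits the weights come to roughly 16 GiB, in line with the 4-bit figure quoted above. Actual usage will be higher once the cache and runtime overhead are included.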
Versioning Drift
The model family has seen updates (e.g., the transition to Yi-1.5 and the 200K context variants), but a formal, detailed changelog or semantic versioning system is not consistently maintained across all repositories. Users have reported behavioral drift and repetition issues in newer fine-tunes without clear documentation from the provider on what changed in the underlying weights or training mixture.