Parameters: 360M
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache License 2.0
Release Date: 30 Jun 2025
Knowledge Cutoff: -
Attention
Attention Structure: Grouped-Query Attention
Attention Heads: 16
Key-Value Heads: 2
Attention Head Dimension: 128
Position Embedding: Rotary Position Embedding (RoPE)
RoPE Theta: 500,000
Sliding Window Attention: No
Sliding Window Size: -
Normalization: RMS Normalization
Activation Function: Swish
Dimensions
Hidden Dimension Size: 1,024
Number of Layers: 18
FFN Intermediate Size (Dense): 3,072
Multi-Token Prediction Heads: -
Tokenizer
Vocabulary Size: 103,424
ERNIE-4.5-0.3B-Base is part of Baidu's ERNIE 4.5 family of foundation models and is designed for general-purpose text understanding and generation. It is a compact, dense model with about 360 million parameters, which makes it suitable for deployment where compute is limited or a lightweight inference footprint is required. Released as open source under the Apache License 2.0, it gives developers and researchers a base language model to build on and integrate into text-centric systems.
Architecturally, ERNIE-4.5-0.3B-Base is a transformer with 18 layers. It uses 16 query heads and 2 key-value heads, that is, Grouped-Query Attention (GQA), which reduces the size of the key-value cache compared with standard multi-head attention. The model supports a context length of up to 131,072 tokens, so it can process and generate coherent text over long sequences. Unlike some other variants in the ERNIE 4.5 series, it uses a dense architecture rather than a Mixture-of-Experts (MoE) structure. The hidden dimension is 1,024, and the model uses RMS Normalization and the Swish (SiLU) activation function. The listed RoPE theta of 500,000 indicates that positions are encoded with rotary position embeddings (RoPE) rather than a learned absolute position embedding.
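The head layout described above can be illustrated with a small sketch. It assumes the common GQA convention in which consecutive query heads share one key-value head; that grouping order is an assumption for illustration, not something this page states. Only the head counts (16 and 2) come from the spec.

```python
# Illustrative only: maps query heads to key-value heads under the common
# GQA convention (consecutive query heads share one KV head). The grouping
# order is an assumption; the head counts come from the spec above.
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 2


def kv_head_for(query_head: int) -> int:
    """Return the index of the KV head that a given query head attends with."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 8 query heads per KV head
    return query_head // group_size


mapping = [kv_head_for(q) for q in range(NUM_QUERY_HEADS)]
print(mapping)
```

With these numbers, each key-value head serves a group of 8 query heads, so the key-value cache holds 2 heads per layer instead of 16.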
The model is primarily optimized for text completion and can be fine-tuned for specialized applications with methods such as Supervised Fine-tuning (SFT), Low-Rank Adaptation (LoRA), and Direct Preference Optimization (DPO). It is compatible with widely adopted frameworks such as Hugging Face Transformers and Baidu's FastDeploy toolkit, which eases integration into existing development workflows. The model supports both English and Chinese.
The Baidu ERNIE 4.5 family consists of ten models. Its MoE variants use a heterogeneous Mixture-of-Experts (MoE) architecture that shares parameters across modalities while keeping dedicated parameters for each modality, supporting efficient language and multimodal processing. The 0.3B model described here is the dense, text-only member of the family.
No evaluation benchmarks are available for ERNIE-4.5-0.3B-Base.
Overall Rank: -
Coding Rank: -
Total Score: 76 / 100
ERNIE-4.5-0.3B-Base demonstrates a high level of transparency regarding its technical architecture and licensing, providing a detailed technical report and a permissive Apache 2.0 license. While it offers excellent clarity on its tokenizer and parameter density, it remains opaque concerning the specific composition of its training data and the total compute resources consumed during its development. The model's accessibility on public hubs and integration with standard toolkits like PaddlePaddle and Transformers further supports its transparency profile.
Architectural Provenance
The model's architecture is extensively documented in the ERNIE 4.5 Technical Report (June 2025). It is a dense transformer-based model with 18 layers, a hidden dimension of 1,024, and 16 attention heads. It uses Grouped-Query Attention (GQA) with 2 KV heads, RMS Normalization, and the SiLU activation function. The report details its relationship to the broader ERNIE 4.5 family, noting that while it is a dense variant, it benefits from the multimodal heterogeneous pre-training techniques developed for the larger MoE models. The pre-training procedure, including the use of the PaddlePaddle framework and optimization techniques such as FP8 mixed-precision training, is well documented.
Dataset Composition
While the technical report mentions the use of a data manager called REEAO for reproducible data access, the actual composition of the training data is described only in general categories: web pages, academic papers, documents, images, and synthetic data. There is no specific percentage breakdown of these sources (e.g., % code vs % web) or a list of specific datasets used. The report details the filtering and denoising pipeline (heuristic and model-based) but lacks the granular transparency required for a high score in this pillar.
Tokenizer Integrity
The tokenizer is publicly available via the Hugging Face repository ('baidu/ERNIE-4.5-0.3B-PT') and the official PaddlePaddle GitHub. It uses a SentencePiece-based approach with a clearly stated vocabulary size of 103,424 tokens. Documentation confirms it is optimized for Chinese-English bilingual processing. The vocabulary file (tokenizer.model) and the Python implementation (tokenization_ernie4_5.py) are accessible for inspection, allowing for verification of tokenization behavior and alignment with claimed language support.
Parameter Density
The parameter count is explicitly stated as 360 million (0.3B). As a dense model, 100% of these parameters are active during inference, which is clearly distinguished from the MoE variants in the same family. The architectural breakdown (layers, heads, hidden size) is fully disclosed in the technical report and model cards, providing high clarity on parameter density and utilization.
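The stated parameter count can be sanity-checked against the published dimensions. The sketch below is a rough estimate, not an official breakdown: it assumes a gated feed-forward block with three weight matrices, no bias terms, two RMSNorm weight vectors per layer plus a final norm, and input and output embeddings that share one matrix. None of these assumptions is confirmed on this page.

```python
# Rough parameter count from the published dimensions.
# Assumptions (not stated in the spec): gated feed-forward block with three
# weight matrices, no bias terms, two RMSNorm weights per layer plus a final
# norm, and input/output embeddings that share one matrix.
VOCAB = 103_424
HIDDEN = 1_024
LAYERS = 18
Q_HEADS = 16
KV_HEADS = 2
HEAD_DIM = 128
FFN = 3_072


def estimate_parameters() -> int:
    q_proj = HIDDEN * Q_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM  # key and value projections
    o_proj = Q_HEADS * HEAD_DIM * HIDDEN
    attention = q_proj + kv_proj + o_proj
    feed_forward = 3 * HIDDEN * FFN  # gate, up, and down projections
    norms = 2 * HIDDEN  # one norm before attention, one before the FFN
    per_layer = attention + feed_forward + norms
    embeddings = VOCAB * HIDDEN  # counted once (tied embeddings assumed)
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


total = estimate_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to 360,748,032, which is consistent with the stated figure of roughly 360 million.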
Training Compute
The technical report provides some high-level compute metrics, such as achieving 47% Model FLOPs Utilization (MFU) and the ability to train on clusters of up to 2,016 NVIDIA H800 GPUs. However, it does not disclose total GPU hours or the carbon footprint for the 0.3B variant. While it mentions the use of the Kunlun chip cluster for training, the lack of environmental impact data and exact resource consumption for this model limits the score.
Benchmark Reproducibility
Baidu provides results for standard benchmarks (MMLU, C-Eval, CMMLU, SimpleQA) in the technical report. Evaluation code is partially available through the ERNIEKit and PaddlePaddle repositories. However, the exact prompts and few-shot examples used for all reported scores are not fully disclosed in a centralized, reproducible format. Third-party verification is beginning to emerge on leaderboards like OpenCompass, but comprehensive independent audits are still limited.
Identity Consistency
The model consistently identifies itself as part of the ERNIE 4.5 family. The 'PT' (Pre-trained) and 'Base' designations are clearly used to distinguish it from chat-aligned or distilled versions. There is no evidence of the model claiming to be a competitor's product (e.g., GPT-4), and its versioning is maintained through official Baidu channels and Hugging Face tags.
License Clarity
The model is released under the Apache License 2.0, which is a standard, highly permissive open-source license. The license is explicitly stated in the technical report, the GitHub repository, and the Hugging Face model card. It clearly allows for commercial use, modification, and distribution without the restrictive 'research-only' or 'non-commercial' clauses found in many other Chinese or corporate model releases.
Hardware Footprint
Hardware requirements are well documented. Official documentation specifies VRAM needs for FP16 (~0.75GB for weights) and provides guidance for INT8 and INT4 quantization. It explicitly mentions compatibility with consumer hardware (e.g., RTX 3060) and edge devices. For long contexts of up to 128K (131,072) tokens, the documentation notes the use of FlashMask and RoPE scaling to manage memory and efficiency.
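The weight and long-context memory figures can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope estimate using the parameter count and dimensions given on this page. It ignores activations, quantization overhead (scales and zero points), and framework buffers, and it assumes an FP16 key-value cache for a single sequence.

```python
# Back-of-the-envelope memory estimate. The parameter count and dimensions
# come from this page; the formulas ignore activations, quantization
# overhead (scales, zero points), and framework buffers.
PARAMS = 360_000_000
LAYERS = 18
KV_HEADS = 2
HEAD_DIM = 128
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(precision: str) -> float:
    """Bytes needed to store the weights at the given precision."""
    return PARAMS * BYTES_PER_PARAM[precision]


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache for one sequence (FP16 values by default)."""
    # One key and one value vector per layer, per KV head, per token.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens


for name in BYTES_PER_PARAM:
    print(f"{name}: {weight_bytes(name) / 1e9:.2f} GB of weights")
print(f"KV cache at 131,072 tokens: {kv_cache_bytes(131_072) / 2**30:.2f} GiB")
```

Under these assumptions, FP16 weights take about 0.72 GB, close to the ~0.75GB cited above, and a full 131,072-token FP16 key-value cache adds about 2.25 GiB for one sequence.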
Versioning Drift
Baidu uses a versioning system (e.g., ERNIE 4.5 vs 5.0), and the 0.3B model has a clear release date (June 30, 2025). However, there is no detailed public changelog or 'drift' report that tracks minor weight updates or performance changes over time. While the GitHub repository shows commit history, it lacks a formal semantic versioning log for the model weights themselves.