Parameters
300M
Context Length
131,072
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
30 Jun 2025
Knowledge Cutoff
Dec 2024
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
1024
Number of Layers
18
Attention Heads
16
Key-Value Heads
2
Activation Function
Swish
Normalization
RMS Normalization
Position Embedding
Absolute Position Embedding
The ERNIE-4.5-0.3B model is a high-efficiency transformer designed to serve as the compact entry point of Baidu's ERNIE 4.5 model family. Engineered for low-latency inference and high-throughput environments, this model prioritizes linguistic proficiency in both Chinese and English while minimizing the computational overhead typical of large-scale foundation models. Its design philosophy balances the need for deep language understanding with the operational realities of edge computing and mobile deployment, providing a versatile solution for real-time text processing.
Technically, ERNIE-4.5-0.3B utilizes a dense transformer architecture featuring 18 layers and a hidden dimension size of 1024. Unlike its larger Mixture-of-Experts counterparts in the same family, this variant activates all its parameters for every token, ensuring consistent performance characteristics and simplified deployment workflows. The model incorporates Grouped-Query Attention (GQA) with 16 query heads and 2 key-value heads to optimize memory usage and speed during long-context generation. It supports an expansive context window of 131,072 tokens, allowing it to process substantial documents and maintain coherence over long-range sequences.
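The memory benefit of GQA at this context length can be sketched with simple arithmetic. The layer count, head counts, and head dimension (hidden size / query heads = 1024 / 16 = 64) below come from the specifications above; the 2-byte (bf16) cache element is an illustrative assumption.

```python
# KV-cache size sketch for ERNIE-4.5-0.3B's Grouped-Query Attention.
# Hyperparameters are taken from the spec table; bf16 (2-byte) cache
# entries are assumed for illustration.

def kv_cache_bytes(seq_len, n_layers=18, n_kv_heads=2, head_dim=64, dtype_bytes=2):
    """Bytes needed to cache keys and values for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes  # 2 = K and V

full_ctx = 131_072
gqa = kv_cache_bytes(full_ctx)                 # 2 KV heads (GQA, as documented)
mha = kv_cache_bytes(full_ctx, n_kv_heads=16)  # all 16 heads cached (plain MHA)

print(f"GQA cache @128K: {gqa / 2**30:.2f} GiB")  # ~1.12 GiB
print(f"MHA cache @128K: {mha / 2**30:.2f} GiB")  # ~9.00 GiB
print(f"reduction: {mha // gqa}x")                # 8x
```

Caching only 2 of 16 heads is what makes the full 131,072-token window practical on modest hardware: the cache shrinks by the query-to-KV head ratio (8x here).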
From a performance perspective, ERNIE-4.5-0.3B is optimized for high-speed text completion, sentiment analysis, and on-device conversational agents. It integrates advanced training methodologies from the broader ERNIE 4.5 project, including RMS Normalization and the Swish (SiLU) activation function, which contribute to its training stability and representational power. The model is fully compatible with modern inference engines like vLLM and FastDeploy, and it is released under the Apache 2.0 license to facilitate both academic research and commercial application development within the open-source ecosystem.
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.
No evaluation benchmarks are available for ERNIE-4.5-0.3B.
Overall Rank
-
Coding Rank
-
Total Score
69
/ 100
ERNIE-4.5-0.3B exhibits strong transparency in its architectural specifications and licensing, providing clear technical details and a permissive Apache 2.0 license. However, it suffers from significant opacity regarding its training dataset composition and the specific compute resources used for its development. While the model is highly accessible for deployment, its reproducibility is hindered by a lack of detailed evaluation prompts and version tracking.
Architectural Provenance
Baidu provides a comprehensive technical report for the ERNIE 4.5 family, explicitly detailing the ERNIE-4.5-0.3B variant as a dense transformer model. Documentation specifies 18 layers, a hidden dimension of 1024, and 16 query heads with 2 key-value heads (Grouped-Query Attention). The use of RMS Normalization, Swish (SiLU) activation, and absolute position embeddings is clearly stated. The model is built using the PaddlePaddle framework, with PyTorch-compatible weights also released. While the pre-training procedure is described at a high level (multimodal joint pre-training), specific architectural hyperparameters are well-documented.
Dataset Composition
Information regarding the training data is limited to general descriptions. The technical report mentions a 'massive multilingual corpus' and 'multimodal joint pre-training' on textual and visual modalities. However, there is no specific breakdown of data sources (e.g., percentages of web, code, or books), no detailed disclosure of filtering or cleaning methodologies, and no access to sample data. The claim of 'specialized Chinese data' is a vague marketing assertion without verifiable composition metrics.
Tokenizer Integrity
The tokenizer is publicly available via the Hugging Face repository (tokenization_ernie4_5.py) and is based on SentencePiece. The vocabulary size is explicitly stated as 103,424 tokens. Documentation confirms support for 109 languages, and the tokenizer configuration files are accessible for inspection, allowing for verification of tokenization patterns and special token handling (e.g., [mask:1], <s>, </s>).
Parameter Density
The parameter count is precisely disclosed as 0.3 billion (specifically 360 million in some technical specs). As a dense model, Baidu explicitly confirms that all parameters are active for every token, distinguishing it from the MoE variants in the same family. The architectural breakdown (layers, hidden size, attention heads) is fully provided, ensuring no ambiguity regarding its density or sparse characteristics.
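The documented hyperparameters can be cross-checked with a back-of-envelope parameter count. The FFN intermediate size is not listed in the spec table, so the 3072 used below is an assumption (roughly 3x the hidden size, a common choice for SwiGLU blocks); biases, norm weights, and untied output heads are ignored, so this is an order-of-magnitude sketch, not an exact accounting.

```python
# Rough dense parameter count from the documented hyperparameters.
# ASSUMPTIONS (not in the spec table): SwiGLU FFN with intermediate
# size 3072, tied input/output embeddings, biases/norms ignored.

hidden, layers, q_heads, kv_heads, vocab = 1024, 18, 16, 2, 103_424
head_dim = hidden // q_heads        # 64
ffn_inner = 3072                    # assumed, ~3x hidden

embed = vocab * hidden                                          # token embeddings
attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim   # Q,O + K,V projections
ffn = 3 * hidden * ffn_inner                                    # gate, up, down (SwiGLU)
total = embed + layers * (attn + ffn)

print(f"~{total / 1e6:.0f}M parameters")  # ~318M, in line with the 0.3B / 360M figures
```

The estimate lands near the advertised 0.3B (and within range of the 360M figure cited in some specs), consistent with a fully dense model where every parameter is active per token.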
Training Compute
While the technical report provides compute details for the largest 424B model (trained on a cluster of 2,016 NVIDIA H800 GPUs at 47% MFU), specific compute metrics for the 0.3B variant are absent. There is no disclosure of the GPU hours, hardware cluster size, or carbon footprint dedicated to training this specific lightweight model. The report mentions 'optimized efficiency' and 'resource-efficient training' without providing the hard data required for a high score.
Benchmark Reproducibility
Baidu reports performance on several standard benchmarks (IFEval, Multi-IF, SimpleQA, CMATH) and claims state-of-the-art results for its size class. However, while some evaluation code is available through the ERNIEKit repository, the exact prompts, few-shot examples, and specific benchmark versions used for the 0.3B variant are not fully documented in a way that allows for easy third-party reproduction. Results are often presented in comparison to other models without full transparency on the evaluation harness.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as part of the ERNIE 4.5 family in documentation and API responses. It maintains a clear distinction between its 'Base' (text completion) and 'PT' (post-trained/instruction) versions. There are no reported instances of the model claiming to be a competitor's product or misrepresenting its 0.3B parameter scale.
License Clarity
The model is released under the Apache License 2.0, which is a standard, permissive open-source license. This is explicitly stated in the technical report, the GitHub repository, and the Hugging Face model card. The license allows for unrestricted commercial use, modification, and distribution, with no conflicting proprietary terms discovered.
Hardware Footprint
Hardware requirements are well-documented, with VRAM estimates provided for different precisions (e.g., ~0.6GB for inference). The model supports 4-bit and 2-bit quantization via 'convolutional code quantization,' and memory scaling for its 128k (131,072) context window is addressed. Compatibility with inference engines like vLLM and FastDeploy is confirmed, providing a clear path for deployment on consumer-grade hardware.
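The quoted ~0.6GB inference figure is consistent with roughly 0.3B parameters stored at 2 bytes each; the sketch below extends that arithmetic to the lower-precision formats mentioned above. It covers weights only, so it's an assumption-laden lower bound: activations and the KV cache (which grows with context length) come on top.

```python
# Weight-memory footprint by precision. Weights only: activations and
# KV cache are extra. 0.3e9 parameters assumed, per the spec table.

def weight_gb(n_params: float, bits: int) -> float:
    """Gigabytes needed to store n_params weights at the given bit width."""
    return n_params * bits / 8 / 1e9

params = 0.3e9
for label, bits in [("bf16", 16), ("int8", 8), ("int4", 4), ("int2", 2)]:
    print(f"{label}: {weight_gb(params, bits):.2f} GB")
```

At these sizes even the full-precision weights fit comfortably in consumer GPU or laptop memory, which is why the 4-bit and 2-bit paths matter mainly for freeing headroom for long-context KV cache rather than for fitting the model at all.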
Versioning Drift
The model uses a basic naming convention (ERNIE-4.5-0.3B-PT) but lacks a formal semantic versioning system or a detailed public changelog for weight updates. While the release date is clear, there is no established infrastructure for tracking behavioral drift or accessing specific historical checkpoints beyond the initial release. Documentation for updates is currently irregular.