
ERNIE-4.5-0.3B

Parameters

300M

Context Length

131,072 tokens (128K)

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

30 Jun 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

1024

Number of Layers

18

Attention Heads

16

Key-Value Heads

2

Activation Function

Swish (SiLU)

Normalization

RMS Normalization

Position Embedding

Absolute Position Embedding

ERNIE-4.5-0.3B

The ERNIE-4.5-0.3B model is a high-efficiency transformer designed to serve as the compact entry point of Baidu's ERNIE 4.5 model family. Engineered for low-latency inference and high-throughput environments, this model prioritizes linguistic proficiency in both Chinese and English while minimizing the computational overhead typical of large-scale foundation models. Its design philosophy balances the need for deep language understanding with the operational realities of edge computing and mobile deployment, providing a versatile solution for real-time text processing.

Technically, ERNIE-4.5-0.3B utilizes a dense transformer architecture featuring 18 layers and a hidden dimension size of 1024. Unlike its larger Mixture-of-Experts counterparts in the same family, this variant activates all its parameters for every token, ensuring consistent performance characteristics and simplified deployment workflows. The model incorporates Grouped-Query Attention (GQA) with 16 query heads and 2 key-value heads to optimize memory usage and speed during long-context generation. It supports an expansive context window of 131,072 tokens, allowing it to process substantial documents and maintain coherence over long-range sequences.
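The memory benefit of the 16-query-head / 2-key-value-head split can be illustrated with a rough KV-cache estimate. This is a back-of-envelope sketch, not an official figure: the head dimension of 64 is inferred from hidden size 1024 divided by 16 query heads, and fp16 cache storage is assumed.

```python
# Rough KV-cache size estimate for ERNIE-4.5-0.3B at full context.
# Assumptions: head_dim = hidden_size / num_query_heads, fp16 (2 bytes/value).
layers = 18
hidden = 1024
query_heads = 16
kv_heads = 2
head_dim = hidden // query_heads   # 64 (inferred, not stated explicitly)
seq_len = 131_072
bytes_per_value = 2                # fp16

def kv_cache_bytes(num_kv_heads: int) -> int:
    # Factor of 2 accounts for both the key and the value tensors per layer.
    return 2 * layers * num_kv_heads * head_dim * seq_len * bytes_per_value

gqa = kv_cache_bytes(kv_heads)      # grouped-query: 2 KV heads
mha = kv_cache_bytes(query_heads)   # hypothetical full multi-head: 16 KV heads

print(f"GQA KV cache at 128K ctx: {gqa / 2**30:.2f} GiB")  # ~1.13 GiB
print(f"MHA KV cache at 128K ctx: {mha / 2**30:.2f} GiB")  # ~9.00 GiB
print(f"Savings factor: {mha // gqa}x")                    # 8x
```

Under these assumptions, the 2 KV heads cut full-context cache memory by 8x relative to a full multi-head layout, which is what makes the 128K window practical on modest hardware.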

From a performance perspective, ERNIE-4.5-0.3B is optimized for high-speed text completion, sentiment analysis, and on-device conversational agents. It integrates advanced training methodologies from the broader ERNIE 4.5 project, including RMS Normalization and the Swish (SiLU) activation function, which contribute to its training stability and representational power. The model is fully compatible with modern inference engines like vLLM and FastDeploy, and it is released under the Apache 2.0 license to facilitate both academic research and commercial application development within the open-source ecosystem.
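The two components named above, RMS Normalization and Swish (SiLU), follow standard formulations. Below is a minimal NumPy sketch of those standard definitions for reference, not ERNIE's actual implementation; the epsilon value is an assumption.

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """Swish/SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMS Normalization: rescale by the root-mean-square of the features.
    Unlike LayerNorm, no mean is subtracted, which saves computation."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy check on a single 4-dim feature vector.
x = np.array([1.0, -2.0, 3.0, -4.0])
y = rms_norm(x, np.ones(4))
print(y)  # features rescaled so their root-mean-square is ~1
```

Note the design trade-off: RMSNorm drops LayerNorm's mean-centering step, which is one reason it appears in most recent decoder-only transformers.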

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.



Evaluation Benchmarks

No evaluation benchmarks are available for ERNIE-4.5-0.3B.

Rankings

Overall Rank

-

Coding Rank

-

Model Transparency

Total Score

B

69 / 100

ERNIE-4.5-0.3B Transparency Report


Audit Note

ERNIE-4.5-0.3B exhibits strong transparency in its architectural specifications and licensing, providing clear technical details and a permissive Apache 2.0 license. However, it suffers from significant opacity regarding its training dataset composition and the specific compute resources used for its development. While the model is highly accessible for deployment, its reproducibility is hindered by a lack of detailed evaluation prompts and version tracking.

Upstream

20.0 / 30

Architectural Provenance

8.5 / 10

Baidu provides a comprehensive technical report for the ERNIE 4.5 family, explicitly detailing the ERNIE-4.5-0.3B variant as a dense transformer model. Documentation specifies 18 layers, a hidden dimension of 1024, and 16 query heads with 2 key-value heads (Grouped-Query Attention). The use of RMS Normalization, Swish (SiLU) activation, and absolute position embeddings is clearly stated. The model is built using the PaddlePaddle framework, with PyTorch-compatible weights also released. While the pre-training procedure is described at a high level (multimodal joint pre-training), specific architectural hyperparameters are well-documented.

Dataset Composition

3.5 / 10

Information regarding the training data is limited to general descriptions. The technical report mentions a 'massive multilingual corpus' and 'multimodal joint pre-training' on textual and visual modalities. However, there is no specific breakdown of data sources (e.g., percentages of web, code, or books), no detailed disclosure of filtering or cleaning methodologies, and no access to sample data. The claim of 'specialized Chinese data' is a vague marketing assertion without verifiable composition metrics.

Tokenizer Integrity

8.0 / 10

The tokenizer is publicly available via the Hugging Face repository (tokenization_ernie4_5.py) and is based on SentencePiece. The vocabulary size is explicitly stated as 103,424 tokens. Documentation confirms support for 109 languages, and the tokenizer configuration files are accessible for inspection, allowing for verification of tokenization patterns and special token handling (e.g., [mask:1], <s>, </s>).

Model

27.0 / 40

Parameter Density

9.0 / 10

The parameter count is precisely disclosed as 0.3 billion (specifically 360 million in some technical specs). As a dense model, Baidu explicitly confirms that all parameters are active for every token, distinguishing it from the MoE variants in the same family. The architectural breakdown (layers, hidden size, attention heads) is fully provided, ensuring no ambiguity regarding its density or sparse characteristics.

Training Compute

4.0 / 10

While the technical report provides compute details for the largest 424B model (using 2016 NVIDIA H800 GPUs and achieving 47% MFU), specific compute metrics for the 0.3B variant are absent. There is no disclosure of the GPU hours, hardware cluster size, or carbon footprint dedicated to training this lightweight model. The report mentions 'optimized efficiency' and 'resource-efficient training' without providing the hard data required for a high score.

Benchmark Reproducibility

5.0 / 10

Baidu reports performance on several standard benchmarks (IFEval, Multi-IF, SimpleQA, CMATH) and claims state-of-the-art results for its size class. However, while some evaluation code is available through the ERNIEKit repository, the exact prompts, few-shot examples, and specific benchmark versions used for the 0.3B variant are not fully documented in a way that allows for easy third-party reproduction. Results are often presented in comparison to other models without full transparency on the evaluation harness.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as part of the ERNIE 4.5 family in documentation and API responses. It maintains a clear distinction between its 'Base' (text completion) and 'PT' (post-trained/instruction) versions. There are no reported instances of the model claiming to be a competitor's product or misrepresenting its 0.3B parameter scale.

Downstream

21.5 / 30

License Clarity

10.0 / 10

The model is released under the Apache License 2.0, which is a standard, permissive open-source license. This is explicitly stated in the technical report, the GitHub repository, and the Hugging Face model card. The license allows for unrestricted commercial use, modification, and distribution, with no conflicting proprietary terms discovered.

Hardware Footprint

7.5 / 10

Hardware requirements are well-documented, with VRAM estimates provided for different precisions (e.g., ~0.6GB for inference). The model supports 4-bit and 2-bit quantization via 'convolutional code quantization,' and memory scaling for its 128k (131,072) context window is addressed. Compatibility with inference engines like vLLM and FastDeploy is confirmed, providing a clear path for deployment on consumer-grade hardware.
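The quoted ~0.6 GB figure is consistent with a simple weights-only estimate. The sketch below ignores activation memory, KV cache, and framework overhead, and takes the 0.3B parameter count from the model name, so treat it as a lower bound.

```python
# Weights-only VRAM estimate: parameters * bits-per-weight / 8 bytes.
# Ignores activations, KV cache, and runtime overhead (assumption).
PARAMS = 0.3e9  # 0.3B, taken from the model name

def weights_gb(bits_per_weight: int) -> float:
    """Gigabytes needed to hold the weights at a given precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16), ("int4", 4), ("int2", 2)]:
    print(f"{name}: {weights_gb(bits):.2f} GB")
# fp16: 0.60 GB  (matches the documented ~0.6 GB inference estimate)
# int4: 0.15 GB
# int2: 0.08 GB
```

At the 4-bit and 2-bit quantization levels mentioned above, the weights fit comfortably on any consumer GPU; at long contexts the KV cache, not the weights, dominates memory usage.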

Versioning Drift

4.0 / 10

The model uses a basic naming convention (ERNIE-4.5-0.3B-PT) but lacks a formal semantic versioning system or a detailed public changelog for weight updates. While the release date is clear, there is no established infrastructure for tracking behavioral drift or accessing specific historical checkpoints beyond the initial release. Documentation for updates is currently irregular.
