ERNIE-4.5-0.3B-Base

Parameters

360M

Context Length

131,072 tokens

Modality

Text

Architecture

Dense

License

Apache License 2.0

Release Date

30 Jun 2025

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

2

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

500,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

1,024

Number of Layers

18

FFN Intermediate Size (Dense)

3,072

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

103,424

Architecture Diagram

[Diagram: input tokens → token embedding (hidden size 1,024; context 131,072; vocabulary 103,424) → 18 × (pre-attention RMSNorm → attention with 16 query / 2 KV heads, head dimension 128 → residual add → pre-FFN RMSNorm → feed-forward network with Swish, intermediate size 3,072 → residual add) → final RMSNorm → output logits]

ERNIE-4.5-0.3B-Base

ERNIE-4.5-0.3B-Base is a member of Baidu's ERNIE 4.5 family of foundation models, built for general-purpose text understanding and generation. It is a compact, dense model with about 360 million parameters, which makes it suitable for deployment where compute is limited or a lightweight inference footprint is required. Released as open source under the Apache License 2.0, it gives developers and researchers a base language model to build on and integrate into text-centric systems.

ERNIE-4.5-0.3B-Base is an 18-layer transformer. It uses 16 query heads and 2 key-value heads, i.e. Grouped-Query Attention (GQA), in which each key-value head is shared by 8 query heads, reducing the size of the key-value cache. The model supports a context length of up to 131,072 tokens. Unlike some other variants in the ERNIE 4.5 series, it uses a dense architecture rather than a Mixture-of-Experts (MoE) structure. The hidden dimension is 1,024, and the model uses RMS Normalization and the Swish (SiLU) activation function. Position information is encoded with rotary position embeddings (RoPE) with a theta of 500,000.
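The head layout described above can be made concrete with a small sketch. It uses only the figures listed in the specification (16 query heads, 2 key-value heads, head dimension 128, hidden size 1,024). The contiguous grouping of query heads and the names `kv_head_for` and `projection_shapes` are illustrative assumptions, not taken from Baidu's code.

```python
# Figures from the specification on this page.
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 2
HEAD_DIM = 128
HIDDEN_SIZE = 1024


def kv_head_for(query_head: int) -> int:
    """Return the KV head a query head attends with.

    Assumption: query heads are grouped contiguously, so heads 0-7 share
    KV head 0 and heads 8-15 share KV head 1.
    """
    if not 0 <= query_head < NUM_QUERY_HEADS:
        raise ValueError(f"query_head must be in [0, {NUM_QUERY_HEADS})")
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 8 query heads per KV head
    return query_head // group_size


def projection_shapes() -> dict:
    """Weight shapes (in_features, out_features) implied by the listed sizes."""
    q_width = NUM_QUERY_HEADS * HEAD_DIM   # 2,048
    kv_width = NUM_KV_HEADS * HEAD_DIM     # 256
    return {
        "q_proj": (HIDDEN_SIZE, q_width),
        "k_proj": (HIDDEN_SIZE, kv_width),
        "v_proj": (HIDDEN_SIZE, kv_width),
        "o_proj": (q_width, HIDDEN_SIZE),
    }


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(NUM_QUERY_HEADS)])
    print(projection_shapes())
```

With these numbers the key and value projections are one eighth the width of the query projection, which is where the key-value cache saving of GQA comes from. Note also that the attention width (16 × 128 = 2,048) is twice the hidden size, as implied by the listed head dimension.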

The model is primarily optimized for text completion and can be fine-tuned for specialized applications with methods such as Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Direct Preference Optimization (DPO). It is compatible with widely adopted frameworks such as Hugging Face Transformers and Baidu's FastDeploy toolkit, which eases integration into existing development workflows. It supports both English and Chinese.

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten models. Its larger variants use a heterogeneous Mixture-of-Experts (MoE) architecture, which shares parameters across modalities while also dedicating parameters to specific modalities, supporting efficient language and multimodal processing.


Evaluation Benchmarks

No evaluation benchmarks for ERNIE-4.5-0.3B-Base available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B+

76 / 100

ERNIE-4.5-0.3B-Base Model Integrity Report

Audit Note

ERNIE-4.5-0.3B-Base demonstrates a high level of transparency regarding its technical architecture and licensing, providing a detailed technical report and a permissive Apache 2.0 license. While it offers excellent clarity on its tokenizer and parameter density, it remains opaque concerning the specific composition of its training data and the total compute resources consumed during its development. The model's accessibility on public hubs and integration with standard toolkits like PaddlePaddle and Transformers further supports its transparency profile.

Upstream

22.0 / 30

Architectural Provenance

8.5 / 10

The model's architecture is extensively documented in the ERNIE 4.5 Technical Report (June 2025). It is a dense transformer-based model with 18 layers, a hidden dimension of 1,024, and 16 attention heads. It uses Grouped-Query Attention (GQA) with 2 KV heads, RMS Normalization, and the SiLU activation function. The report details its relationship to the broader ERNIE 4.5 family, noting that while it is a dense variant, it benefits from the multimodal heterogeneous pre-training techniques developed for the larger MoE models. The pre-training procedure, including the use of the PaddlePaddle framework and optimization techniques such as FP8 mixed-precision training, is well documented.

Dataset Composition

4.5 / 10

While the technical report mentions the use of a data manager called REEAO for reproducible data access, the actual composition of the training data is described only in general categories: web pages, academic papers, documents, images, and synthetic data. There is no specific percentage breakdown of these sources (e.g., % code vs % web) or a list of specific datasets used. The report details the filtering and denoising pipeline (heuristic and model-based) but lacks the granular transparency required for a high score in this pillar.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available via the Hugging Face repository ('baidu/ERNIE-4.5-0.3B-PT') and the official PaddlePaddle GitHub. It uses a SentencePiece-based approach with a clearly stated vocabulary size of 103,424 tokens. Documentation confirms it is optimized for Chinese-English bilingual processing. The vocabulary file (tokenizer.model) and the Python implementation (tokenization_ernie4_5.py) are accessible for inspection, allowing for verification of tokenization behavior and alignment with claimed language support.

Model

30.0 / 40

Parameter Density

9.5 / 10

The parameter count is explicitly stated as 360 million (0.3B). As a dense model, 100% of these parameters are active during inference, which is clearly distinguished from the MoE variants in the same family. The architectural breakdown (layers, heads, hidden size) is fully disclosed in the technical report and model cards, providing high clarity on parameter density and utilization.
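As a rough cross-check of the stated parameter count, the listed dimensions can be summed directly. This is a back-of-the-envelope sketch, not the official accounting: it assumes no bias terms, a gated feed-forward block with three projection matrices, and input/output embeddings that share weights. None of these assumptions is stated on this page, and `estimate_parameters` is a hypothetical helper name.

```python
# Dimensions from the specification on this page.
VOCAB_SIZE = 103_424
HIDDEN = 1_024
LAYERS = 18
Q_HEADS = 16
KV_HEADS = 2
HEAD_DIM = 128
FFN_INTERMEDIATE = 3_072


def estimate_parameters(tied_embeddings: bool = True, gated_ffn: bool = True) -> int:
    """Estimate the parameter count from the listed dimensions.

    Assumptions (not confirmed by the source): no biases, one RMSNorm weight
    vector before attention and one before the FFN, plus a final RMSNorm.
    """
    embedding = VOCAB_SIZE * HIDDEN

    q_proj = HIDDEN * Q_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM      # key and value projections
    o_proj = Q_HEADS * HEAD_DIM * HIDDEN

    ffn_matrices = 3 if gated_ffn else 2            # gate/up/down vs. up/down
    ffn = ffn_matrices * HIDDEN * FFN_INTERMEDIATE

    norms = 2 * HIDDEN
    per_layer = q_proj + kv_proj + o_proj + ffn + norms

    total = embedding + LAYERS * per_layer + HIDDEN  # + final RMSNorm
    if not tied_embeddings:
        total += VOCAB_SIZE * HIDDEN                 # separate output projection
    return total


if __name__ == "__main__":
    print(f"{estimate_parameters():,}")
```

Under these assumptions the estimate is 360,748,032, which is consistent with the stated 360 million. Roughly 106 million of that sits in the embedding table, so the transformer layers themselves account for about 255 million parameters.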

Training Compute

5.0 / 10

The technical report provides some high-level compute metrics, such as achieving 47% Model FLOPs Utilization (MFU) and the ability to train on clusters of up to 2,016 NVIDIA H800 GPUs. However, it does not disclose the specific total GPU hours or the carbon footprint for the 0.3B variant specifically. While it mentions the use of the Kunlun chip cluster for training, the lack of specific environmental impact data or exact resource consumption for this specific model limits the score.
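For context on the MFU figure quoted above, Model FLOPs Utilization is conventionally defined as the ratio of the model FLOPs actually processed per second to the hardware's theoretical peak. A common approximation for a dense transformer, ignoring the attention term, is about 6 FLOPs per parameter per training token. The symbols below are assumptions introduced for illustration: N is the parameter count, T the training throughput in tokens per second, G the number of accelerators, and P the peak FLOP/s of one accelerator. The page does not report T for this model.

```latex
\mathrm{MFU} \;\approx\; \frac{6\,N\,T}{G\,P}
```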

Benchmark Reproducibility

6.5 / 10

Baidu provides results for standard benchmarks (MMLU, C-Eval, CMMLU, SimpleQA) in the technical report. Evaluation code is partially available through the ERNIEKit and PaddlePaddle repositories. However, the exact prompts and few-shot examples used for all reported scores are not fully disclosed in a centralized, reproducible format. Third-party verification is beginning to emerge on leaderboards like OpenCompass, but comprehensive independent audits are still limited.

Identity Consistency

9.0 / 10

The model consistently identifies itself as part of the ERNIE 4.5 family. The 'PT' (Pre-trained) and 'Base' designations are clearly used to distinguish it from chat-aligned or distilled versions. There is no evidence of the model claiming to be a competitor's product (e.g., GPT-4), and its versioning is maintained through official Baidu channels and Hugging Face tags.

Downstream

23.5 / 30

License Clarity

10.0 / 10

The model is released under the Apache License 2.0, which is a standard, highly permissive open-source license. The license is explicitly stated in the technical report, the GitHub repository, and the Hugging Face model card. It clearly allows for commercial use, modification, and distribution without the restrictive 'research-only' or 'non-commercial' clauses found in many other Chinese or corporate model releases.

Hardware Footprint

8.0 / 10

Hardware requirements are well documented. Official documentation specifies VRAM needs for FP16 (~0.75GB for weights) and provides guidance for INT8 and INT4 quantization. It explicitly mentions compatibility with consumer hardware (e.g., RTX 3060) and edge devices. Long-context memory use is addressed for contexts of up to 128K tokens, with documentation noting the use of FlashMask and RoPE scaling to manage long-context efficiency.
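The memory figures above can be approximated from the listed architecture. The sketch below is an estimate under stated assumptions, not Baidu's official calculator: it uses the round figure of 360 million parameters, nominal bytes per weight for each precision (ignoring quantization metadata), a 16-bit key-value cache, and no allowance for activations or framework overhead. The helper names are hypothetical.

```python
# Nominal storage per weight; real quantized formats add scale/zero-point overhead.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(n_params: int, precision: str) -> float:
    """Bytes needed to hold the weights at the given precision."""
    return n_params * BYTES_PER_WEIGHT[precision]


def kv_cache_bytes(
    context_tokens: int,
    layers: int = 18,
    kv_heads: int = 2,
    head_dim: int = 128,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> int:
    """Bytes for the key-value cache of one sequence.

    Each token stores one key and one value vector per KV head per layer.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    n_params = 360_000_000  # stated parameter count, rounded
    for precision in BYTES_PER_WEIGHT:
        print(precision, f"{weight_bytes(n_params, precision) / 1e9:.2f} GB")
    for tokens in (1_024, 65_536, 131_072):
        print(tokens, f"{kv_cache_bytes(tokens) / 2**30:.3f} GiB")
```

Under these assumptions the FP16 weights come to about 0.72 GB, in line with the ~0.75GB quoted above, while the key-value cache for a single full 131,072-token sequence is 2.25 GiB. At maximum context the cache, not the weights, dominates memory use.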

Versioning Drift

5.5 / 10

Baidu uses a versioning system (e.g., ERNIE 4.5 vs 5.0), and the 0.3B model has a clear release date (June 30, 2025). However, there is no detailed public changelog or 'drift' report that tracks minor weight updates or performance changes over time. While the GitHub repository shows commit history, it lacks a formal semantic versioning log for the model weights themselves.
