Total Parameters
21B
Active Parameters
3.0B
Context Length
131,072 tokens
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
30 Jun 2025
Knowledge Cutoff
Dec 2024
Number of Experts
64
Active Experts
6
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
2560
Number of Layers
28
Attention Heads
20
Key-Value Heads
4
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
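The attention figures in the specification list above determine how much memory the key-value (KV) cache needs. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes a per-head dimension of hidden size divided by query heads (2560 / 20 = 128), a 2-byte (bf16/fp16) cache, and no paging, padding, or cache quantization. Real inference runtimes may differ on all of these.

```python
# Back-of-the-envelope KV-cache size for a GQA decoder, using spec-sheet values.
# Assumptions: head_dim = hidden_size / num_query_heads, 2-byte cache entries,
# no paging/padding/quantization of the cache.

HIDDEN_SIZE = 2560
NUM_LAYERS = 28
NUM_QUERY_HEADS = 20
NUM_KV_HEADS = 4


def head_dim(hidden_size: int = HIDDEN_SIZE, num_heads: int = NUM_QUERY_HEADS) -> int:
    """Per-head dimension, assuming the hidden size splits evenly across query heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


def kv_cache_bytes(seq_len: int, num_kv_heads: int = NUM_KV_HEADS,
                   num_layers: int = NUM_LAYERS, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    # Factor 2: one cached tensor for keys and one for values, in every layer.
    return 2 * num_layers * num_kv_heads * head_dim() * seq_len * bytes_per_elem


if __name__ == "__main__":
    ctx = 131_072
    gqa = kv_cache_bytes(ctx)
    # Hypothetical comparison: full multi-head attention with one KV head per query head.
    mha = kv_cache_bytes(ctx, num_kv_heads=NUM_QUERY_HEADS)
    print(f"head_dim = {head_dim()}")
    print(f"query heads per KV head: {NUM_QUERY_HEADS // NUM_KV_HEADS}")
    print(f"GQA KV cache at {ctx} tokens: {gqa / 2**30:.2f} GiB")
    print(f"MHA KV cache at {ctx} tokens: {mha / 2**30:.2f} GiB")
```

Under these assumptions each token adds 57,344 bytes (56 KiB) of cache, so a full 131,072-token sequence needs 7 GiB with 4 KV heads. The same model with 20 KV heads would need 35 GiB, which is the saving GQA is meant to deliver.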
ERNIE-4.5-21B-A3B-Base is a text-only Mixture-of-Experts (MoE) transformer in Baidu's ERNIE 4.5 model family. This variant is obtained by modality-specific extraction: its text-related parameters are taken from a larger multimodal pre-training run covering trillions of tokens. The underlying heterogeneous MoE structure shares some parameters across modalities during training while keeping dedicated experts for each data type. This separation is intended to keep multimodal joint training from degrading the text representations, so the extracted model retains strong natural language understanding and generation in both Chinese and English.
Each MoE layer contains 64 experts, and the router activates 6 of them per token, so only about 3 billion of the model's 21 billion parameters are used to process any given token. This sparsity keeps per-token compute far below that of a dense 21B model while retaining the capacity of the full parameter set. Attention uses Grouped-Query Attention (GQA) with 20 query heads and 4 key-value heads, so each key-value head is shared by 5 query heads. This shrinks the KV cache and the memory bandwidth needed during decoding. Together with Rotary Position Embeddings (RoPE) and a 131,072-token context window, this makes the model well suited to long documents and complex reasoning tasks.
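To make the "64 experts, 6 active per token" idea concrete, here is a minimal sketch of generic top-k softmax gating. It is not Baidu's router: the real model adds modality-isolated routing and specialized router losses, which are not modeled here. The expert functions, `route_token`, and `moe_layer` are hypothetical stand-ins chosen for illustration.

```python
import math
import random

NUM_EXPERTS = 64  # experts per MoE layer (spec sheet)
TOP_K = 6         # experts activated per token (spec sheet)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Return (expert_ids, gate_weights) for one token.

    Picks the k highest-probability experts and renormalizes their
    probabilities so the selected gate weights sum to 1.
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]


def moe_layer(x, router_logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs; x is a list of floats."""
    ids, gates = route_token(router_logits, k)
    out = [0.0] * len(x)
    for eid, g in zip(ids, gates):
        y = experts[eid](x)  # only k of the experts run for this token
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy experts: expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(NUM_EXPERTS)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    ids, gates = route_token(logits)
    print("selected experts:", ids)
    print("gate sum:", round(sum(gates), 6))
    print("output:", moe_layer([1.0, -0.5], logits, experts))
```

Only 6 of the 64 expert functions are called per token, which is where the compute saving comes from. All 64 still have to be stored, so memory scales with total rather than active parameters.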
For efficient deployment, the ERNIE 4.5 family is built on the PaddlePaddle framework and uses several hardware-level optimizations, including FP8 mixed-precision training and multi-expert parallel collaboration. Baidu also reports 4-bit and 2-bit quantization that it describes as lossless, which reduces memory requirements and lets the models run on a wider range of hardware. Modality-isolated routing and specialized router losses improve parameter efficiency, making the model suitable for production text applications such as summarization.
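A rough way to see why quantization matters for a sparse model: weight memory scales with all 21B parameters, while per-token compute scales with the roughly 3B active ones. The sketch below is an estimate under stated assumptions, not vendor guidance. It uses the rounded spec-sheet counts and a uniform bit width, and ignores quantization metadata (scales, zero points), the KV cache, and activations. The "2 FLOPs per active weight" figure is a common rule of thumb, not a measured number.

```python
# Rough weight-memory and compute estimates for a sparse MoE model.
# Assumptions: rounded spec-sheet parameter counts, the same bit width for every
# weight, and no accounting for quantization metadata, KV cache, or activations.

TOTAL_PARAMS = 21e9   # all experts must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used per token


def weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def forward_gflops_per_token(active_params: float) -> float:
    """Rule of thumb: about 2 FLOPs (one multiply, one add) per active weight."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit weights: {weight_gib(TOTAL_PARAMS, bits):6.2f} GiB")
    print(f"sparse forward pass: ~{forward_gflops_per_token(ACTIVE_PARAMS):.0f} GFLOPs/token")
    print(f"dense 21B equivalent: ~{forward_gflops_per_token(TOTAL_PARAMS):.0f} GFLOPs/token")
```

With these assumptions, the weights alone take about 39 GiB at 16 bits, about 9.8 GiB at 4 bits, and about 4.9 GiB at 2 bits. Compute per token stays near 6 GFLOPs regardless of bit width.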
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They use a heterogeneous Mixture-of-Experts (MoE) architecture that shares parameters across modalities while also keeping dedicated parameters for specific modalities, supporting efficient language and multimodal processing.
No evaluation benchmarks are available for ERNIE-4.5-21B-A3B-Base.