Total Parameters
21B
Context Length
131,072 tokens (128K)
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
30 Jun 2025
Knowledge Cutoff
Dec 2024
Active Parameters
3.0B
Number of Experts
64
Active Experts
6
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
-
Number of Layers
28
Attention Heads
20
Key-Value Heads
4
Activation Function
-
Normalization
-
Position Embedding
Rotary Position Embedding (RoPE)
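For quick reference, the specification table above can be collected into a plain Python mapping. This is an illustrative summary only: the dictionary name is arbitrary, and fields the table leaves blank are kept as None rather than guessed.

```python
# Illustrative summary of the specification table above (values copied from the
# table; fields the table leaves blank are set to None rather than guessed).
ERNIE_4_5_21B_A3B_SPEC = {
    "total_parameters": "21B",
    "active_parameters_per_token": "3.0B",
    "context_length_tokens": 131_072,
    "architecture": "Mixture of Experts (MoE)",
    "num_experts": 64,
    "active_experts_per_token": 6,
    "num_layers": 28,
    "attention_heads": 20,
    "key_value_heads": 4,              # grouped-query attention
    "hidden_dimension": None,          # not listed
    "activation_function": None,       # not listed
    "normalization": None,             # not listed
    "position_embedding": "Rotary Position Embedding (RoPE)",
    "license": "Apache 2.0",
}
```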
ERNIE-4.5-21B-A3B is a high-efficiency large language model in Baidu's ERNIE 4.5 family, engineered for advanced text understanding and complex reasoning tasks. As a Mixture-of-Experts (MoE) model, it holds 21 billion total parameters but activates only about 3 billion per token. This architectural strategy lets the model approach the performance of larger systems while keeping a computational footprint suitable for agile deployment. The model is part of a broader multimodal lineage, but this text-only variant is post-trained to excel at natural language processing, logical deduction, and structured tool usage.
The technical backbone of ERNIE-4.5-21B-A3B is a fine-grained heterogeneous MoE structure designed to mitigate cross-modal interference during initial pre-training. Each layer holds 64 routed experts, of which the routing mechanism selects 6 per token, alongside 2 shared experts that facilitate global knowledge integration. The architecture incorporates Grouped-Query Attention (GQA) to shrink the key-value cache and improve decoding throughput, and employs Rotary Position Embeddings (RoPE) with a progressive frequency scaling method. This scaling allows the model to natively support a 131,072-token context window, making it effective for processing long-form documentation and multi-step reasoning chains without the degradation often seen in context-extended models.
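The routing behaviour described above can be sketched with a toy PyTorch layer. This is a minimal illustration under assumed dimensions, not ERNIE's implementation: the class name, module layout, and the SiLU feed-forward shape are all placeholders; only the expert counts (64 routed, 6 active, 2 shared) come from the description above.

```python
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """Simplified top-k MoE layer with always-on shared experts.

    Mirrors the routing scheme described above (64 routed experts, 6 selected
    per token, 2 shared experts) but is not ERNIE's actual implementation;
    all dimensions are placeholders.
    """

    def __init__(self, hidden_size=1024, ffn_size=512,
                 num_experts=64, top_k=6, num_shared=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

        def make_expert():
            return nn.Sequential(nn.Linear(hidden_size, ffn_size),
                                 nn.SiLU(),
                                 nn.Linear(ffn_size, hidden_size))

        self.experts = nn.ModuleList([make_expert() for _ in range(num_experts)])
        self.shared_experts = nn.ModuleList([make_expert() for _ in range(num_shared)])

    def forward(self, x):  # x: (num_tokens, hidden_size)
        # The router picks the top-k experts per token and renormalises their weights.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        # Shared experts process every token, providing global knowledge integration.
        out = sum(shared(x) for shared in self.shared_experts)

        # Routed experts only process the tokens assigned to them.
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():
                mask = idx[:, k] == e
                out[mask] = out[mask] + weights[mask, k, None] * self.experts[e](x[mask])
        return out


# Only the top-k routed experts plus the shared experts run for each token,
# which is why the active parameter count is a small fraction of the total.
layer = ToyMoELayer()
tokens = torch.randn(8, 1024)
print(layer(tokens).shape)  # torch.Size([8, 1024])
```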
Optimized for production-grade environments, the model supports advanced quantization techniques, including 4-bit and 2-bit convolutional code quantization, that reduce memory requirements for inference. The training infrastructure leverages FP8 mixed-precision and hierarchical load balancing to ensure expert stability and high throughput. Designed to be interoperable across deep learning ecosystems, ERNIE-4.5-21B-A3B is compatible with the PaddlePaddle framework and provides PyTorch-formatted weights for integration into standard Transformers-based pipelines. Its capabilities are further extended by native support for function calling and structured data interaction, making it a viable foundation for agentic workflows and automated technical tasks.
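For the PyTorch-formatted weights, a minimal Transformers-based loading sketch might look like the following. The repository id `baidu/ERNIE-4.5-21B-A3B-PT` and the `trust_remote_code` requirement are assumptions to verify against the official model card, and this sketch loads unquantized weights, so it does not cover the 4-bit/2-bit paths mentioned above.

```python
# Minimal sketch of loading the PyTorch-formatted weights with Transformers.
# The repo id and trust_remote_code flag are assumptions; check the official
# model card for the exact identifiers and recommended settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-21B-A3B-PT"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user",
             "content": "Summarise the key ideas of mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```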
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They share a heterogeneous Mixture-of-Experts (MoE) architecture that pools parameters across modalities while reserving dedicated parameters for each modality, supporting efficient language and multimodal processing.
No evaluation benchmarks are available for ERNIE-4.5-21B-A3B.