Total Parameters
21B
Context Length
131K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
30 Jun 2025
Knowledge Cutoff
Dec 2024
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
20
Key-Value Heads
4
Attention Head Dimension
-
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
500,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
Swish
Dimensions
Hidden Dimension Size
2,560
Number of Layers
28
FFN Intermediate Size (Dense)
1,536
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
103,424
Mixture of Experts
Total Expert Parameters
3.0B
Number of Experts
64
Active Experts
6
Shared Experts
2
FFN Intermediate Size (per Expert)
1,536
Dense Layers Before MoE
1
ERNIE-4.5-21B-A3B is a text-focused large language model in Baidu's ERNIE 4.5 family, built for text understanding and complex reasoning. It is a Mixture-of-Experts (MoE) model with 21 billion total parameters, of which only about 3 billion are activated per token. Because per-token compute scales with the active rather than the total parameter count, the model aims for quality closer to that of larger systems at a much lower inference cost. The ERNIE 4.5 family is multimodal, but this variant is post-trained for natural language processing, logical deduction, and structured tool use.
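The relationship between total and active parameters can be roughly cross-checked from the specification values on this page. The sketch below is a back-of-the-envelope estimate, not an official breakdown: it assumes each expert is a gated feed-forward block with three bias-free weight matrices and that shared experts have the same size as routed experts. It counts expert weights only and leaves out attention, embeddings, the dense layer, and the router.

```python
# Values taken from the specification list on this page.
HIDDEN_SIZE = 2560
EXPERT_FFN_SIZE = 1536
NUM_LAYERS = 28
DENSE_LAYERS_BEFORE_MOE = 1
NUM_EXPERTS = 64
ACTIVE_EXPERTS = 6
SHARED_EXPERTS = 2

# Assumption (not stated in the source): each expert is a gated FFN with
# three weight matrices (gate, up, down) of shape hidden x intermediate.
MATRICES_PER_EXPERT = 3


def expert_params() -> int:
    """Weights in a single expert under the gated-FFN assumption."""
    return MATRICES_PER_EXPERT * HIDDEN_SIZE * EXPERT_FFN_SIZE


def moe_layer_count() -> int:
    """Layers that contain experts (all layers after the leading dense ones)."""
    return NUM_LAYERS - DENSE_LAYERS_BEFORE_MOE


def routed_params_total() -> int:
    """Weights held by all routed experts across all MoE layers."""
    return expert_params() * NUM_EXPERTS * moe_layer_count()


def expert_params_per_token() -> int:
    """Expert weights actually used for one token (routed + shared)."""
    used = ACTIVE_EXPERTS + SHARED_EXPERTS
    return expert_params() * used * moe_layer_count()


if __name__ == "__main__":
    print(f"per expert:            {expert_params():,}")
    print(f"all routed experts:    {routed_params_total():,}")
    print(f"expert weights/token:  {expert_params_per_token():,}")
```

Under these assumptions the routed experts hold about 20.4 billion weights in total, while a single token touches about 2.5 billion expert weights, which is in line with the 21B total and 3B active figures once the remaining components are added.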
ERNIE-4.5-21B-A3B is built on a fine-grained heterogeneous MoE structure designed to limit cross-modal interference during pre-training. Each MoE layer has 64 experts; the router selects 6 of them per token, and 2 shared experts run alongside them to integrate general knowledge. The attention layers use Grouped-Query Attention (GQA), in which 20 query heads share 4 key-value heads, reducing the size of the key-value cache and the memory bandwidth it consumes. Positions are encoded with Rotary Position Embeddings (RoPE) using a progressive frequency scaling method. This scaling lets the model natively support a 131,072-token context window, which suits long documents and multi-step reasoning chains and avoids the degradation often seen in context-extended models.
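To make the routing description concrete, here is a minimal sketch of top-k expert selection with always-on shared experts, written in plain Python. It is a generic softmax top-k router, not ERNIE's actual implementation: the renormalisation of the selected weights, the unweighted sum of shared-expert outputs, and the helper names `route`, `moe_forward`, and `make_toy_expert` are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 64  # routed experts per layer (from the spec list)
TOP_K = 6         # routed experts activated per token
NUM_SHARED = 2    # shared experts applied to every token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k experts.

    Weights are softmax probabilities renormalised over the selected
    experts so that they sum to 1 (an assumption of this sketch).
    """
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, routed_experts, shared_experts):
    """Add shared-expert outputs to the weighted routed-expert outputs."""
    out = [0.0] * len(x)
    for expert in shared_experts:  # shared experts see every token
        out = [o + v for o, v in zip(out, expert(x))]
    for idx, weight in route(router_logits):  # only top_k routed experts run
        out = [o + weight * v for o, v in zip(out, routed_experts[idx](x))]
    return out


def make_toy_expert(scale):
    """Hypothetical stand-in for an expert FFN: scales its input."""
    return lambda x: [scale * v for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    routed = [make_toy_expert(0.01 * i) for i in range(NUM_EXPERTS)]
    shared = [make_toy_expert(1.0) for _ in range(NUM_SHARED)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    print(route(logits))
    print(moe_forward([1.0, 2.0], logits, routed, shared))
```

Only 6 of the 64 routed experts are evaluated per token, so the per-token cost of the layer depends on `TOP_K + NUM_SHARED` rather than on `NUM_EXPERTS`.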
For production deployment, the model supports low-bit quantization, including 4-bit and 2-bit convolutional code quantization, which reduces the memory required for inference. The training infrastructure uses FP8 mixed precision and hierarchical load balancing to keep expert utilization stable and throughput high. The model is compatible with the PaddlePaddle framework, and PyTorch-format weights are provided for integration into standard Transformers-based pipelines. It also natively supports function calling and structured data interaction, which makes it a viable foundation for agentic workflows and automated technical tasks.
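The memory effect of quantization and of Grouped-Query Attention can be estimated with simple arithmetic. The sketch below is a rough estimate only: it ignores quantization scales or codebooks, any tensors kept at higher precision, and activation memory. It also assumes a head dimension of 128 (hidden size divided by attention heads), which this page does not state, and a 16-bit key-value cache.

```python
# Values taken from this page.
TOTAL_PARAMS = 21e9
NUM_LAYERS = 28
ATTENTION_HEADS = 20
KV_HEADS = 4
HIDDEN_SIZE = 2560
CONTEXT_TOKENS = 131_072

# Assumption: head dimension = hidden size / attention heads = 128.
HEAD_DIM = HIDDEN_SIZE // ATTENTION_HEADS

GIB = 1024 ** 3


def weight_bytes(bits_per_weight, params=TOTAL_PARAMS):
    """Raw weight storage at a given bit width (no quantization overhead)."""
    return params * bits_per_weight / 8


def kv_cache_bytes(tokens, kv_heads=KV_HEADS, bytes_per_value=2):
    """Keys plus values across all layers for a single sequence."""
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * bytes_per_value * tokens


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit weights: {weight_bytes(bits) / GIB:6.1f} GiB")
    gqa = kv_cache_bytes(CONTEXT_TOKENS)
    mha = kv_cache_bytes(CONTEXT_TOKENS, kv_heads=ATTENTION_HEADS)
    print(f"KV cache, 4 KV heads:  {gqa / GIB:.1f} GiB")
    print(f"KV cache, 20 KV heads: {mha / GIB:.1f} GiB")
```

Under these assumptions, a full 131,072-token sequence needs 7 GiB of key-value cache with 4 key-value heads, compared with 35 GiB if all 20 heads kept their own keys and values.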
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They use a heterogeneous Mixture-of-Experts (MoE) architecture that shares parameters across modalities while also dedicating parameters to individual modalities, supporting efficient language and multimodal processing.
| Benchmark | Score | Rank |
|---|---|---|
| General Knowledge MMLU | 0.419 | 36 |
Overall Rank
#158
Coding Rank
-