
ERNIE-4.5-21B-A3B

Total Parameters

21B

Context Length

131,072 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

30 Jun 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Active Parameters (per token)

3.0B

Number of Experts

64

Active Experts

6

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

-

Number of Layers

28

Attention Heads

20

Key-Value Heads

4

Activation Function

-

Normalization

-

Position Embedding

Rotary Position Embedding (RoPE)

ERNIE-4.5-21B-A3B

ERNIE-4.5-21B-A3B is a high-efficiency large language model in Baidu's ERNIE 4.5 family, engineered for advanced text understanding and complex reasoning. As a Mixture-of-Experts (MoE) model, it holds 21 billion total parameters while activating only 3 billion per token. This architectural strategy lets the model reach performance levels typical of larger dense systems while keeping a computational footprint suited to cost-sensitive deployment. The model belongs to a broader multimodal lineage, but this specific variant is post-trained to excel at natural language processing, logical deduction, and structured tool use.
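The total-versus-active split can be illustrated with a quick back-of-the-envelope calculation: per-token compute scales with the active parameters, not the full model size.

```python
# Rough illustration of why an MoE model with 21B total parameters
# runs with roughly the per-token compute of a ~3B dense model.
total_params = 21e9   # all expert and shared weights stored in the model
active_params = 3e9   # parameters actually used for each token

active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.1%}")  # 14.3%
```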

The technical backbone of ERNIE-4.5-21B-A3B utilizes a fine-grained heterogeneous MoE structure designed to mitigate cross-modal interference during initial pre-training. It employs 64 experts per layer, with a routing mechanism that selects 6 active experts per token alongside 2 shared experts that facilitate global knowledge integration. The architecture incorporates Grouped-Query Attention (GQA), which shrinks the key-value cache and improves inference throughput, and employs Rotary Position Embeddings (RoPE) with a progressive frequency scaling method. This scaling allows the model to natively support a 131,072-token context window, making it effective for processing long-form documentation and multi-step reasoning chains without the degradation often seen in context-extended models.
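Baidu's exact router implementation is not documented here; the following is a generic top-k softmax routing sketch under the stated expert counts (64 experts per layer, 6 selected per token). The 2 shared experts would simply be applied to every token in addition to the routed ones.

```python
import numpy as np

def route_top_k(hidden, gate_weights, k=6):
    """Select the top-k experts per token from gating logits.

    hidden:       (tokens, d_model) token representations
    gate_weights: (d_model, n_experts) learned router projection
    Returns expert indices and normalized mixing weights.
    Generic top-k softmax router sketch, not ERNIE's exact routing code.
    """
    logits = hidden @ gate_weights                   # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]    # k highest-scoring experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over only the selected experts yields the mixing weights.
    exp = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return top_idx, weights

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 32))   # 4 tokens, toy d_model=32
gate = rng.standard_normal((32, 64))    # router projection for 64 experts
idx, w = route_top_k(tokens, gate, k=6)
print(idx.shape, w.shape)               # (4, 6) (4, 6)
```

Each token's output is then the weighted sum of its 6 selected experts' outputs (plus the shared experts), which is what keeps the active parameter count near 3B despite 64 experts being stored per layer.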

Optimized for production-grade environments, the model supports advanced quantization techniques including 4-bit and 2-bit convolutional code quantization, which minimizes memory requirements for inference. The training infrastructure leverages FP8 mixed-precision and hierarchical load balancing to ensure expert stability and high throughput. Designed to be interoperable across deep learning ecosystems, ERNIE-4.5-21B-A3B is compatible with the PaddlePaddle framework and provides PyTorch-formatted weights for integration into standard Transformers-based pipelines. Its capabilities are further extended by its native support for function calling and structured data interaction, making it a viable foundation for agentic workflows and automated technical tasks.
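The memory impact of low-bit weight quantization can be estimated from the parameter count alone; the sketch below ignores activations, the KV cache, and quantization metadata (scales and zero points), so real requirements run somewhat higher.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / 2**30

# All 21B parameters must be resident for MoE inference,
# even though only ~3B are active per token.
n_params = 21e9
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(n_params, bits):6.1f} GiB")
```

At 16-bit precision the weights alone need roughly 39 GiB, while 4-bit quantization brings that below 10 GiB, which is the practical motivation for the low-bit schemes mentioned above.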

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.



Evaluation Benchmarks

No evaluation benchmarks for ERNIE-4.5-21B-A3B available.

