ApX logoApX logo

ERNIE-4.5-21B-A3B

Active Parameters

21B

Context Length

131K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

30 Jun 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

20

Key-Value Heads

4

Attention Head Dimension

-

Position Embedding

Absolute Position Embedding

RoPE Theta

500,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

2,560

Number of Layers

28

FFN Intermediate Size (Dense)

1,536

Multi-Token Prediction Heads

1

Tokenizer

Vocabulary Size

103,424

Mixture of Experts

Total Expert Parameters

3.0B

Number of Experts

64

Active Experts

6

Shared Experts

2

FFN Intermediate Size (per Expert)

1,536

Dense Layers Before MoE

1

Architecture Diagram

Input TokensToken EmbeddingPosition: AbsoluteHidden: 2.6k · Context: 131K · Vocab: 103.4kx 28 layersRMSNormPre-AttentionGrouped-Query Attention20Q / 4KV headsHead dim: 128+RMSNormPre-FFNSparse MoE FFN (6/64 experts)SwishIntermediate: 1.5k+Final RMSNormOutput Logits

ERNIE-4.5-21B-A3B

ERNIE-4.5-21B-A3B is a high-efficiency large language model belonging to Baidu's ERNIE 4.5 family, specifically engineered for advanced text understanding and complex reasoning tasks. As a Mixture-of-Experts (MoE) model, it maintains a massive 21 billion total parameter count while activating only 3 billion parameters per token. This architectural strategy allows the model to achieve performance levels typical of larger systems while maintaining a computational footprint suitable for agile deployment. The model is part of a broader multimodal lineage but this specific variant is post-trained to excel in natural language processing, logical deduction, and structured tool usage.

The technical backbone of ERNIE-4.5-21B-A3B utilizes a fine-grained heterogeneous MoE structure designed to mitigate cross-modal interference during initial pre-training. It employs 64 experts per layer, with a routing mechanism that selects 6 active experts per token alongside 2 shared experts that facilitate global knowledge integration. The architecture incorporates Grouped-Query Attention (GQA) for optimized memory throughput and employs Rotary Position Embeddings (RoPE) with a progressive frequency scaling method. This scaling allows the model to natively support a 131,072-token context window, making it effective for processing long-form documentation and multi-step reasoning chains without the degradation often seen in context-extended models.

Optimized for production-grade environments, the model supports advanced quantization techniques including 4-bit and 2-bit convolutional code quantization, which minimizes memory requirements for inference. The training infrastructure leverages FP8 mixed-precision and hierarchical load balancing to ensure expert stability and high throughput. Designed to be interoperable across deep learning ecosystems, ERNIE-4.5-21B-A3B is compatible with the PaddlePaddle framework and provides PyTorch-formatted weights for integration into standard Transformers-based pipelines. Its capabilities are further extended by its native support for function calling and structured data interaction, making it a viable foundation for agentic workflows and automated technical tasks.

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.


Other ERNIE 4.5 Models

Evaluation Benchmarks

Rank

#158

BenchmarkScoreRank

General Knowledge

MMLU

0.419

36

Rankings

Overall Rank

#158

Coding Rank

-

Model Integrity

Total Score

B+

73 / 100

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
64k
128k

VRAM Required:

Recommended GPUs

ERNIE-4.5-21B-A3B: Specifications and GPU VRAM Requirements