
ERNIE-4.5-21B-A3B-Base

Parameters

21B

Context Length

131,072 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

30 Jun 2025

Training Data Cutoff

Dec 2024

Technical Specifications

Active Parameters

3.0B

Number of Experts

64

Active Experts

6

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

2560

Number of Layers

28

Attention Heads

20

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

ERNIE-4.5-21B-A3B-Base

The ERNIE-4.5-21B-A3B-Base model is a text-focused Mixture-of-Experts (MoE) transformer and a core component of Baidu's ERNIE 4.5 model family. This specific variant is derived through a process of modality-specific extraction, where text-related parameters are isolated from a larger multimodal pre-training phase that incorporates trillions of tokens. Its architecture is characterized by a heterogeneous MoE structure that supports parameter sharing across modalities during training while maintaining dedicated experts for specific data types. This design ensures that textual representations are not compromised by multimodal joint training, allowing for high-performance natural language understanding and generation in both Chinese and English.

Technically, the model employs a sparse architecture featuring 64 experts per layer, with a routing mechanism that activates 6 experts per token, resulting in approximately 3 billion active parameters per forward pass. This sparsity significantly reduces computational overhead while retaining the representational capacity of the full 21-billion-parameter model. The attention mechanism uses Grouped-Query Attention (GQA) with 20 query heads and 4 key-value heads, which reduces memory bandwidth demands and speeds up inference. The integration of 2D Rotary Position Embeddings (RoPE) and support for a 131,072-token context window makes it highly effective for processing long-form documents and complex reasoning tasks.
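The sparsity and GQA figures above can be sanity-checked with some back-of-the-envelope arithmetic. The shapes below are taken from the spec table (hidden size 2560, 28 layers, 20 query heads, 4 KV heads, 64 experts with 6 routed per token); the uniform head dimension and fp16 cache precision are illustrative assumptions, not the exact layout of the released checkpoint.

```python
HIDDEN = 2560
LAYERS = 28
Q_HEADS = 20
KV_HEADS = 4
HEAD_DIM = HIDDEN // Q_HEADS  # assumed uniform head dim: 128
EXPERTS_TOTAL = 64
EXPERTS_ACTIVE = 6

# Fraction of routed-expert (FFN) parameters touched per token under top-6 routing.
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL
print(f"expert params used per token: {expert_fraction:.1%}")

# KV-cache bytes per token at fp16 (2 bytes): 2 tensors (K and V)
# * layers * heads * head_dim * bytes. GQA stores only the 4 KV heads.
kv_cache_gqa = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2
kv_cache_mha = 2 * LAYERS * Q_HEADS * HEAD_DIM * 2
print(f"KV cache per token: {kv_cache_gqa / 1024:.0f} KiB (GQA) vs "
      f"{kv_cache_mha / 1024:.0f} KiB (full MHA), "
      f"a {Q_HEADS // KV_HEADS}x saving")
```

Under these assumptions, only about 9% of the expert parameters participate in each forward pass, and the 20:4 query-to-KV-head ratio shrinks the KV cache fivefold, which is what makes the 131K context window practical.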

To facilitate efficient deployment, the ERNIE 4.5 family is built on the PaddlePaddle framework and incorporates several hardware-level optimizations, including FP8 mixed-precision training and multi-expert parallel collaboration. The model supports advanced quantization techniques such as 4-bit and 2-bit lossless compression, enabling it to run on diverse hardware platforms with reduced memory requirements. By utilizing modality-isolated routing and specialized router losses, the model achieves high parameter efficiency, making it suitable for industrial-grade applications ranging from sophisticated summarization to cross-modal reasoning within a production environment.
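To get a feel for what those quantization options mean in practice, here is a hedged weight-memory estimate. All 21B parameters must reside in memory even though only ~3B are active per token; real deployments also need room for the KV cache, activations, and framework overhead, so treat these figures as rough lower bounds rather than the exact footprint of any released build.

```python
TOTAL_PARAMS = 21e9  # all experts are stored, regardless of routing


def weight_gib(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB at a given precision."""
    return params * bits_per_param / 8 / 2**30


for name, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{name:>5}: ~{weight_gib(TOTAL_PARAMS, bits):.1f} GiB")
```

This is why the 4-bit and 2-bit paths matter: they bring the weights from roughly 39 GiB at bf16 down to around 10 GiB or 5 GiB, within reach of a single consumer-class GPU.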

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.



Evaluation Benchmarks

No evaluation benchmarks are available for ERNIE-4.5-21B-A3B-Base.


Model Transparency

Overall Score

B+

73 / 100
