Total Parameters: 300B
Active Parameters: 47B
Context Length: 131,072 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 30 Jun 2025
Knowledge Cutoff: Jun 2025
Number of Experts: 64
Active Experts: 8
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 54
Attention Heads: 64
Key-Value Heads: 8
Activation Function: GELU
Normalization: Layer Normalization
Position Embedding: Absolute Position Embedding
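As a back-of-the-envelope illustration of what the figures above imply for memory, the sketch below estimates quantized weight size and KV-cache size at a given context length. Layer count and KV-head count come from the table; the head dimension (128) and the weight/KV precisions are assumptions, since the hidden size is not listed. This is a rough scaling sketch, not a measured requirement.

```python
def estimate_vram_gb(total_params_b=300.0, num_layers=54, kv_heads=8,
                     head_dim=128,            # assumption: hidden size is not published in the table
                     bits_per_weight=4, kv_bits=16,
                     context_tokens=131_072, batch_size=1):
    """Rough VRAM estimate: quantized MoE weights + GQA KV cache.

    The full 300B weight set must be resident for MoE inference even
    though only ~47B parameters are active per token.
    """
    weight_gb = total_params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes per element
    kv_gb = (2 * num_layers * kv_heads * head_dim
             * context_tokens * batch_size * (kv_bits / 8)) / 1e9
    return weight_gb, kv_gb


if __name__ == "__main__":
    w, kv = estimate_vram_gb(bits_per_weight=4, context_tokens=131_072)
    print(f"weights ~{w:.0f} GB, KV cache ~{kv:.1f} GB")
```

Under these assumptions, 4-bit weights alone come to roughly 150 GB, with the full-context KV cache adding on the order of tens of gigabytes, which is why multi-GPU deployment is the norm for this variant.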
The ERNIE 4.5 model family, developed by Baidu, is a new generation of large-scale foundation models comprising ten variants that can process text, image, and video inputs while primarily generating text outputs. ERNIE-4.5-300B-A47B-Base is the text-only, pre-trained (base) language model of the family, aimed at advanced reasoning and high-quality text generation across a broad range of language understanding and generation tasks.
Central to the ERNIE 4.5 architecture is a multimodal heterogeneous Mixture-of-Experts (MoE) structure. This design shares parameters, including self-attention and a portion of the expert parameters, across modalities while reserving dedicated parameters for individual modalities such as text and vision, so that multimodal understanding improves without compromising performance on text-only tasks. Key innovations within this framework include FlashMask dynamic attention masking and a modality-isolated routing technique, which contribute to improved efficiency and performance. The models are trained using the PaddlePaddle deep learning framework, leveraging techniques such as intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation to ensure training efficiency.
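To make the routing idea concrete, here is a minimal, framework-agnostic sketch of top-k expert selection with modality isolation: each token's router scores are masked so that only experts of its own modality can be chosen, and the 8 selected experts are combined with softmax gate weights. This illustrates the general technique rather than Baidu's implementation; the 64-text/64-vision expert split and the function names are assumptions.

```python
import numpy as np

def modality_isolated_topk_routing(router_logits, modality,
                                   num_text_experts=64, num_vision_experts=64,
                                   k=8):
    """Select k experts per token, restricted to the token's modality.

    router_logits: (tokens, num_text_experts + num_vision_experts)
    modality:      (tokens,) with 0 = text token, 1 = vision token
    Returns (expert indices, gate weights), both of shape (tokens, k).
    """
    assert router_logits.shape[1] == num_text_experts + num_vision_experts
    logits = router_logits.astype(np.float64).copy()
    # Mask experts outside the token's modality so they can never be selected.
    logits[modality == 0, num_text_experts:] = -np.inf
    logits[modality == 1, :num_text_experts] = -np.inf
    # Top-k expert indices per token (argsort ascending, take the last k columns).
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over the k selected experts gives the combination weights.
    gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return topk_idx, gates

# Toy usage: 4 tokens (2 text, 2 vision) with random router scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 128))
experts, gates = modality_isolated_topk_routing(scores, np.array([0, 0, 1, 1]))
```

The key property is that the mask guarantees text tokens never consume vision-expert capacity and vice versa, while the shared attention parameters still see both modalities.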
The ERNIE-4.5-300B-A47B-Base model supports long-context processing, accommodating sequence lengths up to 131,072 tokens. This enables it to handle extensive textual inputs for complex reasoning and generation tasks. Its Mixture-of-Experts architecture is tailored for efficient scaling and delivers high-throughput inference across various hardware configurations. This model is well-suited for general-purpose large language model applications that require robust reasoning capabilities and high processing speed. Developers can further adapt and fine-tune the model for specific application requirements using associated toolkits like ERNIEKit, which supports methodologies such as Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Direct Preference Optimization (DPO).
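As a concrete starting point for inference, the sketch below uses the generic Hugging Face Transformers loading and generation flow. The repository ID is an assumption (check Baidu's official model listing for the exact name), and running a 300B-parameter checkpoint in practice requires multi-GPU sharding or a dedicated serving stack.

```python
# Minimal text-generation sketch with Hugging Face Transformers.
# The model ID below is hypothetical; verify the exact repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-300B-A47B-Base"  # assumption, not a confirmed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # shard the MoE weights across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

prompt = "Explain the advantages of a Mixture-of-Experts architecture:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For SFT, LoRA, or DPO fine-tuning, ERNIEKit provides its own configuration-driven workflow; consult its documentation rather than adapting the generic loading shown here.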
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.
No evaluation benchmarks are available for ERNIE-4.5-300B-A47B-Base, so its Local LLM rankings (Overall and Coding) are not listed.