ERNIE-4.5-300B-A47B: Specifications and GPU VRAM Requirements

Open Source

Open Weights

Total Parameters

300B

Context Length

131,072 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

30 Jun 2025

Knowledge Cutoff

Technical Specifications

Active Parameters per Token

47.0B

Number of Experts

Active Experts

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

ERNIE-4.5-300B-A47B

ERNIE-4.5-300B-A47B is a large language model in Baidu's ERNIE 4.5 family. The broader ERNIE 4.5 series includes multimodal models, but this variant is text-only. It targets general-purpose text understanding and generation, including complex reasoning and knowledge-intensive tasks, and supports both English and Chinese.

The model uses a Mixture-of-Experts (MoE) architecture with 300 billion total parameters, of which 47 billion are active per token during inference. The ERNIE 4.5 family's MoE design is heterogeneous: some parameters are shared across modalities while others are dedicated to a single modality, and routing is modality-isolated, which is intended to support multimodal understanding without degrading text performance. The design also includes Dynamic Attention Masking (FlashMask) for more efficient processing. The models are trained with Baidu's PaddlePaddle deep learning framework, using intra-node expert parallelism, memory-efficient pipeline scheduling, and FP8 mixed-precision training to raise pre-training throughput.
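To make the difference between total and active parameters concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration of the MoE idea only, not ERNIE 4.5's actual router: the function names, the toy experts, and the choice of k = 2 are all assumptions made for this example. Only the 300B and 47B figures come from the text above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
output = moe_forward(1.0, experts, router_logits=[0.0, 2.0, 1.0, -1.0], k=2)

# Figures from the model description: 300B total, 47B active per token.
TOTAL_PARAMS = 300e9
ACTIVE_PARAMS = 47e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # 15.7%
```

Per-token compute scales with the active parameters, but every expert still has to be stored, so memory planning should use the 300B total.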

For deployment, ERNIE-4.5-300B-A47B supports efficient inference through multi-expert parallel collaboration and convolutional code quantization, which enables near-lossless 4-bit and 2-bit quantization across a range of hardware configurations. Its context length is 131,072 tokens, which allows long inputs and coherent long-form generation. The model can be fine-tuned with ERNIEKit and deployed with FastDeploy, and it is available for commercial and research use under the Apache 2.0 license.
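As a rough guide to what quantization means for VRAM, the sketch below estimates the weights-only footprint at several bit widths. It is a lower bound under stated assumptions, not a measured requirement: it ignores the KV cache, activations, quantization metadata such as scales, and framework overhead. The 80 GB device size is a hypothetical example, and the helper names are illustrative.

```python
import math

# All experts must be resident in memory, so use total (not active) parameters.
TOTAL_PARAMS = 300e9

def weight_memory_gb(num_params, bits_per_weight):
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

def min_devices(required_gb, device_gb):
    """Smallest device count whose combined memory holds the weights alone."""
    return math.ceil(required_gb / device_gb)

DEVICE_GB = 80  # assumption: a hypothetical 80 GB accelerator

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    n = min_devices(gb, DEVICE_GB)
    print(f"{label:>6}: {gb:,.0f} GB of weights, at least {n} x {DEVICE_GB} GB devices")
```

Under these assumptions the weights alone come to about 600 GB at 16-bit, 300 GB at 8-bit, 150 GB at 4-bit, and 75 GB at 2-bit. Real deployments need additional headroom, and the headroom grows with context size because the KV cache scales with the number of tokens.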

About ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale models, spanning text-only and multimodal variants. The family is built around a heterogeneous Mixture-of-Experts (MoE) architecture that shares parameters across modalities while keeping dedicated parameters for each modality, supporting efficient language and multimodal processing.

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for ERNIE-4.5-300B-A47B available.
