ERNIE-4.5-300B-A47B-Base: Specifications and GPU VRAM Requirements

ERNIE-4.5-300B-A47B-Base

开源

开放权重

活跃参数

300B

上下文长度

131.072K

模态

Text

架构

Mixture of Experts (MoE)

许可证

Apache 2.0

发布日期

30 Jun 2025

训练数据截止日期

Jun 2025

技术规格

专家参数总数

47.0B

专家数量

活跃专家

注意力结构

Grouped-Query Attention

隐藏维度大小

层数

注意力头

键值头

激活函数

GELU

归一化

Layer Normalization

位置嵌入

Absolute Position Embedding

系统要求

不同量化方法和上下文大小的显存要求

ERNIE-4.5-300B-A47B-Base

The ERNIE 4.5 model family, developed by Baidu, represents a new generation of large-scale foundation models. This family includes ten distinct variants, designed to integrate and process diverse input modalities such as text, image, and video, while primarily generating text outputs. The ERNIE-4.5-300B-A47B-Base variant functions as a large language model within this family, optimized for advanced reasoning and high-quality text generation tasks. Its capabilities extend to comprehensive language understanding and generation, supporting a broad spectrum of applications.

Central to the ERNIE 4.5 architecture is a multimodal heterogeneous Mixture-of-Experts (MoE) structure. This design enables efficient parameter sharing across various modalities, including self-attention and expert parameters, while also incorporating dedicated parameters for distinct modalities such as text and vision. This architectural approach is engineered to enhance multimodal understanding without compromising performance on tasks strictly involving text. Key innovations within this framework include "FlashMask" Dynamic Attention Masking and a modality-isolated routing technique, which contribute to improved efficiency and performance. The models are trained using the PaddlePaddle deep learning framework, leveraging techniques such as intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation methods to ensure optimal efficiency.

The ERNIE-4.5-300B-A47B-Base model supports long-context processing, accommodating sequence lengths up to 131,072 tokens. This enables it to handle extensive textual inputs for complex reasoning and generation tasks. Its Mixture-of-Experts architecture is tailored for efficient scaling and delivers high-throughput inference across various hardware configurations. This model is well-suited for general-purpose large language model applications that require robust reasoning capabilities and high processing speed. Developers can further adapt and fine-tune the model for specific application requirements using associated toolkits like ERNIEKit, which supports methodologies such as Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Direct Preference Optimization (DPO).

关于 ERNIE 4.5

The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.

其他 ERNIE 4.5 模型

评估基准

排名适用于本地LLM。

没有可用的 ERNIE-4.5-300B-A47B-Base 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

64k

128k

所需显存:

资源

官方文档发布说明阅读论文下载权重源代码

ERNIE-4.5-300B-A47B-Base

技术规格

系统要求

ERNIE-4.5-300B-A47B-Base

关于 ERNIE 4.5

其他 ERNIE 4.5 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源