Total Parameters: 300B
Context Length: 131,072 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release Date: 30 Jun 2025
Knowledge Cutoff: Jun 2025
Active Parameters: 47.0B
Number of Experts: 64
Active Experts: 8
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 54
Attention Heads: 64
Key-Value Heads: 8
Activation Function: GELU
Normalization: Layer Normalization
Position Embedding: Absolute Position Embedding
VRAM Requirements for Different Quantization Methods and Context Sizes
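The underlying table is not reproduced here, but the rough arithmetic can be sketched. The estimate below derives weight memory from the 300B total parameter count at several quantization bit-widths, plus KV cache for a given context length using the layer and key-value head counts above. The head dimension of 128 and the 5% overhead factor are assumptions for illustration, not published figures.

```python
# Rough VRAM estimate for ERNIE-4.5-300B-A47B-Base.
# Assumptions (not published figures): head_dim = 128, ~5% overhead for
# activations/buffers, and all 300B weights resident in memory
# (MoE experts are not offloaded).

GIB = 1024 ** 3

TOTAL_PARAMS = 300e9       # total parameters (300B)
NUM_LAYERS = 54
NUM_KV_HEADS = 8
HEAD_DIM = 128             # assumed: not listed on this page

def weight_memory_gib(bits_per_param: float) -> float:
    """Memory for model weights at a given quantization width."""
    return TOTAL_PARAMS * bits_per_param / 8 / GIB

def kv_cache_gib(context_len: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_len * bytes_per_value / GIB

if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        for ctx in (8_192, 131_072):
            total = 1.05 * (weight_memory_gib(bits) + kv_cache_gib(ctx))
            print(f"{name:>5} weights, {ctx:>7,} token context: ~{total:,.0f} GiB")
```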
The ERNIE 4.5 model family, developed by Baidu, represents a new generation of large-scale foundation models. The family comprises ten variants designed to integrate and process diverse input modalities such as text, image, and video, while primarily generating text outputs. ERNIE-4.5-300B-A47B-Base is the large language model variant of this family, optimized for advanced reasoning and high-quality text generation, and it supports a broad range of language understanding and generation applications.
Central to the ERNIE 4.5 architecture is a multimodal heterogeneous Mixture-of-Experts (MoE) structure. This design enables efficient parameter sharing across various modalities, including self-attention and expert parameters, while also incorporating dedicated parameters for distinct modalities such as text and vision. This architectural approach is engineered to enhance multimodal understanding without compromising performance on tasks strictly involving text. Key innovations within this framework include "FlashMask" Dynamic Attention Masking and a modality-isolated routing technique, which contribute to improved efficiency and performance. The models are trained using the PaddlePaddle deep learning framework, leveraging techniques such as intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation methods to ensure optimal efficiency.
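To make the routing idea concrete, the toy sketch below shows one way modality-isolated routing can work: each token carries a modality tag, and the router only scores the expert pool dedicated to that modality, so text tokens never land on vision experts. The expert counts, dimensions, and gating scheme here are illustrative assumptions, not Baidu's implementation.

```python
import numpy as np

# Toy modality-isolated top-k routing. Dimensions and expert counts are
# illustrative only, not the real ERNIE 4.5 configuration.
rng = np.random.default_rng(0)

HIDDEN = 64
EXPERT_POOLS = {"text": list(range(0, 8)),      # expert ids reserved for text
                "vision": list(range(8, 16))}   # expert ids reserved for vision
TOP_K = 2

# One gating matrix per modality, covering only that modality's experts.
gates = {m: rng.normal(size=(HIDDEN, len(ids))) for m, ids in EXPERT_POOLS.items()}

def route(token: np.ndarray, modality: str):
    """Return (expert_ids, weights) for a single token of a given modality."""
    ids = EXPERT_POOLS[modality]
    logits = token @ gates[modality]              # scores over this modality's pool only
    top = np.argsort(logits)[-TOP_K:]             # pick the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                          # softmax over the selected experts
    return [ids[i] for i in top], probs

text_token = rng.normal(size=HIDDEN)
vision_token = rng.normal(size=HIDDEN)
print("text token  ->", route(text_token, "text")[0])     # ids drawn from 0..7 only
print("vision token ->", route(vision_token, "vision")[0])  # ids drawn from 8..15 only
```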
The ERNIE-4.5-300B-A47B-Base model supports long-context processing, accommodating sequence lengths up to 131,072 tokens. This enables it to handle extensive textual inputs for complex reasoning and generation tasks. Its Mixture-of-Experts architecture is tailored for efficient scaling and delivers high-throughput inference across various hardware configurations. This model is well-suited for general-purpose large language model applications that require robust reasoning capabilities and high processing speed. Developers can further adapt and fine-tune the model for specific application requirements using associated toolkits like ERNIEKit, which supports methodologies such as Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Direct Preference Optimization (DPO).
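For orientation, a minimal text-completion sketch with Hugging Face Transformers is shown below. The repository id, precision, and device mapping are assumptions to be verified against Baidu's actual release, and the full-size checkpoint requires several hundred GB of accelerator memory, so multi-GPU sharding or quantized deployment is the realistic path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for the PyTorch checkpoint; verify against the
# actual Hugging Face release before use.
MODEL_ID = "baidu/ERNIE-4.5-300B-A47B-Base-PT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",        # load in the checkpoint's native precision
    device_map="auto",         # shard across available accelerators
    trust_remote_code=True,
)

# Base (non-instruct) model, so plain text completion is the natural usage.
prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```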
The Baidu ERNIE 4.5 family consists of ten large-scale multimodal models. They utilize a heterogeneous Mixture-of-Experts (MoE) architecture, which enables parameter sharing across modalities while also employing dedicated parameters for specific modalities, supporting efficient language and multimodal processing.
Rankings apply to local LLMs.
No evaluation benchmarks are available for ERNIE-4.5-300B-A47B-Base.