Total Parameters: 2T
Context Length: 10M tokens
Modality: Multimodal
Architecture: Mixture of Experts (MoE)
License: Llama 4 Community License Agreement
Release Date: -
Training Data Cutoff: Aug 2024
Active Parameters: 288.0B
Number of Experts: 16
Active Experts: 2
Attention Structure: Grouped-Query Attention
Hidden Dimension: 16384
Layers: 160
Attention Heads: 128
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Absolute Position Embedding
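The ratio of 128 attention heads to 8 key-value heads in the table above is the signature of Grouped-Query Attention: each KV head is shared by 16 query heads, shrinking the KV cache by the same factor. The sketch below illustrates the mechanism with deliberately toy dimensions (an 8:2 head ratio and a head dimension of 16); it is an illustration of the general technique, not Meta's implementation.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """Minimal GQA: q carries n_q_heads heads, but k/v carry only n_kv_heads.

    Each group of (n_q_heads // n_kv_heads) query heads attends against the
    same shared key/value head, which is what reduces KV-cache memory.
    """
    group = n_q_heads // n_kv_heads              # query heads per KV head
    d = q.shape[-1]
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                          # which shared KV head to use
        scores = q[h] @ k[kv].T / np.sqrt(d)     # (seq, seq) attention logits
        scores -= scores.max(-1, keepdims=True)  # stabilize softmax
        w = np.exp(scores)
        w /= w.sum(-1, keepdims=True)
        out[h] = w @ v[kv]
    return out

# Toy shapes: 8 query heads sharing 2 KV heads (a 4:1 ratio; Behemoth's is 16:1).
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))   # (q_heads, seq, head_dim)
k = rng.normal(size=(2, 5, 16))   # (kv_heads, seq, head_dim)
v = rng.normal(size=(2, 5, 16))
out = grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2)
print(out.shape)  # (8, 5, 16)
```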
Llama 4 Behemoth is a large-scale multimodal foundation model developed by Meta, designed to serve as the primary teacher model within the Llama 4 family. As a non-deployed frontier model, its principal function is to generate high-quality synthetic data and provide the knowledge base for distilling smaller, production-ready variants such as Llama 4 Maverick and Scout. It integrates a native multimodal architecture capable of processing interleaved sequences of text, images, and video through an early fusion mechanism, which unifies visual and linguistic tokens within a single transformer backbone rather than utilizing separate modality-specific encoders.
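The early fusion described above can be sketched as projecting both modalities into one embedding space and concatenating them into a single token sequence for the shared transformer backbone. All names, shapes, and the fixed image-before-text ordering below are illustrative assumptions, not details of Meta's actual pipeline.

```python
import numpy as np

D_MODEL = 64  # toy embedding width; the spec lists a real hidden size of 16384

rng = np.random.default_rng(1)
text_embed = rng.normal(size=(1000, D_MODEL))  # toy text-token embedding table
patch_proj = rng.normal(size=(48, D_MODEL))    # linear projection for 48-dim patches

def fuse(text_ids, image_patches):
    """Early fusion: map text tokens and image patches to the same d_model,
    then concatenate them into one sequence consumed by a single transformer
    (no separate modality-specific encoders)."""
    txt = text_embed[text_ids]                  # (n_text, d_model)
    img = image_patches @ patch_proj            # (n_patches, d_model)
    return np.concatenate([img, txt], axis=0)   # unified token sequence

seq = fuse(np.array([5, 17, 3]), rng.normal(size=(4, 48)))
print(seq.shape)  # (7, 64)
```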
The model utilizes a sparse Mixture-of-Experts (MoE) architecture to achieve a total parameter count of approximately 2 trillion. During inference, the routing mechanism activates a subset of approximately 288 billion parameters across 16 experts. Technical innovations include the use of Grouped-Query Attention (GQA) to manage memory bandwidth and a training regime optimized with FP8 precision on large-scale GPU clusters. The model's architecture incorporates interleaved attention layers and a novel distillation loss function designed to balance soft and hard targets during the knowledge transfer process to student models.
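The sparse routing described above can be illustrated with a minimal top-k MoE layer: a gating network scores all experts, and only the top k run for each token, so the active parameter count is a small fraction of the total. This is a generic sketch of the technique under toy dimensions, not the Llama 4 router.

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Sparse MoE layer for one token: route to the top-k of n experts.

    x: (d,) token activation; router_w: (d, n_experts) gating weights;
    experts: list of (d, d) matrices standing in for full expert FFNs.
    Only k experts execute, mirroring 2 active out of 16 in the spec.
    """
    logits = x @ router_w                        # (n_experts,) gate scores
    top = np.argsort(logits)[-k:]                # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

rng = np.random.default_rng(2)
d, n_experts = 32, 16                            # 16 experts, 2 active, as in the spec
x = rng.normal(size=(d,))
router_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]
y = moe_forward(x, router_w, experts, k=2)
print(y.shape)  # (32,)
```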
Developed as a research-centric artifact, Llama 4 Behemoth is optimized for complex reasoning tasks, mathematical problem-solving, and cross-modal understanding. By processing over 30 trillion tokens of diverse data, it establishes a high-capacity latent space that supports the training of highly efficient downstream models. While the model remains in a research preview status, its architectural design provides the technical foundation for the broader Llama 4 ecosystem, emphasizing scalability through sparsity and native cross-modal integration.
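The teacher role described above rests on a distillation objective that balances soft and hard targets. A common form of such a loss is sketched below; the blending weight `alpha`, temperature `T`, and the exact formulation are illustrative assumptions, since Meta has not published Behemoth's objective.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, alpha=0.5, T=2.0):
    """Weighted blend of a soft-target term (teacher's temperature-smoothed
    distribution vs. the student's) and hard-label cross-entropy.

    alpha and T are hypothetical hyperparameters for illustration only.
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum() * T * T   # scaled by T^2, as is conventional
    hard = -np.log(softmax(student_logits)[hard_label])  # standard cross-entropy
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(np.array([1.0, 0.2, -0.5]),
                         np.array([2.0, 0.1, -1.0]),
                         hard_label=0)
print(loss > 0)  # True
```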
Meta's Llama 4 model family implements a Mixture-of-Experts (MoE) architecture for efficient scaling. It features native multimodality through early fusion of text, images, and video. This iteration also supports significantly extended context lengths, with models capable of processing up to 10 million tokens.
No evaluation benchmarks are available for Llama 4 Behemoth.