
Llama 4 Scout

Total Parameters: 109B (17B active per token)
Context Length: 10M tokens
Modality: Multimodal
Architecture: Mixture of Experts (MoE)
License: Llama 4 Community License Agreement
Release Date: 5 Apr 2025
Knowledge Cutoff: Aug 2024

Technical Specifications

Total Expert Parameters: -
Number of Experts: 16
Active Experts: 2
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 8192
Number of Layers: 80
Attention Heads: 64
Key-Value Heads: 8
Activation Function: -
Normalization: -
Position Embedding: iRoPE

System Requirements

VRAM requirements for different quantization methods and context sizes.

Llama 4 Scout

Llama 4 Scout is a key offering in Meta's Llama 4 family of models, released on April 5, 2025. It is designed to provide robust artificial intelligence capabilities to researchers and organizations while operating within practical hardware constraints. As a general-purpose model, Llama 4 Scout is natively multimodal and processes both text and image inputs. Its applications span complex conversational interactions, detailed image analysis, and advanced code generation, and its design emphasizes efficient execution of these tasks across diverse computational environments.

Architecturally, Llama 4 Scout employs a Mixture-of-Experts (MoE) configuration, incorporating 109 billion total parameters, with 17 billion active parameters engaged per token across 16 experts. A significant innovation in its design is an industry-leading context window, supporting up to 10 million tokens, which represents a substantial increase over prior iterations. The model integrates an early fusion approach for its native multimodality, which unifies text and vision tokens within its foundational structure. Optimized for efficient deployment, Llama 4 Scout can run on a single NVIDIA H100 GPU when leveraging Int4 quantization. Furthermore, its architecture incorporates interleaved attention layers, specifically iRoPE, to enhance generalization capabilities across extended sequences.
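As a sketch of how this MoE routing works, the toy PyTorch layer below routes each token to one of 16 routed experts and always applies a shared expert, so two experts are active per token, matching the specification above. The hidden size, the linear experts, and the top-1 router are illustrative placeholders rather than the model's actual configuration.

import torch

NUM_EXPERTS = 16   # routed experts, per the spec table above
HIDDEN = 512       # illustrative; the real hidden size is far larger

experts = torch.nn.ModuleList(
    [torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)]
)
shared_expert = torch.nn.Linear(HIDDEN, HIDDEN)
router = torch.nn.Linear(HIDDEN, NUM_EXPERTS)

def moe_layer(x):
    """x: (tokens, HIDDEN). Each token activates the shared expert
    plus its single highest-scoring routed expert."""
    weights, idx = router(x).softmax(-1).max(-1)  # top-1 routing
    out = shared_expert(x)                        # shared expert sees all tokens
    for e in range(NUM_EXPERTS):
        mask = idx == e
        if mask.any():                            # dispatch only matched tokens
            out[mask] += weights[mask, None] * experts[e](x[mask])
    return out

with torch.no_grad():
    print(moe_layer(torch.randn(8, HIDDEN)).shape)  # torch.Size([8, 512])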

Llama 4 Scout is well-suited for applications demanding the processing and analysis of extensive information volumes. Its primary use cases include multi-document summarization, detailed analysis of user activity for personalization, and reasoning over substantial codebases. The model demonstrates strong performance in tasks requiring document question-answering, precise information retrieval, and reliable source attribution, making it particularly valuable for professional document analysis. Its design for efficiency on a single GPU facilitates accessibility for organizations with varying computing infrastructure. The model also supports multilingual tasks, having been trained on data from 200 languages, with fine-tuning capabilities for 12 specific languages.
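To make the long-context use cases concrete, here is a minimal sketch of multi-document question answering that simply concatenates all sources into one prompt instead of summarizing chunk by chunk. The helper and its 4-characters-per-token estimate are rough assumptions; a real pipeline would count tokens with the model's own tokenizer.

CONTEXT_LIMIT = 10_000_000  # tokens, per the spec table above

def build_prompt(documents, question):
    # Label each source so the model can attribute answers to documents.
    parts = [f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, 1)]
    prompt = "\n\n".join(parts) + f"\n\nUsing the documents above, {question}"
    approx_tokens = len(prompt) // 4  # crude heuristic, not a tokenizer
    if approx_tokens >= CONTEXT_LIMIT:
        raise ValueError("corpus exceeds the context window")
    return prompt

print(build_prompt(["First report ...", "Second report ..."],
                   "summarize the findings common to all reports."))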

About Llama 4

Meta's Llama 4 model family implements a Mixture-of-Experts (MoE) architecture for efficient scaling. It features native multimodality through early fusion of text, images, and video. This iteration also supports significantly extended context lengths, with models capable of processing up to 10 million tokens.
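The snippet below sketches the early-fusion idea under one common assumption: patch features from a vision encoder are linearly projected into the text embedding space and concatenated with text token embeddings, so a single backbone attends over both modalities from the first layer. The sizes and the projection layer are illustrative, not Meta's implementation.

import torch

VISION_DIM, HIDDEN, VOCAB = 768, 512, 32_000  # illustrative sizes

text_embed = torch.nn.Embedding(VOCAB, HIDDEN)
vision_proj = torch.nn.Linear(VISION_DIM, HIDDEN)  # patches -> text space

def fuse(token_ids, patch_features):
    t = text_embed(token_ids)        # (text_len, HIDDEN)
    v = vision_proj(patch_features)  # (num_patches, HIDDEN)
    # One unified sequence: the transformer stack sees vision and text
    # tokens side by side from the very first layer ("early fusion").
    return torch.cat([v, t], dim=0)

seq = fuse(torch.randint(0, VOCAB, (16,)), torch.randn(64, VISION_DIM))
print(seq.shape)  # torch.Size([80, 512])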



Evaluation Benchmarks

Rankings are relative to other local LLMs.

Overall Rank: #35

Benchmark Scores

Professional Knowledge (MMLU Pro): 0.74 (rank #7)
Graduate-Level QA (GPQA): 0.57 (rank #9)
General Knowledge (MMLU): 0.57 (rank #17)


Coding Rank: #36

GPU Requirements

Interactive VRAM calculator: select a quantization method for the model weights and a context size (from 1K up to roughly 10M tokens) to estimate the required VRAM and see recommended GPUs.
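As a rough back-of-the-envelope version of such a calculator: weight memory is total parameters times bytes per weight, and the KV cache grows linearly with context length. The sketch below takes the layer count, KV heads, and head size (hidden 8192 / 64 heads = 128) from the specification table above; the constants and the formula are approximations, not the site's exact method.

BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(total_params_b=109, quant="int4", context_tokens=1024,
                     layers=80, kv_heads=8, head_dim=128, kv_bytes=2):
    weights_gb = total_params_b * BYTES_PER_WEIGHT[quant]  # params in billions
    # KV cache: 2 tensors (K and V) per layer, over the 8 GQA key-value heads.
    kv_gb = 2 * layers * kv_heads * head_dim * context_tokens * kv_bytes / 1e9
    return weights_gb + kv_gb

print(f"{estimate_vram_gb():.1f} GB")  # ~54.8 GB: Int4 fits one 80 GB H100
print(f"{estimate_vram_gb(quant='fp16', context_tokens=1_000_000):.1f} GB")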
