
Llama 3.1 405B

Parameters

405B

Context Length

128K

Modality

Text

Architecture

Dense

License

Llama 3.1 Community License Agreement

Release Date

23 Jul 2024

Knowledge Cutoff

Dec 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

16384

Number of Layers

126

Attention Heads

128

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Positional Embedding

RoPE
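To make two of the listed components concrete, here is a minimal NumPy sketch of RMS Normalization and a SwiGLU feed-forward block. The dimensions and weight shapes are toy placeholders for illustration, not the model's actual configuration (405B uses a hidden size of 16384).

```python
import numpy as np

# Toy dimensions; the real model uses hidden size 16384.
d_model, d_ff = 8, 16

def rms_norm(x, weight, eps=1e-5):
    # Root Mean Square normalization: rescale each vector by its RMS.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: a SiLU-gated linear unit, as used in Llama models.
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU (swish) activation
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model))
y = swiglu(rms_norm(x, np.ones(d_model)),
           rng.standard_normal((d_model, d_ff)),
           rng.standard_normal((d_model, d_ff)),
           rng.standard_normal((d_ff, d_model)))
print(y.shape)  # (4, 8)
```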

System Requirements

VRAM requirements for different quantization methods and context sizes:
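As a rough guide, the sketch below estimates the VRAM needed just to hold the 405B weights at common quantization levels. The bytes-per-parameter figures are standard assumptions; real deployments also need headroom for the KV cache, activations, and runtime overhead.

```python
# Rough VRAM estimate for holding the Llama 3.1 405B weights alone.
# Assumes 405e9 parameters; KV cache and activations are extra.
PARAMS = 405e9

BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

for quant, bpp in BYTES_PER_PARAM.items():
    gib = PARAMS * bpp / 1024**3
    print(f"{quant:>10}: ~{gib:,.0f} GiB for weights")
```

Even at 4-bit quantization the weights alone occupy roughly 190 GiB, well beyond any single GPU, which is why multi-GPU configurations are recommended for this model.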

Llama 3.1 405B

Meta Llama 3.1 405B is the largest generative AI model in the Llama 3.1 collection, which also includes 8B and 70B parameter variants. It is engineered for a broad range of commercial and research applications, with a focus on multilingual dialogue and advanced text generation, and it supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Architecturally, Llama 3.1 405B employs an optimized decoder-only Transformer. It integrates Grouped-Query Attention (GQA), with its 128 query heads sharing 8 key-value heads, to improve inference scalability by shrinking the key-value cache. The model was trained on more than 15 trillion tokens using over 16,000 H100 GPUs, the largest training run for a Llama model to date. Post-training refinement applies multiple iterative rounds of Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO) to align the model's responses. Internally, the model uses Rotary Positional Embedding (RoPE) for positional encoding, Root Mean Square Normalization (RMSNorm) for internal state normalization, and the SwiGLU activation function. The architecture deliberately omits a Mixture-of-Experts (MoE) design, prioritizing training stability and scalability.
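Conceptually, GQA sits between full multi-head attention and multi-query attention. The following minimal NumPy sketch (toy sizes, random values, causal masking omitted) shows groups of query heads attending through one shared key-value head, mirroring the 128-to-8 head layout above:

```python
import numpy as np

# Minimal sketch of Grouped-Query Attention (GQA). Sizes are toy
# placeholders; Llama 3.1 405B itself uses 128 query heads and
# 8 key-value heads.
n_q_heads, n_kv_heads, seq_len, head_dim = 8, 2, 10, 16
group_size = n_q_heads // n_kv_heads   # query heads per shared KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Each query head attends with the K/V of its group, so only
# n_kv_heads K/V tensors are cached instead of n_q_heads.
out = np.stack([
    softmax(q[h] @ k[h // group_size].T / np.sqrt(head_dim)) @ v[h // group_size]
    for h in range(n_q_heads)
])
print(out.shape)  # (8, 10, 16): same output shape as full multi-head attention
```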

Functionally, Llama 3.1 405B offers a significantly expanded context length of 128,000 tokens, enabling processing of extended textual inputs. It demonstrates advanced capabilities in various domains, including general knowledge comprehension, steerability, mathematical problem-solving, and the use of external tools. Practical applications include long-form text summarization, development of multilingual conversational agents, and assistance with coding tasks. Additionally, the model is designed to facilitate advanced workflows such as generating synthetic data to enhance the training of smaller models and supporting model distillation processes. Its substantial parameter count contributes to its capacity for generating detailed and contextually relevant text.
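To see what the 128K window implies for serving, here is a back-of-the-envelope KV-cache estimate using the figures from the specification table above (126 layers, 8 key-value heads, head dimension 16384 / 128 = 128); the FP16 cache and exact token count are assumptions.

```python
# Back-of-the-envelope KV-cache size at the full 128K context, using
# the specification table: 126 layers, 8 key-value heads, and a head
# dimension of 16384 / 128 = 128.
layers, kv_heads, head_dim = 126, 8, 16384 // 128
context_tokens = 128 * 1024          # 128K-token window
bytes_per_value = 2                  # FP16/BF16 cache assumption

# Factor of 2 accounts for storing both keys and values.
per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
total_gib = per_token * context_tokens / 1024**3
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at 128K:  ~{total_gib:.0f} GiB per sequence")
```

With full multi-head attention (128 key-value heads instead of 8), the same cache would be 16 times larger, on the order of 1 TiB per sequence; that reduction is the scalability gain GQA provides.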

About Llama 3.1

Llama 3.1 is Meta's advanced large language model family, building upon Llama 3. It features an optimized decoder-only Transformer architecture and is available in 8B, 70B, and 405B parameter versions. Key enhancements include an expanded 128K-token context window and improved multilingual capabilities across eight languages, refined through data improvements and post-training procedures.



Evaluation Benchmarks

Rankings apply to local LLMs.

Rank: #34

Benchmark score rank: 0.66 (🥉 #3), 0.66 (#6)

Category                  Benchmark      Score (rank)
Web Development           WebDev Arena   809.67 (#8)
Professional Knowledge    MMLU Pro       0.73 (#8), 0.8 (#12), 0.88 (#12)
Graduate-Level QA         GPQA           0.51 (#13), 0.56 (#14), 0.41 (#16)
General Knowledge         MMLU           0.51 (#21), 0.40 (#24), 0.29 (#25)

Rankings

Overall Rank: #34

Coding Rank: #30

GPU Requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size (from 1K up to 125K tokens); see the estimate sketch under System Requirements above for a rough guide to the footprint and the resulting recommended GPUs.
