
Llama 3.1 405B

Parameters: 405B
Context Length: 128K
Modality: Text
Architecture: Dense
License: Llama 3.1 Community License Agreement
Release Date: 23 Jul 2024
Knowledge Cutoff: Dec 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 16384
Number of Layers: 126
Attention Heads: 128
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization (RMSNorm)
Position Embedding: Rotary Positional Embedding (RoPE)
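
To make the attention geometry concrete, here is a small arithmetic sketch derived from the figures above; the variable names are illustrative, and the per-head dimension is assumed to be the hidden size divided by the number of attention heads:

```python
# Illustrative dimension arithmetic for Llama 3.1 405B (values from the table above).
hidden_size = 16384
num_layers = 126
num_attention_heads = 128   # query heads
num_key_value_heads = 8     # shared key/value heads (grouped-query attention)

head_dim = hidden_size // num_attention_heads                       # 16384 / 128 = 128
queries_per_kv_group = num_attention_heads // num_key_value_heads   # 128 / 8 = 16
print(head_dim, queries_per_kv_group)
```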

System Requirements

VRAM requirements for different quantization methods and context sizes

Llama 3.1 405B

Meta Llama 3.1 405B is the largest generative AI model in the Llama 3.1 collection, which also includes 8B and 70B parameter variants. The model is engineered for a broad spectrum of commercial and research applications, with a focus on multilingual dialogue and advanced text generation, and is intended to broaden access to sophisticated AI capabilities. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Architecturally, Llama 3.1 405B employs an optimized decoder-only Transformer. It uses Grouped-Query Attention (GQA) to improve inference scalability, Rotary Positional Embedding (RoPE) for positional encoding, Root Mean Square Normalization (RMSNorm) for hidden-state normalization, and the SwiGLU activation function. The model was trained on a dataset exceeding 15 trillion tokens using more than 16,000 H100 GPUs, the largest training effort for a Llama model at the time of release. Post-training refinement involved multiple iterative rounds of Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO) to align the model's responses. To prioritize training stability and scalability, the architecture deliberately avoids a Mixture-of-Experts (MoE) design and remains dense.
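
The grouped-query mechanism can be sketched in a few lines: each of the 8 key/value heads is shared by a group of 16 query heads, shrinking the key/value cache roughly 16-fold relative to full multi-head attention. The PyTorch snippet below is a simplified illustration, not Meta's implementation; the tensor shapes and the use of repeat_interleave are assumptions made for clarity.

```python
import torch
import torch.nn.functional as F

# Simplified grouped-query attention sketch (illustrative, not Meta's implementation).
num_q_heads, num_kv_heads, head_dim, seq_len = 128, 8, 128, 16
group_size = num_q_heads // num_kv_heads  # 16 query heads share each KV head

q = torch.randn(1, num_q_heads, seq_len, head_dim)
k = torch.randn(1, num_kv_heads, seq_len, head_dim)  # KV tensors are 16x smaller than in full MHA
v = torch.randn(1, num_kv_heads, seq_len, head_dim)

# Broadcast each KV head across its group of query heads, then run standard attention.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 128, 16, 128])
```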

Functionally, Llama 3.1 405B offers a significantly expanded context length of 128,000 tokens, enabling processing of extended textual inputs. It demonstrates advanced capabilities in various domains, including general knowledge comprehension, steerability, mathematical problem-solving, and the use of external tools. Practical applications include long-form text summarization, development of multilingual conversational agents, and assistance with coding tasks. Additionally, the model is designed to facilitate advanced workflows such as generating synthetic data to enhance the training of smaller models and supporting model distillation processes. Its substantial parameter count contributes to its capacity for generating detailed and contextually relevant text.
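
For dialogue or summarization use, a minimal sketch with the Hugging Face transformers library might look like the following. The model id, chat format, and the availability of sufficient GPU memory are assumptions (in bfloat16 the weights alone occupy on the order of 800 GB, so the model is typically served across multiple GPUs or nodes); this is an illustrative sketch, not a recommended deployment recipe.

```python
import torch
import transformers

# Hedged usage sketch; "meta-llama/Llama-3.1-405B-Instruct" is the assumed Hugging Face model id.
pipe = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-405B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all available GPUs
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize the key findings of the report below.\n\n<report text>"},
]
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```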

About Llama 3.1

Llama 3.1 is Meta's advanced large language model family, building on Llama 3. It uses an optimized decoder-only Transformer architecture and is available in 8B, 70B, and 405B parameter versions. Significant enhancements include an expanded 128K-token context window and improved multilingual capabilities across eight languages, refined through improved training data and post-training procedures.



Evaluation Benchmarks

Rankings are relative to other local LLMs. Overall rank: #34.

Benchmark (Category)                  Score    Rank
WebDev Arena (Web Development)        809.67   8
MMLU Pro (Professional Knowledge)     0.73     8
GPQA (Graduate-Level QA)              0.51     13
MMLU (General Knowledge)              0.51     21

Rankings

Overall Rank: #34
Coding Rank: #30

GPU Requirements

VRAM requirements depend on the quantization method chosen for the model weights and on the context size (from roughly 1K up to 125K tokens); the page's full calculator covers these combinations.
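
As a rough back-of-the-envelope sketch of how weight memory alone scales with quantization (ignoring the KV cache, activations, and runtime overhead, which add substantially more), the estimates below are illustrative rather than measured requirements:

```python
# Approximate weight-only memory for a 405B-parameter model at common precisions.
# Illustrative only: real VRAM use also includes KV cache, activations, and framework overhead.
PARAMS = 405e9

bytes_per_param = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}
for quant, bpp in bytes_per_param.items():
    gib = PARAMS * bpp / 1024**3
    print(f"{quant:>9}: ~{gib:,.0f} GiB for weights alone")
# fp16/bf16: ~754 GiB, int8: ~377 GiB, int4: ~189 GiB
```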
