| Attribute | Value |
|---|---|
| Parameters | 405B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Dense |
| License | Llama 3.1 Community License Agreement |
| Release Date | 23 Jul 2024 |
| Knowledge Cutoff | Dec 2023 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 16384 |
| Number of Layers | 126 |
| Attention Heads | 128 |
| Key-Value Heads | 8 |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | RoPE |
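As a sanity check, the dimensions in the table roughly reproduce the headline parameter count. The back-of-the-envelope sketch below assumes a SwiGLU intermediate size of 53,248 and a vocabulary of 128,256 tokens, neither of which appears in the table above.

```python
# Rough parameter count from the specification table above.
# The SwiGLU intermediate size (53,248) and vocabulary size (128,256) are
# assumed values that do not appear in the table; RMSNorm weights are ignored.
d_model  = 16_384     # hidden dimension
n_layers = 126
n_heads  = 128
n_kv     = 8          # key-value heads (GQA)
d_ff     = 53_248     # assumed SwiGLU intermediate size
vocab    = 128_256    # assumed vocabulary size

head_dim = d_model // n_heads          # 128
kv_dim   = n_kv * head_dim             # 1,024

attn = 2 * d_model * d_model + 2 * d_model * kv_dim   # Q and O + K and V projections
mlp  = 3 * d_model * d_ff                              # gate, up, and down (SwiGLU)
per_layer = attn + mlp

total = n_layers * per_layer + 2 * vocab * d_model     # + input embeddings and LM head
print(f"≈ {total / 1e9:.1f}B parameters")              # ≈ 405.8B
```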
VRAM requirements for different quantization methods and context sizes
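A rough figure can be worked out by hand: weight memory scales with the bits per weight, and the KV cache grows linearly with context length. The sketch below is only an illustrative estimate, assuming the 405B weight count and GQA dimensions from the table above and an FP16 KV cache; real deployments add activation and framework overhead.

```python
# Back-of-the-envelope VRAM estimate: weight memory under common quantization
# widths, plus KV-cache growth with context length. The 405B weight count and
# GQA dimensions come from the specification table above; bits-per-weight
# figures are approximate and runtime overhead is ignored.
PARAMS   = 405e9        # total weights
d_model  = 16_384
n_layers = 126
n_heads  = 128
n_kv     = 8            # key-value heads (GQA)
head_dim = d_model // n_heads        # 128
kv_dim   = n_kv * head_dim           # 1,024 per K and per V

def weight_gb(bits_per_weight: float) -> float:
    """Memory for the model weights alone at a given precision."""
    return PARAMS * bits_per_weight / 8 / 1024**3

def kv_cache_gb(context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size: K and V, per layer, per token, FP16 (2 bytes) by default."""
    return 2 * n_layers * kv_dim * context_tokens * bytes_per_value / 1024**3

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: weights ≈ {weight_gb(bits):,.0f} GB")
print(f"KV cache @ 1,024 tokens:   ≈ {kv_cache_gb(1_024):.2f} GB")
print(f"KV cache @ 131,072 tokens: ≈ {kv_cache_gb(131_072):.1f} GB")
```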
Meta Llama 3.1 405B is the largest generative AI model in the Llama 3.1 collection, which also includes 8B and 70B parameter variants. The model is built for a broad range of commercial and research applications, with a focus on multilingual dialogue and advanced text generation, and is intended to broaden access to state-of-the-art AI capabilities. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Architecturally, Llama 3.1 405B uses an optimized decoder-only Transformer with Grouped-Query Attention (GQA) to improve inference scalability. The model was trained on more than 15 trillion tokens using over 16,000 H100 GPUs, making it the first Llama model trained at this scale. Post-training involves multiple iterative rounds of Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO) to align the model's responses. Internally, it uses Rotary Positional Embedding (RoPE) for positional encoding, Root Mean Square Normalization (RMSNorm) for layer normalization, and the SwiGLU activation function. To prioritize training stability and scalability, the architecture deliberately avoids a Mixture-of-Experts (MoE) design in favor of a standard dense Transformer.
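To make the GQA mechanism concrete, here is a minimal NumPy sketch (not Meta's implementation) in which a small set of key/value heads is shared across groups of query heads. The sizes are toy values; the real model uses 128 query heads, 8 KV heads, and a head dimension of 128.

```python
import numpy as np

# Minimal Grouped-Query Attention sketch: many query heads share a much
# smaller set of key/value heads, which shrinks the KV cache.
n_q_heads, n_kv_heads, head_dim, seq = 8, 2, 4, 5
group = n_q_heads // n_kv_heads                      # query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Broadcast each KV head to its group of query heads
k = np.repeat(k, group, axis=0)                      # (n_q_heads, seq, head_dim)
v = np.repeat(v, group, axis=0)

scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
out = weights @ v                                    # (n_q_heads, seq, head_dim)
print(out.shape)
```

Because keys and values are stored for only 8 heads instead of 128, the KV cache is 16× smaller than with full multi-head attention, which is what makes long-context inference on a model of this size tractable.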
Functionally, Llama 3.1 405B offers a significantly expanded context length of 128,000 tokens, enabling it to process long textual inputs. It demonstrates strong capabilities in general knowledge, steerability, mathematical problem-solving, and tool use. Practical applications include long-form text summarization, multilingual conversational agents, and coding assistance. The model is also designed to support advanced workflows such as generating synthetic data to train smaller models and serving as a teacher for model distillation. Its substantial parameter count underpins its capacity for detailed, contextually grounded generation.
Llama 3.1 is Meta's advanced large language model family, building on Llama 3. It uses an optimized decoder-only Transformer architecture and is available in 8B, 70B, and 405B parameter versions. Key enhancements include an expanded 128K-token context window and improved multilingual capabilities across eight languages, achieved through better data and refined post-training procedures.
Ranking is for Local LLMs.
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Refactoring | Aider Refactoring | 0.66 | 🥉 3 |
| Coding | Aider Coding | 0.66 | 6 |
| Web Development | WebDev Arena | 809.67 | 8 |
| Professional Knowledge | MMLU Pro | 0.73 | 8 |
| StackEval | ProLLM Stack Eval | 0.8 | 12 |
| QA Assistant | ProLLM QA Assistant | 0.88 | 12 |
| Graduate-Level QA | GPQA | 0.51 | 13 |
| Data Analysis | LiveBench Data Analysis | 0.56 | 14 |
| Reasoning | LiveBench Reasoning | 0.41 | 16 |
| General Knowledge | MMLU | 0.51 | 21 |
| Mathematics | LiveBench Mathematics | 0.40 | 24 |
| Coding | LiveBench Coding | 0.29 | 25 |
Overall Rank: #34
Coding Rank: #30