| Specification | Value |
|---|---|
| Parameters | 405B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Dense |
| License | Llama 3.1 Community License Agreement |
| Release Date | 23 Jul 2024 |
| Knowledge Cutoff | Dec 2023 |
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 16384 |
| Number of Layers | 126 |
| Attention Heads | 128 |
| Key-Value Heads | 8 |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | RoPE |
VRAM requirements for different quantization methods and context sizes
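The detailed requirements table is not reproduced here, but a rough weights-plus-KV-cache estimate can be derived from the published hyperparameters (405B parameters, 126 layers, 8 key-value heads, head dimension 16384 / 128 = 128). The sketch below is a back-of-envelope calculation under assumed bit-widths (FP16/BF16, INT8, INT4); it ignores activation memory, framework overhead, and parallelism, so real deployments need headroom beyond these figures.

```python
# Back-of-envelope VRAM estimate for Llama 3.1 405B.
# Illustrative assumptions: weights plus KV cache only; bytes per parameter
# taken as 2.0 (FP16/BF16), 1.0 (INT8), and 0.5 (INT4).

PARAMS = 405e9            # total parameters
N_LAYERS = 126            # transformer layers
N_KV_HEADS = 8            # key-value heads (GQA)
HEAD_DIM = 16384 // 128   # hidden size / attention heads = 128

def weight_gib(bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return PARAMS * bytes_per_param / 2**30

def kv_cache_gib(context_len: int, bytes_per_value: float = 2.0) -> float:
    """KV cache for one sequence: 2 (K and V) * layers * kv_heads * head_dim per token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_len * per_token / 2**30

if __name__ == "__main__":
    for name, bpp in [("FP16/BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
        print(f"{name:10s} weights ~{weight_gib(bpp):6.0f} GiB")
    for ctx in (8_192, 32_768, 131_072):
        print(f"KV cache @ {ctx:>7} tokens ~{kv_cache_gib(ctx):5.1f} GiB (FP16)")
```

Under these assumptions the weights alone come to roughly 754 GiB in BF16, 377 GiB in INT8, and 189 GiB in INT4, with the full 128K-token KV cache adding about 63 GiB per sequence in FP16.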
Meta Llama 3.1 405B is the largest generative AI model in the Llama 3.1 collection, which also includes 8B and 70B parameter variants. The model is engineered for a broad spectrum of commercial and research applications, with a focus on multilingual dialogue and advanced text generation, and is intended to broaden access to sophisticated AI capabilities. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Architecturally, Llama 3.1 405B is an optimized decoder-only Transformer. It adopts Grouped-Query Attention (GQA) to improve inference scalability, and it deliberately omits a Mixture-of-Experts (MoE) design in favor of a dense architecture that prioritizes training stability and scalability. The model was pre-trained on more than 15 trillion tokens using a cluster of over 16,000 H100 GPUs, the largest training run of any Llama model to date. Post-training alignment consists of multiple iterative rounds of Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). Internally, the model uses Rotary Positional Embedding (RoPE) for positional encoding, Root Mean Square Normalization (RMSNorm) for layer normalization, and the SwiGLU activation function.
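To make the attention configuration concrete, the following is a minimal sketch of grouped-query attention in PyTorch using this model's head layout (128 query heads sharing 8 key/value heads, head dimension 128). It is an illustrative reference only, omitting RoPE, causal masking, and KV caching, and is not Meta's implementation.

```python
import torch
import torch.nn.functional as F

# Minimal grouped-query attention sketch (no RoPE, no causal mask, no cache).
# Head layout follows the spec table: 128 query heads, 8 KV heads,
# head_dim = 16384 hidden / 128 heads = 128.
def grouped_query_attention(q, k, v, n_q_heads=128, n_kv_heads=8):
    # q: (batch, seq, n_q_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    group = n_q_heads // n_kv_heads                    # 16 query heads share each KV head
    k = k.repeat_interleave(group, dim=2)              # expand KV heads to match Q heads
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # -> (batch, heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v)      # softmax(QK^T / sqrt(d)) V
    return out.transpose(1, 2)                         # -> (batch, seq, heads, head_dim)

# Toy shapes: batch 1, sequence length 16, head_dim 128.
q = torch.randn(1, 16, 128, 128)
k = torch.randn(1, 16, 8, 128)
v = torch.randn(1, 16, 8, 128)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 128, 128])
```

The key point is that only the 8 KV heads are stored in the cache during inference, which is what makes the 128K-token context tractable relative to full multi-head attention.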
Functionally, Llama 3.1 405B offers a significantly expanded context length of 128,000 tokens, enabling processing of extended textual inputs. It demonstrates advanced capabilities in various domains, including general knowledge comprehension, steerability, mathematical problem-solving, and the use of external tools. Practical applications include long-form text summarization, development of multilingual conversational agents, and assistance with coding tasks. Additionally, the model is designed to facilitate advanced workflows such as generating synthetic data to enhance the training of smaller models and supporting model distillation processes. Its substantial parameter count contributes to its capacity for generating detailed and contextually relevant text.
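A hypothetical usage sketch with the Hugging Face Transformers chat-template API is shown below. The repository id meta-llama/Llama-3.1-405B-Instruct and the single-node, multi-GPU setup are assumptions; in practice the 405B weights are gated and are typically served through quantized builds or hosted APIs rather than loaded locally as written here.

```python
# Hypothetical usage sketch via Hugging Face Transformers; the repo id and
# hardware setup are assumptions, not a tested deployment recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-405B-Instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # BF16 weights; quantized variants reduce memory
    device_map="auto",    # shard layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize the key trade-offs of dense vs. MoE LLMs."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```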
Llama 3.1 is Meta's advanced large language model family, building upon Llama 3. It features an optimized decoder-only transformer architecture, available in 8B, 70B, and 405B parameter versions. Significant enhancements include an expanded 128K token context window and improved multilingual capabilities across eight languages, refined through data and post-training procedures.
Rankings apply to local LLMs.
Rank: #34
| Benchmark | Score | Rank |
|---|---|---|
| Refactoring: Aider Refactoring | 0.66 | 🥉 3 |
| Coding: Aider Coding | 0.66 | 6 |
| Web Development: WebDev Arena | 809.67 | 8 |
| Professional Knowledge: MMLU Pro | 0.73 | 8 |
| StackEval: ProLLM Stack Eval | 0.8 | 12 |
| QA Assistant: ProLLM QA Assistant | 0.88 | 12 |
| Graduate-Level QA: GPQA | 0.51 | 13 |
| Data Analysis: LiveBench Data Analysis | 0.56 | 14 |
| Reasoning: LiveBench Reasoning | 0.41 | 16 |
| General Knowledge: MMLU | 0.51 | 21 |
| Mathematics: LiveBench Mathematics | 0.40 | 24 |
| Coding: LiveBench Coding | 0.29 | 25 |