Specification | Value |
---|---|
Parameters | 8B |
Context Length | 128K (131,072 tokens) |
Modality | Text |
Architecture | Dense |
License | Llama 3.1 Community License |
Release Date | 23 Jul 2024 |
Knowledge Cutoff | Dec 2023 |
Attention Structure | Grouped-Query Attention |
Hidden Dimension Size | 4096 |
Number of Layers | 32 |
Attention Heads | 32 |
Key-Value Heads | 8 |
Activation Function | SiLU (Swish) |
Normalization | RMS Normalization |
Position Embedding | RoPE |
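The hyperparameters above can be captured in a configuration sketch. The snippet below uses the `LlamaConfig` class from Hugging Face `transformers`; values not listed in the table (vocabulary size, feed-forward width, RMSNorm epsilon, RoPE base frequency) are assumptions for illustration and may differ from the official checkpoint.

```python
from transformers import LlamaConfig

# Hyperparameters from the table above; vocab_size, intermediate_size,
# rms_norm_eps and rope_theta are assumed values, not taken from this page.
config = LlamaConfig(
    hidden_size=4096,                # hidden dimension size
    num_hidden_layers=32,            # number of transformer layers
    num_attention_heads=32,          # query heads
    num_key_value_heads=8,           # grouped-query attention: 8 shared KV heads
    hidden_act="silu",               # SiLU (Swish) activation in the MLP
    max_position_embeddings=131072,  # 128K-token context window
    rms_norm_eps=1e-5,               # assumed RMSNorm epsilon
    vocab_size=128256,               # assumed Llama 3 tokenizer vocabulary
    intermediate_size=14336,         # assumed feed-forward width
    rope_theta=500000.0,             # assumed RoPE base frequency for long context
)
print(config)
```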
VRAM requirements for different quantization methods and context sizes
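As a rough approximation of what such a calculator reports, the sketch below estimates VRAM as quantized weight memory plus a half-precision KV cache for a chosen context size. The bits-per-weight figures, quantization names, and fixed overhead term are assumptions for illustration; actual memory use varies by runtime and quantization format.

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache.
# All constants below are assumptions for illustration, not measured values.
PARAMS = 8e9            # 8B parameters
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 4096 // 32   # hidden size / attention heads = 128
KV_BYTES = 2            # fp16 key/value cache

# Assumed effective bits per weight for common quantization formats.
QUANT_BITS = {"fp16": 16, "q8_0": 8.5, "q5_k_m": 5.7, "q4_k_m": 4.8}

def vram_gib(quant: str, context: int, overhead_gib: float = 1.0) -> float:
    """Approximate VRAM in GiB for a given quantization and context size."""
    weights = PARAMS * QUANT_BITS[quant] / 8
    # The KV cache stores one key and one value vector per layer and KV head per token.
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES * context
    return (weights + kv_cache) / 2**30 + overhead_gib

for quant in QUANT_BITS:
    print(f"{quant:>7} @ 1,024 tokens: ~{vram_gib(quant, 1024):.1f} GiB")
```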
Llama 3.1 8B is the smallest member of Meta's Llama 3.1 series of large language models. With 8 billion parameters, it is engineered for a broad range of natural language understanding and generation tasks while remaining efficient and responsive enough for deployment in computationally constrained environments. It is optimized for dialogue applications and designed to follow complex instructions, supporting its use in conversational agents and virtual assistant systems.
Architecturally, Llama 3.1 8B is built on an optimized, dense transformer framework. A notable feature is Grouped-Query Attention (GQA), which shrinks the key-value cache and thereby improves inference scalability. The feed-forward layers use the SiLU (Swish) activation function, normalization is handled by RMSNorm, positional information is encoded with Rotary Position Embedding (RoPE), and the architecture leverages Flash Attention to improve processing speed. Training used approximately 15 trillion tokens from publicly available sources, followed by supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align outputs with helpfulness and safety criteria. A significant enhancement in this iteration is the expanded context length of 128K (131,072) tokens.
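To make the grouped-query attention scheme concrete, the minimal PyTorch sketch below shows how 8 key-value heads can be shared across 32 query heads. It is an illustrative approximation only, not the model's actual implementation, which also applies RoPE and runs through fused Flash Attention kernels.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """Minimal GQA sketch: q has 32 heads, k/v have 8 heads that are repeated.

    All tensors are (batch, heads, seq, head_dim); illustration only.
    """
    n_rep = q.shape[1] // k.shape[1]          # 32 query heads / 8 KV heads = 4
    k = k.repeat_interleave(n_rep, dim=1)     # share each KV head across 4 query heads
    v = v.repeat_interleave(n_rep, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Toy example with Llama 3.1 8B head counts and head_dim = 4096 / 32 = 128.
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
out = grouped_query_attention(q, k, v)
print(out.shape)  # torch.Size([1, 32, 16, 128])
```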
Regarding capabilities and applications, Llama 3.1 8B is proficient at text summarization, text classification, and sentiment analysis, particularly in scenarios that demand low-latency inference. Its multilingual support covers eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, enabling use in diverse linguistic contexts. The model also supports advanced workflows such as long-form text summarization, and it can be used for synthetic data generation and model distillation to refine smaller language models.
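As an illustration of a typical dialogue or summarization workflow, the sketch below loads the instruction-tuned checkpoint via Hugging Face `transformers`. The repository name and generation settings are assumptions, and access to the weights requires accepting the Llama 3.1 Community License.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository name for the instruction-tuned 8B checkpoint.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Summarization phrased as a chat turn, using the model's chat template.
messages = [
    {"role": "system", "content": "You summarize documents concisely."},
    {"role": "user", "content": "Summarize: Llama 3.1 8B is a dense, "
     "grouped-query-attention transformer with a 128K-token context window."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```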
Llama 3.1 is Meta's advanced large language model family, building upon Llama 3. It features an optimized decoder-only transformer architecture, available in 8B, 70B, and 405B parameter versions. Significant enhancements include an expanded 128K token context window and improved multilingual capabilities across eight languages, refined through data and post-training procedures.
Rankings are relative to other local LLMs.
Benchmark | Score | Rank |
---|---|---|
Graduate-Level QA (GPQA) | 0.54 | 11 |
Refactoring (Aider Refactoring) | 0.38 | 15 |
StackEval (ProLLM Stack Eval) | 0.50 | 15 |
Summarization (ProLLM Summarization) | 0.49 | 17 |
Coding (Aider Coding) | 0.38 | 18 |
Professional Knowledge (MMLU Pro) | 0.48 | 23 |
Data Analysis (LiveBench Data Analysis) | 0.33 | 30 |
Coding (LiveBench Coding) | 0.11 | 31 |
Reasoning (LiveBench Reasoning) | 0.15 | 31 |
Mathematics (LiveBench Mathematics) | 0.15 | 32 |
General Knowledge (MMLU) | 0.30 | 34 |
Overall Rank: #53
Coding Rank: #45