Attribute | Value |
---|---|
Parameters | 70B |
Context Length | 8,192 tokens |
Modality | Text |
Architecture | Dense |
License | Meta Llama 3 Community License |
Release Date | 18 Apr 2024 |
Knowledge Cutoff | Dec 2023 |
Attention Structure | Grouped-Query Attention |
Hidden Dimension Size | 8,192 |
Number of Layers | 80 |
Attention Heads | 64 |
Key-Value Heads | 8 |
Activation Function | SwiGLU |
Normalization | RMSNorm |
Position Embedding | RoPE |
VRAM requirements for different quantization methods and context sizes
Meta Llama 3 70B is a 70-billion-parameter, decoder-only transformer language model developed by Meta. Released in April 2024, it is available in both pre-trained and instruction-fine-tuned variants. The instruction-tuned model is optimized for dialogue and assistant-style interactions and supports a wide range of natural language understanding and generation tasks, including conversational AI, creative content generation, code generation, text summarization, classification, and complex reasoning. The model is available for both commercial and research use under the Meta Llama 3 Community License.
Architecturally, Llama 3 70B employs a standard decoder-only transformer design. A key innovation is its tokenizer, whose 128,000-token vocabulary encodes language more efficiently and improves inference throughput. To further improve inference scalability and speed, the model uses Grouped Query Attention (GQA), applied in both the 8B and 70B versions of Llama 3. The model was pre-trained on sequences of up to 8,192 tokens. The instruction-tuned variants were aligned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to improve helpfulness and safety.
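To make the GQA configuration concrete, here is a minimal NumPy sketch of how the 70B model's 64 query heads share its 8 key-value heads (head dimension 8192 / 64 = 128). This is an illustrative shape walkthrough, not Meta's implementation; the toy sequence length and random tensors are assumptions for demonstration.

```python
import numpy as np

# Illustrative Grouped Query Attention shapes for Llama 3 70B:
# 64 query heads share 8 KV heads, so each KV head serves 8 query heads.
HIDDEN = 8192
N_Q_HEADS = 64
N_KV_HEADS = 8
HEAD_DIM = HIDDEN // N_Q_HEADS        # 128
GROUP = N_Q_HEADS // N_KV_HEADS       # 8 query heads per KV head

seq = 4  # toy sequence length
q = np.random.randn(N_Q_HEADS, seq, HEAD_DIM)
k = np.random.randn(N_KV_HEADS, seq, HEAD_DIM)
v = np.random.randn(N_KV_HEADS, seq, HEAD_DIM)

# Broadcast each KV head to its group of query heads; the KV cache
# itself stays 8 heads wide, which is where GQA saves memory.
k_expanded = np.repeat(k, GROUP, axis=0)   # (64, seq, 128)
v_expanded = np.repeat(v, GROUP, axis=0)

# Standard scaled dot-product attention per query head.
scores = q @ k_expanded.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_expanded                 # (64, seq, 128)
```

Because keys and values are stored for only 8 heads instead of 64, the KV cache shrinks by 8x relative to standard multi-head attention, which is the main inference-speed benefit the text describes.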
The Llama 3 70B model is engineered for general-purpose applications, serving as a foundational technology that can be further adapted for domain-specific tasks. Its capabilities extend to powering advanced assistant functionalities, as demonstrated by its integration into Meta AI applications across various platforms. The model's design focuses on enabling developers to build diverse generative AI applications, from complex coding assistants to long-form text summarization tools, while offering control and flexibility in deployment environments, including on-premise, cloud, and local setups.
Meta's Llama 3 is a series of large language models utilizing a decoder-only transformer architecture. It incorporates a 128K token vocabulary and Grouped Query Attention for efficient processing. Models are trained on substantial public datasets, supporting various parameter scales and extended context lengths.
Ranking is for Local LLMs.
Rank
#29
Benchmark | Score | Rank |
---|---|---|
Aider Refactoring | 0.49 | 10 |
Aider Coding | 0.49 | 13 |
Overall Rank
#29
Coding Rank
#29
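As a rough guide to the VRAM requirements mentioned above, the memory needed just to hold the 70B weights scales linearly with bits per parameter. The sketch below is a back-of-the-envelope estimate under assumed uniform precisions; real usage adds KV cache, activations, and framework overhead.

```python
# Approximate VRAM (GiB) to hold 70B parameters at common precisions.
# Ignores KV cache and runtime overhead, so treat these as lower bounds.
PARAMS = 70_000_000_000

def weight_gb(bits_per_param: float) -> float:
    return PARAMS * bits_per_param / 8 / 1024**3  # bytes -> GiB

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GiB")
# → FP16: ~130 GiB
# → INT8: ~65 GiB
# → INT4: ~33 GiB
```

This is why the full-precision model does not fit on a single consumer GPU, while 4-bit quantization brings the weights within reach of multi-GPU or large-unified-memory setups.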