Llama 3 70B: Specifications and GPU VRAM Requirements

Llama 3 70B

Open source

Open weights

Parameters

70B

Context length

8,192 tokens

Modality

Text

Architecture

Dense

License

Meta Llama 3 Community License

Release date

18 Apr 2024

Training data cutoff

Dec 2023

Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

8192

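Layers

Attention heads

Key-value heads

Activation function

Normalization

Position embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes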

Llama 3 70B

Meta Llama 3 70B is a 70-billion-parameter, decoder-only transformer language model developed by Meta. Released in April 2024, it is provided in both pre-trained and instruction-fine-tuned variants. The instruction-tuned model is specifically optimized for dialogue and assistant-style interactions, supporting a wide array of natural language understanding and generation tasks. These include conversational AI applications, creative content generation, code generation, text summarization, classification, and complex reasoning challenges. The model is made available for both commercial and research applications under the Meta Llama 3 Community License.

Architecturally, Llama 3 70B uses a standard decoder-only transformer design. Its tokenizer has a vocabulary of roughly 128,000 tokens, which encodes text more efficiently and so reduces the number of tokens processed at inference. To improve inference scalability and speed, the model uses Grouped Query Attention (GQA), in which groups of query heads share a single set of key/value heads; GQA is used in both the 8B and 70B versions of Llama 3. The model was trained on sequences of up to 8,192 tokens. For the instruction-tuned variants, supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) were used to align model outputs with human preferences for helpfulness and safety.
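To make the effect of Grouped Query Attention concrete, the following sketch maps query heads to the key/value heads they share and compares the per-token KV-cache size against standard multi-head attention. The head counts (64 query heads, 8 key/value heads), head dimension (128) and layer count (80) are assumptions taken from the publicly released Llama 3 70B configuration, not from this page, and the function names are purely illustrative.

```python
# Sketch of Grouped Query Attention (GQA) bookkeeping.
# ASSUMPTION: 64 query heads, 8 KV heads, head_dim 128, 80 layers
# (public Llama 3 70B configuration; these values are not stated on this page).

N_LAYERS = 80
N_Q_HEADS = 64
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_head_for_query_head(q_head: int, n_q_heads: int = N_Q_HEADS,
                           n_kv_heads: int = N_KV_HEADS) -> int:
    """Return the index of the KV head shared by the given query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must be divisible by KV heads")
    group_size = n_q_heads // n_kv_heads  # query heads per KV head
    return q_head // group_size


def kv_cache_bytes_per_token(n_kv_heads: int, n_layers: int = N_LAYERS,
                             head_dim: int = HEAD_DIM,
                             bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token across all layers (FP16 by default)."""
    # Factor 2 accounts for storing both a key and a value vector.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    mha = kv_cache_bytes_per_token(N_Q_HEADS)   # every query head has its own K/V
    gqa = kv_cache_bytes_per_token(N_KV_HEADS)  # K/V shared within each group
    print(f"query heads 0-7 share KV head {kv_head_for_query_head(0)}")
    print(f"MHA: {mha / 1024:.0f} KiB/token, GQA: {gqa / 1024:.0f} KiB/token")
```

Under these assumptions GQA stores one eighth of the key/value data that full multi-head attention would: about 320 KiB per token instead of 2,560 KiB at FP16.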

The Llama 3 70B model is engineered for general-purpose applications, serving as a foundational technology that can be further adapted for domain-specific tasks. Its capabilities extend to powering advanced assistant functionalities, as demonstrated by its integration into Meta AI applications across various platforms. The model's design focuses on enabling developers to build diverse generative AI applications, from complex coding assistants to long-form text summarization tools, while offering control and flexibility in deployment environments, including on-premise, cloud, and local setups.

About Llama 3

Meta's Llama 3 is a family of large language models built on a decoder-only transformer architecture. It uses a 128K-token vocabulary and Grouped Query Attention for efficient processing. The models are trained on large publicly available datasets and support a range of parameter scales and context lengths.

Other Llama 3 models

Llama 3 8B

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#33

Category	Benchmark	Score	Rank
Refactoring	Aider Refactoring	0.49	10
Coding	Aider Coding	0.49	15

Coding rank

#32

GPU requirements

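Required VRAM depends on the quantization method chosen for the model weights and on the context size; the calculator on this page defaults to a context of 1,024 tokens.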
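As a rough illustration of how quantization and context size drive the VRAM figure, the sketch below adds the weight footprint to the KV cache. It is a back-of-the-envelope estimate, not the calculator used on this page. The parameter count is rounded to 70 billion, the architecture values (80 layers, 8 key/value heads, head dimension 128) come from the public model configuration rather than from this page, quantized formats are treated as an ideal number of bits per weight with no scale or zero-point overhead, the KV cache is assumed to stay in FP16, and activations and framework overhead are ignored.

```python
# Back-of-the-envelope VRAM estimate: weights + KV cache.
# ASSUMPTIONS (not from this page): 70e9 parameters, 80 layers, 8 KV heads,
# head_dim 128, ideal bits-per-weight, FP16 KV cache, no runtime overhead.

N_PARAMS = 70e9
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128

# Hypothetical labels for illustration; real quantization formats add
# per-block metadata, so actual sizes are somewhat larger.
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gib(quant: str) -> float:
    """Memory for the model weights in GiB at the given precision."""
    return N_PARAMS * BITS_PER_WEIGHT[quant] / 8 / 2**30


def kv_cache_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Memory for keys and values of one sequence in GiB."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_tokens * per_token / 2**30


def estimate_vram_gib(quant: str, context_tokens: int) -> float:
    """Weights plus KV cache; a lower bound on what inference needs."""
    return weight_gib(quant) + kv_cache_gib(context_tokens)


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        for ctx in (1024, 8192):
            print(f"{quant:>5} @ {ctx:>5} tokens: "
                  f"{estimate_vram_gib(quant, ctx):6.1f} GiB")
```

Under these assumptions the weights dominate: about 130 GiB at FP16 and about 33 GiB at 4 bits per weight, while the KV cache adds roughly 0.3 GiB at 1,024 tokens and 2.5 GiB at the full 8,192-token context.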

Resources

Official documentation
Release notes
Download weights
Source code
