Llama 3.1 8B: Specifications and GPU VRAM Requirements

Llama 3.1 8B

Open source

Open weights

Parameters

8B

Context length

131,072 tokens (128K)

Modality

Text

Architecture

Dense

License

Llama 3.1 Community License

Release date

23 Jul 2024

Training data cutoff

Dec 2023

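Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

4096

Number of layers

Attention heads

Key-value heads

Activation function

SiLU

Normalization

RMS Normalization

Position embedding

RoPE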

System requirements

VRAM requirements for different quantization methods and context sizes

Llama 3.1 8B

Llama 3.1 8B is the 8-billion-parameter model in Meta's Llama 3.1 series of large language models. It targets a range of natural language understanding and generation tasks, and its small size makes it suitable for deployment where compute is constrained. The model is optimized for dialogue and for following complex instructions, which makes it useful in conversational agents and virtual assistants.

Architecturally, Llama 3.1 8B is a dense, optimized transformer. It uses Grouped-Query Attention (GQA), in which several query heads share each key-value head, reducing the key-value cache and improving inference scalability. The layers use the SiLU (Swish) activation function and RMSNorm for normalization, and positions are encoded with Rotary Position Embedding (RoPE). The architecture leverages Flash Attention to improve processing speed. The model was trained on approximately 15 trillion tokens from publicly available sources, then aligned for helpfulness and safety with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Compared with Llama 3, the context length is expanded to 128K (131,072) tokens.
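To make the effect of Grouped-Query Attention concrete, the following minimal sketch compares the key-value cache size per token with and without shared key-value heads. The hidden size of 4096 comes from the specification above; the layer count (32), attention head count (32), and key-value head count (8) are assumptions not listed on this page, as is the 16-bit cache precision. The function name is hypothetical.

```python
# Rough per-token KV-cache size, with and without grouped key-value heads.
HIDDEN_SIZE = 4096   # from the specification above
N_LAYERS = 32        # assumption: not listed on this page
N_HEADS = 32         # assumption: not listed on this page
N_KV_HEADS = 8       # assumption: not listed on this page
HEAD_DIM = HIDDEN_SIZE // N_HEADS  # 128 under the assumptions above


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one token.

    The leading factor 2 accounts for storing both a key and a value vector
    per layer and per key-value head; bytes_per_value=2 assumes FP16/BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


gqa_bytes = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(N_LAYERS, N_HEADS, HEAD_DIM)

if __name__ == "__main__":
    print(f"GQA cache per token: {gqa_bytes / 1024:.0f} KiB")
    print(f"Full multi-head cache per token: {mha_bytes / 1024:.0f} KiB")
    print(f"Reduction factor: {mha_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of attention heads to key-value heads, which is why GQA matters most at long context lengths.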

Llama 3.1 8B handles tasks such as text summarization, text classification, and sentiment analysis, particularly where low-latency inference matters. It supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The model also supports long-form text summarization and can be used for synthetic data generation and model distillation to improve smaller language models.

About Llama 3.1

Llama 3.1 is Meta's large language model family that builds on Llama 3. It uses an optimized decoder-only transformer architecture and is available in 8B, 70B, and 405B parameter versions. Key enhancements include an expanded 128K-token context window and improved multilingual capabilities across eight languages, achieved through changes to training data and post-training procedures.

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#53

Category	Benchmark	Score	Rank
Graduate-Level QA	GPQA	0.54	11
Refactoring	Aider Refactoring	0.38	15
StackEval	ProLLM Stack Eval	0.50	15
Summarization	ProLLM Summarization	0.49	17
Coding	Aider Coding	0.38	18
Professional Knowledge	MMLU Pro	0.48	23
Coding	LiveBench Coding	0.11	32
Data Analysis	LiveBench Data Analysis	0.33	32
Reasoning	LiveBench Reasoning	0.15	33
Mathematics	LiveBench Mathematics	0.15	34
General Knowledge	MMLU	0.30	34

Coding rank

#45

GPU requirements

The required VRAM depends on the quantization method selected for the model weights and on the context size (from 1,024 tokens up to 128k).
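As a rough illustration of how quantization and context size drive VRAM use, the sketch below adds the memory for the model weights to the memory for the key-value cache. It is a back-of-the-envelope estimate, not the calculator behind this page. The parameter count of 8 billion comes from the text above; the layer count (32), key-value head count (8), head dimension (128), and 16-bit cache precision are assumptions, and all function names are hypothetical. Activations, framework overhead, and the extra bits that real quantization formats store for scales are ignored, so actual usage will be higher.

```python
GIB = 1024 ** 3

N_PARAMS = 8e9    # "8 billion parameters" per the text above
N_LAYERS = 32     # assumption: not listed on this page
N_KV_HEADS = 8    # assumption: not listed on this page
HEAD_DIM = 128    # assumption: hidden size 4096 divided by 32 heads


def weights_gib(bits_per_weight, n_params=N_PARAMS):
    """Memory for the model weights at a nominal bit width."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens, bytes_per_value=2):
    """Memory for cached keys and values across all layers (FP16 by default)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_tokens * per_token / GIB


def estimate_vram_gib(bits_per_weight, context_tokens):
    """Weights plus KV cache only; excludes activations and runtime overhead."""
    return weights_gib(bits_per_weight) + kv_cache_gib(context_tokens)


if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        for ctx in (1024, 65536, 131072):
            total = estimate_vram_gib(bits, ctx)
            print(f"{name:>5} weights, {ctx:>6} tokens: {total:6.2f} GiB")
```

Under these assumptions the weights dominate at short contexts, while at the full 131,072-token context the key-value cache alone is comparable in size to the FP16 weights.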
