Llama 3.3 70B

Parameters: 70B

Context Length: 128K

Modality: Text

Architecture: Dense

License: Llama 3.3 Community License

Release Date: 7 Dec 2024

Knowledge Cutoff: Dec 2023

Technical Specifications

Attention Structure: Grouped-Query Attention

Hidden Dimension Size: 8192

Number of Layers: 80

Attention Heads: 64

Key-Value Heads: 8

Activation Function: SwiGLU

Normalization: RMSNorm

Positional Embedding: RoPE
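
To see how these hyperparameters fit together, below is a minimal sketch expressing them as a Hugging Face transformers LlamaConfig. The intermediate (FFN) size, vocabulary size, RoPE base, and RMSNorm epsilon are not listed in the table above; they are assumptions based on the published Llama 3 family configurations. Note that the head dimension follows from the table: 8192 / 64 = 128.

```python
# Sketch: the specification table expressed as a transformers LlamaConfig.
# Values not in the table (intermediate_size, vocab_size, rope_theta,
# rms_norm_eps, max_position_embeddings) are assumed, not taken from this page.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=8192,                # hidden dimension size
    num_hidden_layers=80,            # number of layers
    num_attention_heads=64,          # query heads
    num_key_value_heads=8,           # shared KV heads (Grouped-Query Attention)
    hidden_act="silu",               # SwiGLU gate uses the SiLU activation
    rms_norm_eps=1e-5,               # RMSNorm epsilon (assumed)
    intermediate_size=28672,         # FFN width (assumed)
    vocab_size=128256,               # tokenizer size (assumed)
    rope_theta=500000.0,             # RoPE base frequency (assumed)
    max_position_embeddings=131072,  # 128K context window
)
print(config)
```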

System Requirements

VRAM requirements for different quantization methods and context sizes are covered under GPU Requirements below.

Llama 3.3 70B

The Meta Llama 3.3 70B is a large language model engineered for text-based generative applications. It operates as a dense Transformer model, incorporating an optimized architectural design. This model variant is specifically instruction-tuned for dialogue, demonstrating proficiency in multilingual chat scenarios, code assistance, and synthetic data generation. Its development involved extensive pretraining on approximately 15 trillion tokens sourced from publicly available online datasets.

From an architectural perspective, Llama 3.3 70B integrates Grouped-Query Attention (GQA) to improve inference scalability and efficiency. Its training regimen includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), applied to align outputs with human preferences for helpfulness and safety. A notable feature is its extended context window of up to 128,000 tokens, which enables the processing and generation of longer text sequences for advanced use cases such as long-form summarization and complex multi-turn conversations.
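
To make the GQA arrangement concrete (64 query heads sharing 8 key-value heads, per the specifications above), here is a minimal PyTorch sketch. The tensor names and shapes are illustrative only, not Meta's actual implementation.

```python
# Minimal Grouped-Query Attention sketch (illustrative, not the model's code).
# 64 query heads share 8 key/value heads: each KV head serves a group of 8 query heads.
import torch
import torch.nn.functional as F

batch, seq_len = 1, 16
n_q_heads, n_kv_heads, head_dim = 64, 8, 128  # 64 * 128 = 8192 hidden size

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head across its group of query heads (8 query heads per KV head).
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)  # -> (batch, 64, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 64, 16, 128])
```

Because only 8 key-value heads are cached instead of 64, the KV cache (and the memory bandwidth needed to read it at every decoding step) shrinks by roughly 8x relative to full multi-head attention, which is the main inference-efficiency benefit noted above.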

The model is equipped with capabilities for multilingual inputs and outputs, encompassing languages such as English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Furthermore, it supports tool-use, providing developers with the ability to extend its functionality via custom function definitions and integration with third-party services. This design emphasizes efficiency and aims to reduce hardware requirements, thereby increasing the accessibility of high-quality AI for various applications.
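
As an illustration of the tool-use capability, the sketch below defines a hypothetical get_weather function as a JSON schema and passes it to an OpenAI-compatible chat endpoint. The function name, endpoint URL, and model identifier are assumptions for illustration; the exact tool-calling format depends on the serving stack (for example vLLM or a hosted API) and is not specified on this page.

```python
# Hypothetical tool-use example: the function, endpoint, and model id are assumptions.
# Requires an OpenAI-compatible server hosting Llama 3.3 70B (e.g. vLLM).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```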

About Llama 3.3

Meta's Llama 3.3 is a 70 billion parameter, multilingual large language model. It utilizes an optimized transformer architecture, incorporating Grouped-Query Attention for enhanced inference efficiency. The model features an extended 128k token context window and is designed to support quantization, facilitating deployment on varied hardware configurations.
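
Because the model is designed to support quantization, a common way to fit it on fewer GPUs is 4-bit loading via bitsandbytes. The snippet below is a sketch that assumes the Hugging Face repository name meta-llama/Llama-3.3-70B-Instruct and enough combined GPU memory for the quantized weights.

```python
# Sketch: loading the model with 4-bit NF4 quantization via bitsandbytes.
# The repository id is an assumption, not taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.3-70B-Instruct"  # assumed Hugging Face repo id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs
)

inputs = tokenizer("Summarize the benefits of grouped-query attention.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```

In 4-bit NF4 the weights alone occupy on the order of 35 to 40 GB, versus roughly 140 GB in 16-bit precision, which is what makes deployment on a small number of GPUs feasible.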


Other Llama 3.3 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #27

Benchmark scores and ranks:

| Benchmark | Category | Score | Rank |
|-----------|----------|-------|------|
|           |          | 0.59  | 6    |
|           |          | 0.85  | 9    |
|           |          | 0.59  | 10   |
|           |          | 0.9   | 11   |
|           |          | 0.68  | 11   |
| MMLU Pro  | Professional Knowledge | 0.69 | 12 |
| GPQA      | Graduate-Level QA      | 0.51 | 14 |
|           |          | 0.52  | 16   |
|           |          | 0.49  | 22   |
| MMLU      | General Knowledge      | 0.51 | 22 |
|           |          | 0.33  | 23   |
|           |          | 0.41  | 23   |

Rankings

Overall Rank: #27

Coding Rank: #18

GPU Requirements

Required VRAM and the recommended GPU depend on the quantization method selected for the model weights and on the context size (from 1K up to roughly 128K tokens); a full calculator is available for exact figures.
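
For a rough sense of these requirements, the back-of-the-envelope sketch below combines weight memory (parameter count times bytes per parameter) with KV-cache memory derived from the architecture figures above. It ignores activation memory, quantization overhead, and framework overhead, so treat its outputs as approximate lower bounds rather than exact calculator figures.

```python
# Rough VRAM estimate for Llama 3.3 70B: model weights + KV cache only.
# Ignores activations, CUDA context, and framework overhead (a lower bound).

PARAMS = 70e9
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # from the technical specifications above

def estimate_vram_gb(context_tokens: int, weight_bits: int, kv_bytes: int = 2) -> float:
    weights = PARAMS * weight_bits / 8  # bytes for model weights
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token, head_dim values each.
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_tokens * kv_bytes
    return (weights + kv_cache) / 1024**3

for bits in (16, 8, 4):
    for ctx in (1_024, 65_536, 131_072):
        print(f"{bits}-bit weights, {ctx:>7} ctx: ~{estimate_vram_gb(ctx, bits):6.1f} GB")
```

With 16-bit weights and the full 128K context, this works out to roughly 130 GB of weights plus about 40 GB of KV cache, which is why quantized weights and shorter contexts are the usual route to fitting the model on commodity multi-GPU setups.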