Llama 3.3 70B

Parameters: 70B

Context Length: 128K

Modality: Text

Architecture: Dense

License: Llama 3.3 Community License

Release Date: 7 Dec 2024

Knowledge Cutoff: Dec 2023

Technical Specifications

Attention Structure: Grouped-Query Attention

Hidden Dimension Size: 8192

Number of Layers: 80

Attention Heads: 64

Key-Value Heads: 8

Activation Function: SwiGLU

Normalization: RMSNorm

Positional Embedding: RoPE
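
To see how these hyperparameters fit together, below is a minimal sketch expressing them as a Hugging Face transformers LlamaConfig. The intermediate (FFN) size, vocabulary size, RoPE base, and RMSNorm epsilon are not listed in the table above; they are assumptions based on the published Llama 3 family configurations. Note that the head dimension follows from the table: 8192 / 64 = 128.

```python
# Sketch: the specification table expressed as a transformers LlamaConfig.
# Values not in the table (intermediate_size, vocab_size, rope_theta,
# rms_norm_eps, max_position_embeddings) are assumed, not taken from this page.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=8192,                # hidden dimension size
    num_hidden_layers=80,            # number of layers
    num_attention_heads=64,          # query heads
    num_key_value_heads=8,           # shared KV heads (Grouped-Query Attention)
    hidden_act="silu",               # SwiGLU gate uses the SiLU activation
    rms_norm_eps=1e-5,               # RMSNorm epsilon (assumed)
    intermediate_size=28672,         # FFN width (assumed)
    vocab_size=128256,               # tokenizer size (assumed)
    rope_theta=500000.0,             # RoPE base frequency (assumed)
    max_position_embeddings=131072,  # 128K context window
)
print(config)
```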

System Requirements

VRAM requirements for different quantization methods and context sizes are covered under GPU Requirements below.

Llama 3.3 70B

The Meta Llama 3.3 70B is a large language model engineered for text-based generative applications. It operates as a dense Transformer model, incorporating an optimized architectural design. This model variant is specifically instruction-tuned for dialogue, demonstrating proficiency in multilingual chat scenarios, code assistance, and synthetic data generation. Its development involved extensive pretraining on approximately 15 trillion tokens sourced from publicly available online datasets.

From an architectural perspective, Llama 3.3 70B integrates Grouped-Query Attention (GQA) to improve inference scalability and efficiency. Its training regimen includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), applied to align outputs with human preferences for helpfulness and safety. A notable feature is its extended context window of up to 128,000 tokens, which enables the processing and generation of longer text sequences for advanced use cases such as long-form summarization and complex multi-turn conversations.
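
To make the GQA arrangement concrete (64 query heads sharing 8 key-value heads, per the specifications above), here is a minimal PyTorch sketch. The tensor names and shapes are illustrative only, not Meta's actual implementation.

```python
# Minimal Grouped-Query Attention sketch (illustrative, not the model's code).
# 64 query heads share 8 key/value heads: each KV head serves a group of 8 query heads.
import torch
import torch.nn.functional as F

batch, seq_len = 1, 16
n_q_heads, n_kv_heads, head_dim = 64, 8, 128  # 64 * 128 = 8192 hidden size

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head across its group of query heads (8 query heads per KV head).
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)  # -> (batch, 64, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 64, 16, 128])
```

Because only 8 key-value heads are cached instead of 64, the KV cache (and the memory bandwidth needed to read it at every decoding step) shrinks by roughly 8x relative to full multi-head attention, which is the main inference-efficiency benefit noted above.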

The model is equipped with capabilities for multilingual inputs and outputs, encompassing languages such as English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Furthermore, it supports tool-use, providing developers with the ability to extend its functionality via custom function definitions and integration with third-party services. This design emphasizes efficiency and aims to reduce hardware requirements, thereby increasing the accessibility of high-quality AI for various applications.
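
As an illustration of the tool-use capability, the sketch below defines a hypothetical get_weather function as a JSON schema and passes it to an OpenAI-compatible chat endpoint. The function name, endpoint URL, and model identifier are assumptions for illustration; the exact tool-calling format depends on the serving stack (for example vLLM or a hosted API) and is not specified on this page.

```python
# Hypothetical tool-use example: the function, endpoint, and model id are assumptions.
# Requires an OpenAI-compatible server hosting Llama 3.3 70B (e.g. vLLM).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```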

About Llama 3.3

Meta's Llama 3.3 is a 70 billion parameter, multilingual large language model. It utilizes an optimized transformer architecture, incorporating Grouped-Query Attention for enhanced inference efficiency. The model features an extended 128k token context window and is designed to support quantization, facilitating deployment on varied hardware configurations.
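
Because the model is designed to support quantization, a common way to fit it on fewer GPUs is 4-bit loading via bitsandbytes. The snippet below is a sketch that assumes the Hugging Face repository name meta-llama/Llama-3.3-70B-Instruct and enough combined GPU memory for the quantized weights.

```python
# Sketch: loading the model with 4-bit NF4 quantization via bitsandbytes.
# The repository id is an assumption, not taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.3-70B-Instruct"  # assumed Hugging Face repo id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs
)

inputs = tokenizer("Summarize the benefits of grouped-query attention.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```

In 4-bit NF4 the weights alone occupy on the order of 35 to 40 GB, versus roughly 140 GB in 16-bit precision, which is what makes deployment on a small number of GPUs feasible.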


Other Llama 3.3 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #27

Benchmark scores and ranks:

| Benchmark | Category | Score | Rank |
|-----------|----------|-------|------|
|           |          | 0.59  | 6    |
|           |          | 0.85  | 9    |
|           |          | 0.59  | 10   |
|           |          | 0.9   | 11   |
|           |          | 0.68  | 11   |
| MMLU Pro  | Professional Knowledge | 0.69 | 12 |
| GPQA      | Graduate-Level QA      | 0.51 | 14 |
|           |          | 0.52  | 16   |
|           |          | 0.49  | 22   |
| MMLU      | General Knowledge      | 0.51 | 22 |
|           |          | 0.33  | 23   |
|           |          | 0.41  | 23   |

Rankings

Overall Rank: #27

Coding Rank: #18

GPU Requirements

Required VRAM and the recommended GPU depend on the quantization method selected for the model weights and on the context size (from 1K up to roughly 128K tokens); a full calculator is available for exact figures.
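
For a rough sense of these requirements, the back-of-the-envelope sketch below combines weight memory (parameter count times bytes per parameter) with KV-cache memory derived from the architecture figures above. It ignores activation memory, quantization overhead, and framework overhead, so treat its outputs as approximate lower bounds rather than exact calculator figures.

```python
# Rough VRAM estimate for Llama 3.3 70B: model weights + KV cache only.
# Ignores activations, CUDA context, and framework overhead (a lower bound).

PARAMS = 70e9
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # from the technical specifications above

def estimate_vram_gb(context_tokens: int, weight_bits: int, kv_bytes: int = 2) -> float:
    weights = PARAMS * weight_bits / 8  # bytes for model weights
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token, head_dim values each.
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_tokens * kv_bytes
    return (weights + kv_cache) / 1024**3

for bits in (16, 8, 4):
    for ctx in (1_024, 65_536, 131_072):
        print(f"{bits}-bit weights, {ctx:>7} ctx: ~{estimate_vram_gb(ctx, bits):6.1f} GB")
```

With 16-bit weights and the full 128K context, this works out to roughly 130 GB of weights plus about 40 GB of KV cache, which is why quantized weights and shorter contexts are the usual route to fitting the model on commodity multi-GPU setups.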