
Qwen2-72B

Parameters: 72B
Context Length: 32,768 tokens
Modality: Text
Architecture: Dense
License: Tongyi Qianwen LICENSE AGREEMENT
Release Date: 7 Jun 2024
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 8192
Number of Layers: 80
Attention Heads: 128
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: RoPE
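To make these numbers concrete, the short sketch below derives the grouped-query attention layout they imply; the per-head dimension is an assumption obtained by dividing the hidden size by the number of attention heads, and the code is illustrative only, not the model's actual implementation.

```python
# Grouped-Query Attention layout implied by the specifications above.
# Assumption: head_dim = hidden_size / attention_heads (not an official figure).
hidden_size = 8192
num_layers = 80
num_attention_heads = 128    # query heads
num_key_value_heads = 8      # shared key/value heads (GQA)

head_dim = hidden_size // num_attention_heads                      # 64
queries_per_kv_head = num_attention_heads // num_key_value_heads   # 16

# Per-layer projection widths: K and V are 16x narrower than Q, which is
# where GQA saves key/value-cache memory relative to full multi-head attention.
q_width = num_attention_heads * head_dim        # 8192
kv_width = num_key_value_heads * head_dim       # 512

print(f"head_dim={head_dim}, queries per KV head={queries_per_kv_head}")
print(f"Q projection width={q_width}, K/V projection width={kv_width}")
```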

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen2-72B

Qwen2-72B is a significant iteration within the Qwen2 large language model series, developed by Alibaba. This model is engineered to handle a diverse array of natural language processing tasks, encompassing both comprehension and generation, alongside proficiency in coding and mathematical problem-solving. It functions as a foundational model, intended for further specialized fine-tuning to address particular application domains.

The architectural foundation of Qwen2-72B is the Transformer, augmented with several refinements that improve computational efficiency and model quality. Key among these are the SwiGLU activation function and Grouped-Query Attention (GQA), which reduces the memory footprint of the attention mechanism and accelerates inference. The model also uses an improved tokenizer designed to handle a wide range of natural languages and programming code. Notably, Qwen2-72B is a dense model, distinguishing it from the Mixture-of-Experts (MoE) variants found elsewhere in the broader Qwen2 family.
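As a point of reference for the SwiGLU mention above, a minimal SwiGLU feed-forward block in PyTorch might look like the sketch below; the layer sizes used in the example are placeholders for illustration, not published Qwen2-72B values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Minimal SwiGLU MLP: a SiLU-gated linear unit followed by a down projection."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x @ W_gate) * (x @ W_up), projected back to hidden_size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Small placeholder sizes so the example runs quickly; the real model's
# hidden size is 8192 with a much larger intermediate dimension.
ffn = SwiGLUFeedForward(hidden_size=512, intermediate_size=1408)
out = ffn(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```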

From a functional perspective, Qwen2-72B demonstrates capabilities across multiple critical areas. It is designed to excel in tasks requiring sophisticated natural language understanding, robust language generation, and adeptness in coding and mathematical reasoning. While positioned as a base model, it provides a strong pre-trained foundation suitable for post-training methodologies such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This design philosophy supports its application in scenarios demanding extensive multilingual understanding, complex code manipulation, or advanced mathematical computation.
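As a rough illustration of how such a base checkpoint is typically used, the sketch below loads it with the Hugging Face Transformers library for plain text completion; the repository id `Qwen/Qwen2-72B` and the dtype/device settings are assumptions for illustration, and a 72B model requires multiple high-memory GPUs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B"  # assumed Hugging Face repository id for the base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard the 72B weights across available GPUs
)

# A base (non-instruct) model does raw continuation rather than chat:
# feed plain text and sample a completion.
inputs = tokenizer("Qwen2 is a series of large language models that",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```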

About Qwen2

The Alibaba Qwen2 model family comprises large language models built upon the Transformer architecture. It includes both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped-Query Attention, which reduces the memory footprint during inference, and support for extended context lengths of up to 131,072 tokens.


Other Qwen2 Models

Evaluation Benchmarks

Rankings apply to local LLMs.

Rank: #30
Coding Rank: #21

Benchmark scores and ranks:

0.56 (rank 9)
0.56 (rank 12)
Professional Knowledge (MMLU Pro): 0.64 (rank 17)
Graduate-Level QA (GPQA): 0.42 (rank 21)
General Knowledge (MMLU): 0.42 (rank 28)

GPU Requirements

Required VRAM and recommended GPUs depend on the quantization method applied to the model weights and on the context size (1K to 32K tokens); see the full calculator for exact figures.
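As a rough illustration of how such estimates can be derived, the sketch below combines weight memory (parameter count times bytes per parameter) with a grouped-query key/value-cache term built from the architecture values listed above; the formula, the overhead factor, and the assumed head dimension are simplifications for illustration, not the calculator's actual method.

```python
def estimate_vram_gb(
    params_b: float = 72.0,        # parameters in billions (Qwen2-72B)
    bytes_per_param: float = 2.0,  # 2.0 = FP16/BF16, 1.0 ~ 8-bit, 0.5 ~ 4-bit
    context_len: int = 32768,      # tokens held in the KV cache
    num_layers: int = 80,
    num_kv_heads: int = 8,
    head_dim: int = 64,            # assumed: 8192 hidden size / 128 attention heads
    kv_bytes: float = 2.0,         # KV cache kept in FP16/BF16
    overhead: float = 1.2,         # ~20% headroom for activations and runtime buffers
) -> float:
    """Rough single-request inference VRAM estimate in GiB (a simplification)."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each [kv_heads, context, head_dim].
    kv_cache_bytes = 2 * num_layers * num_kv_heads * context_len * head_dim * kv_bytes
    return (weight_bytes + kv_cache_bytes) * overhead / 1024**3

# Example: 4-bit quantized weights at the full 32K context.
print(f"{estimate_vram_gb(bytes_per_param=0.5):.1f} GiB")
```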
