Qwen2-72B: Specifications and GPU VRAM Requirements

Qwen2-72B

Open weights

Parameters

72B

Context length

32,768 tokens

Modality

Text

Architecture

Dense

License

Tongyi Qianwen LICENSE AGREEMENT

Release date

7 Jun 2024

Training data cutoff

Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

8192

Number of layers

Attention heads

64

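Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Positional embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes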

Qwen2-72B

Qwen2-72B is the 72-billion-parameter model in the Qwen2 series of large language models developed by Alibaba. It handles a broad range of natural language processing tasks, covering both comprehension and generation, as well as coding and mathematical problem-solving. It is a base model, intended to be fine-tuned for specific application domains.

Qwen2-72B is built on the Transformer architecture, with several changes aimed at computational efficiency and model quality. It uses the SwiGLU activation function and Grouped-Query Attention (GQA), in which groups of query heads share a single key-value head; this shrinks the key-value cache, reducing memory footprint and speeding up inference. The model also uses an improved tokenizer designed to handle a wide range of natural languages and programming code. Qwen2-72B is a dense model, unlike the Mixture-of-Experts (MoE) variants found elsewhere in the Qwen2 family.
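To make the memory effect of GQA concrete, here is a minimal sketch comparing key-value cache size when every query head has its own key-value head versus when key-value heads are shared. The layer count, head counts, and head dimension below are assumptions chosen for illustration (this page does not list all of them) and should be checked against the model's published configuration; the function name is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 assumes a 16-bit cache (an assumption, not a page value).
    """
    # Factor 2: one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed configuration for illustration only.
NUM_LAYERS = 80
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_TOKENS = 32_768

# Without sharing, each query head would need its own key-value head.
full_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, CONTEXT_TOKENS)
# With GQA, only the shared key-value heads are cached.
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS)

GIB = 1024 ** 3
print(f"Per-head KV cache: {full_bytes / GIB:.1f} GiB")
print(f"GQA KV cache:      {gqa_bytes / GIB:.1f} GiB")
print(f"Reduction factor:  {full_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key-value heads; the absolute figures change with the cache precision and the actual configuration.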

Functionally, Qwen2-72B targets natural language understanding, language generation, coding, and mathematical reasoning. As a base model, it provides a pre-trained foundation for post-training methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This makes it a fit for applications that need broad multilingual understanding, complex code manipulation, or advanced mathematical computation.

About Qwen2

The Alibaba Qwen2 model family consists of large language models built on the Transformer architecture. It includes both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped-Query Attention, which reduces the memory footprint at inference time, and support for extended context lengths of up to 131,072 tokens.

Other Qwen2 models

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#35

Category	Benchmark	Score	Rank
Refactoring	Aider Refactoring	0.56	9
Coding	Aider Coding	0.56	13
Graduate-Level QA	GPQA	0.42	20
Professional Knowledge	MMLU Pro	0.64	29
General Knowledge	MMLU	0.42	31

Coding rank

#23

GPU requirements

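Required VRAM depends on the quantization method selected for the model weights and on the context size (the calculator offers 1,024, 16k, and 32k tokens).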
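As a rough sketch of how such a VRAM estimate could be computed, the snippet below adds the size of the quantized weights to the size of the key-value cache for a given context. It is not the page's calculator: the bytes-per-parameter figures are nominal (quantization metadata is ignored), the layer count, key-value head count, and head dimension are assumed values not listed on this page, and activations and framework overhead are left out, so real requirements will be higher. All function names are hypothetical.

```python
GIB = 1024 ** 3

# Nominal storage cost per parameter; ignores per-group scales and zero points.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params, quant):
    """Approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[quant] / GIB


def kv_cache_gib(context_tokens, num_layers=80, num_kv_heads=8, head_dim=128,
                 bytes_per_value=2):
    """Approximate KV cache size in GiB for one sequence.

    The default architecture values are assumptions for illustration.
    """
    # Factor 2: keys and values are both cached in every layer.
    total = 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value
    return total / GIB


def estimate_vram_gib(num_params, quant, context_tokens):
    """Lower-bound estimate: weights plus KV cache only."""
    return weight_gib(num_params, quant) + kv_cache_gib(context_tokens)


NUM_PARAMS = 72e9  # "72B" as listed on this page

for quant in BYTES_PER_PARAM:
    for ctx in (1024, 16_384, 32_768):
        estimate = estimate_vram_gib(NUM_PARAMS, quant, ctx)
        print(f"{quant:>4} @ {ctx:>6} tokens: ~{estimate:.1f} GiB")
```

The estimate is dominated by the weights: halving the bits per parameter roughly halves the total, while moving from 1,024 to 32k tokens adds only the cache term.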

Resources

Official documentation, release notes, paper, model weights, source code
