Qwen2-72B

Closed Source

Open Weights

Parameters

72B

Context Length

32K (32,768 tokens)

Modality

Text

Architecture

Dense

License

Tongyi Qianwen LICENSE AGREEMENT

Release Date

7 Jun 2024

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

Context Size	Total VRAM	Consumer	Datacenter	Apple Silicon
1,024 tokens	153.05 GB	8x RTX 4090 (24 GB each)	3x NVIDIA A100 (80 GB each)	2x Apple M3 Max (128 GB each)
32,768 tokens	163.97 GB	8x RTX 4090 (24 GB each)	3x NVIDIA A100 (80 GB each)	2x Apple M3 Max (128 GB each)
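
The difference between the two context sizes is largely explained by the key-value cache, which grows linearly with the number of tokens. The sketch below is a rough back-of-the-envelope estimate, not the method used to produce the figures above. It assumes 80 layers, 8 key-value heads, a head dimension of 128, and a 16-bit cache; none of these values are stated on this page, so treat them as assumptions.

```python
# Rough estimate only. The layer count, KV-head count and head dimension are
# assumptions (they are not listed on this page); 2 bytes per value assumes
# an FP16/BF16 cache.
NUM_LAYERS = 80        # assumed
NUM_KV_HEADS = 8       # assumed
HEAD_DIM = 128         # assumed
BYTES_PER_VALUE = 2    # assumed 16-bit cache


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    for tokens in (1_024, 32_768):
        size = to_gb(kv_cache_bytes(tokens))
        print(f"{tokens:>6} tokens: {size:.2f} GB of KV cache")
```

Under these assumptions the cache grows by roughly 10.4 GB between 1,024 and 32,768 tokens, the same order of magnitude as the 10.92 GB difference reported above. The remainder is presumably activation memory and runtime overhead that this sketch does not model.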

Evaluation Benchmarks

Category	Benchmark	Score	Rank
General Knowledge	MMLU	0.823	19
Web Development	WebDev Arena	1261	87
General Text	Text Arena	1261	92

Rankings

Overall Rank

#102

Coding Rank

#101

About Qwen2-72B

Qwen2-72B is the 72-billion-parameter model in the Qwen2 large language model series developed by Alibaba. It targets natural language understanding and generation, coding, and mathematical problem solving. It is a base (pretrained) model, intended as a starting point for fine-tuning on specific application domains.

Qwen2-72B is a Transformer with several changes aimed at computational efficiency and model quality. It uses the SwiGLU activation function and Grouped Query Attention (GQA), in which several query heads share each key-value head, reducing memory footprint and speeding up inference. Its tokenizer is designed to handle a wide range of natural languages as well as programming code. Qwen2-72B is a dense model, unlike the Mixture-of-Experts (MoE) variants elsewhere in the Qwen2 family.
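
The two mechanisms named above can be illustrated with a small pure-Python sketch. It is not Qwen2's implementation: the helper names (`swiglu_ffn`, `kv_head_for_query_head`) are hypothetical, the toy sizes in the demo are chosen for readability, and a real model would use batched tensor operations.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


def kv_head_for_query_head(query_head: int, num_query_heads: int,
                           num_kv_heads: int) -> int:
    """In GQA, each group of consecutive query heads shares one KV head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Toy feed-forward: input size 2, intermediate size 3 (illustrative only).
    w_gate = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
    w_up = [[0.3, 0.7], [-0.6, 0.2], [0.9, -0.1]]
    w_down = [[0.2, -0.5, 0.4], [0.6, 0.1, -0.3]]
    print(swiglu_ffn([1.0, 2.0], w_gate, w_up, w_down))

    # Toy GQA layout: 8 query heads sharing 2 KV heads (illustrative only).
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Because only the key and value heads are cached during generation, sharing each one across a group of query heads shrinks the cache by the group size, which is where the memory saving comes from.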

Qwen2-72B is positioned as a base model: it provides a pretrained foundation for post-training methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This makes it a suitable starting point for applications that need broad multilingual understanding, complex code manipulation, or advanced mathematical reasoning.

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

64

Key-Value Heads

Attention Head Dimension

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio
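
To show what the RoPE Theta value listed above controls, here is a minimal sketch of rotary position embedding in pure Python. It uses the interleaved pairing from the original RoPE formulation; production implementations may pair dimensions differently, and the function names and the toy head dimension of 4 are assumptions made for illustration.

```python
import math

ROPE_THETA = 1_000_000.0  # "RoPE Theta" from the specification list


def rope_inv_freq(head_dim: int, theta: float = ROPE_THETA):
    """Rotation frequency for each pair of dimensions: theta ** (-2i / d)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position: int, theta: float = ROPE_THETA):
    """Rotate each pair (vec[2i], vec[2i+1]) by position * inv_freq[i]."""
    out = []
    for i, freq in enumerate(rope_inv_freq(len(vec), theta)):
        angle = position * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5, -0.5]   # toy query vector
    k = [0.2, 0.3, -1.0, 0.4]   # toy key vector
    # Both scores use a relative offset of 2, so they agree up to rounding.
    print(dot(apply_rope(q, 5), apply_rope(k, 3)))
    print(dot(apply_rope(q, 2), apply_rope(k, 0)))
```

The attention score depends only on the distance between the two positions, not on their absolute values. A larger theta lowers the rotation frequencies, so the slowest pairs take more tokens to complete a full turn.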

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

8,192

Number of Layers

FFN Intermediate Size (Dense)

29,568

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

152,064
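
The dimensions listed above are enough for a rough sanity check of the 72B parameter figure. The sketch below is an estimate under stated assumptions, not an official breakdown. It takes the hidden size, FFN intermediate size, and vocabulary size from this page, and assumes 80 layers, 8 key-value heads of dimension 128, a query projection as wide as the hidden size, bias terms on the query, key, and value projections only, and separate (untied) input and output embeddings.

```python
# Values taken from the specification list on this page.
HIDDEN = 8_192
FFN = 29_568
VOCAB = 152_064

# Assumptions (not stated on this page).
NUM_LAYERS = 80          # assumed
KV_DIM = 8 * 128         # assumed: 8 KV heads of dimension 128


def attention_params() -> int:
    """Q, K, V projections with bias (assumed) plus a bias-free output projection."""
    q = HIDDEN * HIDDEN + HIDDEN
    k = HIDDEN * KV_DIM + KV_DIM
    v = HIDDEN * KV_DIM + KV_DIM
    o = HIDDEN * HIDDEN
    return q + k + v + o


def ffn_params() -> int:
    """SwiGLU feed-forward: gate, up and down projections, no bias (assumed)."""
    return 3 * HIDDEN * FFN


def layer_params() -> int:
    """One Transformer block: attention, feed-forward, two RMSNorm weights."""
    return attention_params() + ffn_params() + 2 * HIDDEN


def total_params() -> int:
    """All layers, untied input/output embeddings (assumed), final norm."""
    embeddings = 2 * VOCAB * HIDDEN
    return NUM_LAYERS * layer_params() + embeddings + HIDDEN


if __name__ == "__main__":
    print(f"{total_params():,} parameters")
    print(f"{total_params() / 1e9:.1f}B")
```

Under these assumptions the total comes to about 72.7 billion parameters, consistent with the 72B label. The feed-forward blocks account for most of that total.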

Model Integrity

Total Score

B-

63 / 100

Upstream

20.5 / 30

Model

24.5 / 40

Downstream

18.0 / 30

About Qwen2

The Alibaba Qwen2 model family comprises Transformer-based large language models in both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped Query Attention, which reduces the memory footprint at inference time, and support for context lengths of up to 131,072 tokens.