Qwen2-72B

Parameters

72B

Context Length

32,768 tokens

Modality

Text

Architecture

Dense

License

Tongyi Qianwen LICENSE AGREEMENT

Release Date

7 Jun 2024

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

8192

Number of Layers

80

Attention Heads

64

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
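
As a rough illustration of how quantization changes the memory needed for the weights alone, the sketch below multiplies the nominal 72B parameter count by the bits stored per weight. The format names and bit widths are illustrative assumptions, and the figures exclude the key-value cache, activations, and runtime overhead, so real VRAM requirements will be higher.

```python
# Back-of-the-envelope weight memory; not a substitute for a real calculator.
PARAMS = 72e9  # nominal parameter count listed on this page

# Bits per weight for a few common formats (illustrative assumptions).
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gib(params: float, bits: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return params * bits / 8 / 2**30


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Halving the bits per weight halves this figure, which is why 4-bit quantization is often what makes a 72B model fit on a small number of GPUs.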

Qwen2-72B

Qwen2-72B is the 72-billion-parameter model in the Qwen2 large language model series developed by Alibaba. It targets natural language understanding and generation as well as coding and mathematical problem-solving. It is a base (pre-trained) model, intended to be fine-tuned for specific application domains.

Qwen2-72B is a Transformer with several changes aimed at efficiency and quality. It uses the SwiGLU activation function and Grouped-Query Attention (GQA), in which several query heads share each key-value head; this shrinks the key-value cache and speeds up inference. The model also uses an improved tokenizer designed to handle many natural languages as well as programming code. Qwen2-72B is a dense model, unlike the Mixture-of-Experts (MoE) variants elsewhere in the Qwen2 family.
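
To make the memory effect of GQA concrete, the sketch below estimates the key-value cache size from the layer count (80) and key-value head count (8) listed in the specifications. The per-head dimension of 128 and the 2-byte (fp16/bf16) cache entries are assumptions not stated on this page, and the function name is hypothetical.

```python
def kv_cache_bytes(tokens: int, layers: int = 80, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size for one sequence.

    head_dim=128 and bytes_per_value=2 are assumptions, not values from this page.
    The leading factor 2 counts one key tensor plus one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    context = 32_768
    gqa = kv_cache_bytes(context)               # 8 key-value heads (GQA)
    mha = kv_cache_bytes(context, kv_heads=64)  # hypothetical: one KV head per query head
    print(f"GQA: {gqa / 2**30:.0f} GiB, full multi-head: {mha / 2**30:.0f} GiB")
```

Under these assumptions the cache grows linearly with both context length and the number of key-value heads, so sharing key-value heads across query heads cuts it proportionally.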

As a base model, Qwen2-72B provides a pre-trained foundation for post-training methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). After such post-training it is suited to applications that need broad multilingual understanding, complex code manipulation, or advanced mathematical reasoning.

About Qwen2

The Alibaba Qwen2 model family comprises large language models built on the Transformer architecture. It includes both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped-Query Attention, which reduces the memory footprint at inference, and support for extended context lengths of up to 131,072 tokens.


Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#30

Benchmark | Score | Rank
- | 0.56 | 9
- | 0.56 | 12
Professional Knowledge (MMLU Pro) | 0.64 | 17
Graduate-Level QA (GPQA) | 0.42 | 21
General Knowledge (MMLU) | 0.42 | 28

Rankings

Overall Rank

#30

Coding Rank

#21

Qwen2-72B: Specifications and GPU VRAM Requirements