Qwen2.5-72B: Specifications and GPU VRAM Requirements

Qwen2.5-72B

Closed Source

Open Weights

Parameters

72B

Context Length

131,072 tokens

Modality

Text

Architecture

Dense

License

Qwen License

Release Date

19 Sept 2024

Knowledge Cutoff

Jan 2025

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

8192

Number of Layers

Attention Heads

64

Key-Value Heads

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE
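Grouped-Query Attention matters mostly because it shrinks the key-value cache: several query heads share one key-value head, so fewer keys and values are stored per token. The following is a minimal sketch of that arithmetic, not an official calculator. The layer count, head counts, head dimension, and 16-bit cache precision are assumptions chosen for illustration; substitute the values from the model's configuration file.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored for one token.

    The leading factor of 2 accounts for storing both a key and a value
    vector per KV head in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed values for illustration only (not taken from the listing above).
NUM_LAYERS = 80
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128

# With grouped-query attention, only the KV heads are cached.
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)

# Hypothetical baseline: one KV head per query head (standard multi-head attention).
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)

print(f"GQA cache per token: {gqa_bytes / 1024:.0f} KiB")
print(f"MHA cache per token: {mha_bytes / 1024:.0f} KiB")
print(f"Reduction factor:    {mha_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key-value heads, which is what makes very long contexts practical on a fixed memory budget.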

System Requirements

VRAM requirements for different quantization methods and context sizes
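A rough VRAM figure can be derived from two terms: memory for the weights, which scales with bits per weight, and memory for the KV cache, which scales with context size. The sketch below shows that estimate under stated assumptions. The function names, the 10% overhead factor, and the per-token cache size passed in the example are hypothetical, and real quantization formats store extra scale data, so treat the results as a lower bound and not as the output of the site's calculator.

```python
GIB = 1024 ** 3


def weight_gib(num_params, bits_per_weight):
    """Memory for model weights in GiB at a given precision."""
    return num_params * bits_per_weight / 8 / GIB


def estimate_vram_gib(num_params, bits_per_weight, context_tokens,
                      kv_bytes_per_token, overhead_fraction=0.10):
    """Weights plus KV cache, padded by an assumed overhead fraction.

    overhead_fraction is a guess covering activations and runtime buffers.
    """
    weights = weight_gib(num_params, bits_per_weight)
    kv_cache = context_tokens * kv_bytes_per_token / GIB
    return (weights + kv_cache) * (1 + overhead_fraction)


NUM_PARAMS = 72e9              # "72B" as listed on this page
KV_BYTES_PER_TOKEN = 327_680   # assumed 16-bit cache size per token

for bits in (16, 8, 4):
    for context in (1_024, 65_536, 131_072):
        total = estimate_vram_gib(NUM_PARAMS, bits, context, KV_BYTES_PER_TOKEN)
        print(f"{bits:>2}-bit weights, {context:>7,} tokens: ~{total:.0f} GiB")
```

The loop makes the trade-off visible: quantization reduces the weight term, while the KV cache term depends only on context size and cache precision.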

Qwen2.5-72B

Qwen2.5-72B is the 72-billion-parameter model in the Qwen2.5 series of large language models developed by Alibaba. It is a causal language model built on the Transformer architecture. Its design uses Rotary Position Embeddings (RoPE), SwiGLU as the activation function, RMSNorm for normalization, and an attention mechanism with QKV bias. These choices make it a general-purpose base for language processing tasks.
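To make two of these components concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward block. It illustrates the general techniques only and is not the model's actual implementation: the helper names, the epsilon value, and the tiny weight matrices are assumptions, and real implementations operate on batched tensors.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: 2-dimensional input, 3-dimensional hidden layer (made-up weights).
x = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
w_gate = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
w_up = [[0.2, 0.1], [-0.3, 0.4], [0.6, -0.1]]
w_down = [[0.1, 0.2, 0.3], [-0.2, 0.1, 0.4]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, and SwiGLU uses a gated product of two projections where a plain feed-forward block uses a single activation.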

Compared with its predecessor, Qwen2, Qwen2.5-72B handles knowledge-intensive tasks better, particularly coding and mathematics. It also follows instructions more reliably, which makes it more adaptable to diverse prompts and conditional scenarios. The design targets practical applications that need accurate, well-formed output.

The model supports context lengths of up to 131,072 tokens and can generate up to 8,192 tokens of output. It handles long-form generation, understands structured data such as tables, and produces structured output such as JSON. It also supports more than 29 languages, which makes it suitable for content generation, coding assistance, and conversational applications such as chatbots and virtual assistants.
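The two limits interact when planning a request: a long prompt leaves less room for output. The sketch below computes a safe generation budget from the figures stated above. It assumes that prompt and generated tokens share one context window, and the function name is hypothetical and not part of any Qwen API.

```python
MAX_CONTEXT_TOKENS = 131_072    # context length stated above
MAX_GENERATION_TOKENS = 8_192   # output limit stated above


def max_new_tokens(prompt_tokens, requested=MAX_GENERATION_TOKENS):
    """Largest generation length that respects both limits.

    Assumes the prompt and the generated tokens occupy the same window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    if remaining <= 0:
        return 0  # the prompt already fills the window
    return min(requested, MAX_GENERATION_TOKENS, remaining)


print(max_new_tokens(1_000))     # short prompt: full output budget
print(max_new_tokens(130_000))   # long prompt: budget limited by the window
```

A caller would count prompt tokens with the model's tokenizer and pass the result here before issuing the request.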

About Qwen2.5

Qwen2.5 by Alibaba is a family of decoder-only language models available in various sizes. Most are dense, and some variants use Mixture-of-Experts. The models are pretrained on large-scale datasets and support extended context lengths and multilingual communication. The family also includes specialized models for coding, mathematics, and multimodal tasks such as vision and audio processing.

Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#22

Category	Benchmark	Score	Rank
Refactoring	Aider Refactoring	0.65	4
Coding	Aider Coding	0.65	7
StackEval	ProLLM Stack Eval	0.89	7
QA Assistant	ProLLM QA Assistant	0.94	7
Summarization	ProLLM Summarization	0.74	7
Professional Knowledge	MMLU Pro	0.71	9
Coding	LiveBench Coding	0.57	14
Graduate-Level QA	GPQA	0.49	16
Agentic Coding	LiveBench Agentic	0.03	17
Mathematics	LiveBench Mathematics	0.52	18
Data Analysis	LiveBench Data Analysis	0.52	19
Reasoning	LiveBench Reasoning	0.34	21
General Knowledge	MMLU	0.49	24

