Qwen2.5-72B: Specifications and GPU VRAM Requirements

Qwen2.5-72B

Open weights

Parameters

72B

Context length

131,072 tokens

Modality

Text

Architecture

Dense

License

Qwen License

Release date

19 Sept 2024

Training data cutoff

Jan 2025

Technical Specifications

Attention structure

Grouped-Query Attention

Hidden size

8192

Layers

80

Attention heads

64

Key-value heads

8

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

RoPE
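
As a rough illustration of what Grouped-Query Attention means for memory, the sketch below maps each query head to the key/value head it shares and computes the KV-cache size per token. The layer count, head counts, and head dimension used here are assumptions taken from the published Qwen2.5-72B configuration, and the 2-byte cache precision is an assumed FP16/BF16 setting; adjust them if your checkpoint or runtime differs.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head that a query head shares under GQA."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    if not 0 <= query_head < num_query_heads:
        raise ValueError("query_head out of range")
    group_size = num_query_heads // num_kv_heads  # query heads per KV head
    return query_head // group_size


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache stored for one token across all layers."""
    # The factor 2 covers one key vector and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed values (not confirmed by this page): 80 layers, 64 query heads,
# 8 key/value heads, head dimension 128.
NUM_LAYERS = 80
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128

if __name__ == "__main__":
    gqa = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
    mha = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)
    print(f"GQA cache per token: {gqa} bytes")
    print(f"Same model with one KV head per query head: {mha} bytes")
```

Under these assumptions, sharing each key/value head among eight query heads makes the cache eight times smaller than it would be with one KV head per query head.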

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen2.5-72B

Qwen2.5-72B is the 72-billion-parameter model in the Qwen2.5 series of large language models developed by Alibaba. It is a causal language model built on the Transformer architecture. Its design uses Rotary Position Embeddings (RoPE), SwiGLU as the activation function, RMSNorm for normalization, and an attention mechanism with QKV bias. These architectural choices underpin its use for general-purpose language processing tasks.
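
To make two of the components named above concrete, here is a minimal pure-Python sketch of RMSNorm and the SwiGLU gating step. It is illustrative only and is not the model's actual implementation: the function names and the `eps` default are assumptions, and `swiglu` takes the gate and up vectors as if the linear projections had already been applied.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Element-wise SwiGLU gating: silu(gate) * up."""
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    hidden = [3.0, 4.0]
    print(rms_norm(hidden, weight=[1.0, 1.0]))
    print(swiglu(gate=[0.0, 1.0], up=[5.0, 2.0]))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it needs only one pass over the squared values.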

Compared with its predecessor, Qwen2, Qwen2.5-72B handles complex knowledge better and is stronger at coding and mathematics. It also follows instructions more reliably, which makes it more adaptable to diverse prompts and conditional scenarios. Its design targets practical applications that require high-fidelity output.

The model supports context lengths of up to 131,072 tokens and can generate up to 8,192 tokens of output. It is proficient at generating long-form content, understanding structured data such as tables, and producing structured outputs such as JSON. Qwen2.5-72B also supports more than 29 languages, making it suitable for content generation, coding assistance, and applications such as chatbots and virtual assistants.
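
The two limits above can be turned into a simple prompt budget. The sketch below assumes that the prompt and the generated tokens share the same 131,072-token window, which is how causal language models are typically served; the helper name `max_prompt_tokens` is hypothetical.

```python
CONTEXT_LENGTH = 131_072    # maximum context length stated above
MAX_OUTPUT_TOKENS = 8_192   # maximum generation length stated above


def max_prompt_tokens(max_new_tokens: int = MAX_OUTPUT_TOKENS,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt (in tokens) that still leaves room for max_new_tokens of output."""
    if not 0 < max_new_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("max_new_tokens must be between 1 and MAX_OUTPUT_TOKENS")
    if context_length < max_new_tokens:
        raise ValueError("context_length is smaller than max_new_tokens")
    # Assumption: prompt tokens and generated tokens occupy one shared window.
    return context_length - max_new_tokens


if __name__ == "__main__":
    print(max_prompt_tokens())       # full 8,192-token output reserved
    print(max_prompt_tokens(1_024))  # shorter output leaves more room for the prompt
```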

About Qwen2.5

Qwen2.5 by Alibaba is a family of decoder-only language models available in various sizes. Most are dense, and some variants use Mixture-of-Experts. The models are pretrained on large-scale datasets and support extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks such as vision and audio processing.

Evaluation Benchmarks

Rankings apply to local LLMs.

Rank

#29

Category	Benchmark	Score	Rank
Refactoring	Aider Refactoring	0.65	4
StackEval	ProLLM Stack Eval	0.89	7
Coding	Aider Coding	0.65	8
QA Assistant	ProLLM QA Assistant	0.94	8
Summarization	ProLLM Summarization	0.74	9
Coding	LiveBench Coding	0.57	15
Graduate-Level QA	GPQA	0.49	15
Agentic Coding	LiveBench Agentic	0.03	16
Mathematics	LiveBench Mathematics	0.52	20
Data Analysis	LiveBench Data Analysis	0.52	21
Professional Knowledge	MMLU Pro	0.71	22
Reasoning	LiveBench Reasoning	0.34	23
General Knowledge	MMLU	0.49	26

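GPU Requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size, which ranges from 1,024 tokens up to 128k.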
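
A back-of-the-envelope estimate of required VRAM can be sketched as weight memory plus KV-cache memory. This is only an approximation under stated assumptions: it ignores activations, framework overhead, and quantization metadata, and it assumes 80 layers, 8 key/value heads, a head dimension of 128, and a 16-bit KV cache. Those values come from the published Qwen2.5-72B configuration, not from this page, and the function names are hypothetical.

```python
GIB = 2 ** 30


def weights_gib(params_billion: float, weight_bits: float) -> float:
    """Memory for the model weights at a given bit width, in GiB."""
    return params_billion * 1e9 * weight_bits / 8 / GIB


def kv_cache_gib(context_tokens: int, num_layers: int = 80, num_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Memory for the KV cache of one sequence, in GiB (defaults are assumptions)."""
    # Factor 2: one key vector and one value vector per KV head, per layer, per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / GIB


def estimate_vram_gib(params_billion: float, weight_bits: float, context_tokens: int) -> float:
    """Weights plus KV cache only; real usage will be higher."""
    return weights_gib(params_billion, weight_bits) + kv_cache_gib(context_tokens)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        for ctx in (1_024, 65_536, 131_072):
            total = estimate_vram_gib(72, bits, ctx)
            print(f"{bits:>2}-bit weights, {ctx:>7} tokens: ~{total:.1f} GiB")
```

With these assumed values the KV cache alone reaches 40 GiB at 131,072 tokens, so long contexts can matter as much as the choice of weight quantization.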

Resources

Official documentation, release notes, paper, model weights, source code
