
Qwen2.5-32B

Parameters: 32B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 19 Sept 2024
Knowledge Cutoff: Mar 2024

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 8192
Number of Layers: 60
Attention Heads: 96
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen2.5-32B

The Qwen2.5-32B model is a significant component of the Qwen2.5 series of large language models, developed by the Qwen team at Alibaba Cloud. This iteration builds upon its predecessors by offering enhanced capabilities for a broad spectrum of natural language processing tasks. Its design prioritizes robust instruction following, effective long-text generation, and sophisticated comprehension and production of structured data, including JSON formats. The model also demonstrates improved stability when confronted with diverse system prompts, which is advantageous for developing conversational agents and setting specific dialogue conditions. Furthermore, it provides comprehensive multilingual support across more than 29 languages, expanding its applicability in global contexts.
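To make the structured-output and system-prompt behavior concrete, the following is a minimal sketch of how such a request is typically framed. It assumes the instruction-tuned variant Qwen/Qwen2.5-32B-Instruct and the Hugging Face transformers chat-template API; the prompt content is purely illustrative.

```python
# Minimal sketch: framing a system prompt that pins the model to a role and
# a JSON-only output format. Assumes "Qwen/Qwen2.5-32B-Instruct" and the
# Hugging Face `transformers` library; the prompt text is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [
    {"role": "system", "content": "You are a data extraction assistant. Reply with JSON only."},
    {"role": "user", "content": "Extract the product name and price from: 'The Widget Pro costs $49.'"},
]

# Render the conversation with the model's chat template; the resulting
# string is what gets tokenized and passed to the model for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```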

Architecturally, Qwen2.5-32B is a dense, decoder-only transformer model. It integrates several advanced components to optimize performance and efficiency. These include Rotary Position Embeddings (RoPE) for effective positional encoding, SwiGLU as the activation function for enhanced non-linearity, and RMSNorm for stable training and improved convergence. To optimize inference speed and Key-Value cache utilization, the model employs Grouped Query Attention (GQA). The underlying training regimen involved a massive dataset, expanded to approximately 18 trillion tokens, which contributed to its enriched knowledge base, particularly in domains such as coding, mathematics, and various languages.
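These architectural fields can be read directly from the published model configuration. The sketch below assumes the Hugging Face transformers library and the hub id Qwen/Qwen2.5-32B; the attribute names follow the Qwen2 configuration schema.

```python
# Inspecting the architectural settings described above (GQA head counts,
# hidden size, activation, RoPE context window) from the model config.
# Assumes the Hugging Face `transformers` library and the "Qwen/Qwen2.5-32B" hub id.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B")

print("hidden size:      ", cfg.hidden_size)
print("layers:           ", cfg.num_hidden_layers)
print("attention heads:  ", cfg.num_attention_heads)      # query heads
print("key-value heads:  ", cfg.num_key_value_heads)      # fewer than query heads => GQA
print("activation:       ", cfg.hidden_act)               # silu, used inside the SwiGLU MLP
print("max positions:    ", cfg.max_position_embeddings)  # RoPE context window
print("rope theta:       ", cfg.rope_theta)
```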

The operational characteristics of Qwen2.5-32B demonstrate notable performance across various complex tasks. This model variant is adept at handling extended contexts, supporting sequences up to 131,072 tokens. Its ability to generate long texts, with outputs extending up to 8,192 tokens, makes it suitable for applications requiring detailed responses or extensive content creation. While the base model is general-purpose, the architectural foundations of Qwen2.5 have also been utilized in specialized variants, such as those optimized for coding or multimodal vision-language tasks, underscoring the versatility of the Qwen2.5 framework.
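As a rough illustration of long-form generation within these limits, the sketch below assumes Qwen/Qwen2.5-32B-Instruct, the transformers library, and enough GPU memory for bfloat16 weights; it is an illustrative example rather than a tuned deployment recipe.

```python
# Hedged sketch of long-form generation within the documented limits
# (131,072-token context, up to 8,192 generated tokens).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-32B-Instruct"  # instruction-tuned variant, assumed here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a detailed design document for a URL shortener."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Cap generation at the model's documented 8,192-token output limit.
output = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```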

About Qwen2.5

Qwen2.5 by Alibaba is a family of dense, decoder-only language models available in various sizes, with some variants utilizing Mixture-of-Experts. These models are pretrained on large-scale datasets, supporting extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks, such as vision and audio processing.



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #18

Benchmark Scores and Rankings

Category                  Benchmark       Score    Rank
                                          0.73     1 🥇
                                          0.73     3 🥉
                                          0.95     4
                                          0.90     6
                                          0.74     7
Web Development           WebDev Arena    902.26   7
Professional Knowledge    MMLU Pro        0.69     11
Graduate-Level QA         GPQA            0.49     15
General Knowledge         MMLU            0.49     23

Rankings

Overall Rank: #18
Coding Rank: #11

GPU Requirements

The full calculator estimates the required VRAM and recommends GPUs based on the chosen quantization method for the model weights and the context size (from 1K up to 128K tokens).
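As a rough complement to the calculator, the sketch below estimates VRAM from first principles: weight memory scales with the quantization bit width, and the grouped-query key-value cache grows linearly with context length. The layer and key-value-head counts come from the specification above, while the head dimension and cache precision are assumed values for illustration only; treat all outputs as approximations.

```python
# Back-of-the-envelope VRAM estimate: quantized weights plus KV cache.
# Layer and KV-head counts follow the spec sheet above; head_dim and the
# KV cache byte width are assumptions for illustration, not official figures.
def estimate_vram_gb(params_b=32, weight_bits=16, context=131_072,
                     layers=60, kv_heads=8, head_dim=128, kv_bytes=2):
    weights_gb = params_b * 1e9 * weight_bits / 8 / 1e9
    # KV cache: keys + values, per layer, per KV head, per token.
    kv_cache_gb = 2 * layers * kv_heads * head_dim * context * kv_bytes / 1e9
    return weights_gb + kv_cache_gb

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, 128K context: ~{estimate_vram_gb(weight_bits=bits):.0f} GB")
```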