
Qwen2.5-7B

Parameters

7B

Context Length

131,072 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

19 Sept 2024

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

3584

Number of Layers

28

Attention Heads

28

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE
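
For reference, the specifications above map onto the fields of a Hugging Face-style model configuration roughly as sketched below. The values mirror the table and the architecture description on this page; the entries marked as assumed (for example rope_theta and rms_norm_eps) should be checked against the published config.json rather than taken as authoritative.

```python
# Approximate Qwen2.5-7B configuration expressed as a Python dict.
# This is a sketch based on the specification table above, not the
# authoritative config.json shipped with the model.
qwen25_7b_config = {
    "architectures": ["Qwen2ForCausalLM"],
    "hidden_size": 3584,                 # hidden dimension size
    "num_hidden_layers": 28,             # transformer layers
    "num_attention_heads": 28,           # query heads
    "num_key_value_heads": 4,            # shared key/value heads (GQA)
    "hidden_act": "silu",                # gating activation used by SwiGLU
    "max_position_embeddings": 131072,   # 128K context window
    "rope_theta": 1000000.0,             # RoPE base frequency (assumed)
    "rms_norm_eps": 1e-6,                # RMSNorm epsilon (assumed)
}
```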

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen2.5-7B

Qwen2.5-7B is a foundational large language model developed by Alibaba Cloud, forming a part of the Qwen2.5 series. This model is a causal language model engineered for general-purpose applications, serving as a robust base for subsequent fine-tuning and specialized tasks. It is designed to extend the linguistic capabilities of its predecessors by incorporating an expanded knowledge base and enhancing performance in core language understanding and generation tasks. The model provides multilingual support, enabling processing across more than 29 languages. This versatility positions Qwen2.5-7B as a foundational component for diverse natural language processing systems.
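
Because this is the base (non-instruct) checkpoint, the natural way to exercise it is plain text continuation rather than chat templating. The sketch below assumes the Hugging Face transformers library and the Qwen/Qwen2.5-7B checkpoint; adjust dtype and device placement to your hardware.

```python
# Minimal text-continuation sketch for the base model (not the Instruct variant).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically when supported
    device_map="auto",    # place weights on the available GPU(s)
)

prompt = "Rotary position embeddings work by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```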

Architecturally, Qwen2.5-7B employs a decoder-only transformer framework. Key architectural components include Rotary Position Embeddings (RoPE) for effective handling of sequence length and position, SwiGLU as its activation function for non-linearity, and RMSNorm for stable normalization across layers. The attention mechanism features Grouped Query Attention (GQA), which improves computational efficiency by sharing key and value projections across multiple query heads. Specifically, the 7B variant uses 28 attention heads for queries and 4 for key/value pairs, distributed across 28 layers. This configuration facilitates efficient processing of long sequences.
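
To make the grouped-query arrangement concrete, the toy sketch below shows how 28 query heads can share 4 key/value heads by repeating each key/value head across a group of 7 query heads. It omits RoPE, masking, and caching, and is an illustration of the idea rather than the model's actual implementation.

```python
import torch

# Toy grouped-query attention: 28 query heads share 4 key/value heads.
batch, seq, hidden = 1, 16, 3584
n_q_heads, n_kv_heads, head_dim = 28, 4, 128   # head_dim = hidden // n_q_heads
group = n_q_heads // n_kv_heads                # 7 query heads per K/V head

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each K/V head so it is reused by its group of query heads.
k = k.repeat_interleave(group, dim=1)          # -> (batch, 28, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1) @ v       # (batch, 28, seq, head_dim)
out = attn.transpose(1, 2).reshape(batch, seq, n_q_heads * head_dim)
print(out.shape)  # torch.Size([1, 16, 3584])
```

The practical benefit shows up in the KV cache: only 4 key/value heads per layer need to be stored per token instead of 28, roughly a 7x reduction in cache memory at long context lengths.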

Qwen2.5-7B is released as a pretrained base model, providing a foundation for developers to build upon through further training stages such as Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF). While it is a base model, the Qwen2.5 family exhibits enhanced capabilities in areas such as coding and mathematics, benefiting from its specialized expert models. It also demonstrates improved proficiency in instruction following, processing structured data, and generating extended text outputs, including formatted data such as JSON. The model's capacity to handle context lengths up to 131,072 tokens supports the processing of substantially long inputs.
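
For contexts that go beyond the model's native pretraining window, the Qwen2.5 documentation describes enabling YaRN-style RoPE scaling. The sketch below shows one way to do this when loading the model with transformers; the specific rope_scaling values (factor 4.0, original window of 32,768 tokens) are assumptions based on common Qwen2.5 guidance and should be verified against the official documentation.

```python
# Hedged sketch: enabling YaRN-style RoPE scaling for long-context use.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                               # 32K x 4 = 128K positions (assumed)
    "original_max_position_embeddings": 32768,   # pretraining window (assumed)
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", config=config, torch_dtype="auto", device_map="auto"
)
```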

About Qwen2.5

Qwen2.5 by Alibaba is a family of decoder-only language models available in various sizes; most are dense, with some variants utilizing Mixture-of-Experts. These models are pretrained on large-scale datasets, supporting extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks such as vision and audio processing.



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall rank: #41

Benchmark scores:

Professional Knowledge (MMLU Pro): 0.56 (rank 20)
Graduate-Level QA (GPQA): 0.36 (rank 25)
General Knowledge (MMLU): 0.36 (rank 31)

Rankings

Overall rank: #41

Coding rank: #22

GPU Requirements

A full calculator is available for estimating the required VRAM and recommended GPUs for a chosen weight quantization method and context size (from 1K up to 128K tokens; default 1,024 tokens).
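
As a rough guide to what such a calculator estimates, VRAM use can be approximated as weight storage (parameter count times bytes per parameter under the chosen quantization) plus the KV cache, which grows linearly with context length. The sketch below uses the architecture figures quoted above (28 layers, 4 key/value heads, head dimension 128) and an assumed parameter count of about 7.6B; it ignores activation buffers and framework overhead, so real usage will be somewhat higher.

```python
# Back-of-the-envelope VRAM estimate for Qwen2.5-7B (illustrative only).
def estimate_vram_gb(context_len, bits_per_weight, params_b=7.6,
                     n_layers=28, n_kv_heads=4, head_dim=128, kv_bytes=2):
    weights_gb = params_b * 1e9 * (bits_per_weight / 8) / 1024 ** 3
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token.
    kv_cache_gb = (2 * n_layers * n_kv_heads * head_dim
                   * context_len * kv_bytes) / 1024 ** 3
    return weights_gb + kv_cache_gb

for bits, label in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    for ctx in (1_024, 65_536, 131_072):
        gb = estimate_vram_gb(ctx, bits)
        print(f"{label}, {ctx:>7} tokens: ~{gb:.1f} GB")
```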