
Qwen2.5-14B

Parameters: 14B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 19 Sept 2024
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 5120
Number of Layers: 48
Attention Heads: 40
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE

System Requirements

VRAM requirements vary with the quantization method and the context size.
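As a rough illustration of how these requirements scale, the sketch below estimates VRAM as model weights plus KV cache, using the shape parameters listed above. The formula and byte counts are simplifying assumptions (no activation memory or framework overhead), not the calculator originally embedded on this page; the ~14.7B parameter count is an assumption based on the 14B label.

```python
def estimate_vram_gib(params_b, layers, kv_heads, head_dim, context_len,
                      weight_bytes=2.0, kv_bytes=2.0):
    """Rough VRAM estimate: weights + KV cache, ignoring all overhead."""
    weights = params_b * 1e9 * weight_bytes
    # GQA KV cache: 2 tensors (K and V) per layer, kv_heads per tensor.
    kv_cache = 2 * layers * kv_heads * head_dim * context_len * kv_bytes
    return (weights + kv_cache) / 1024**3

# Qwen2.5-14B: 48 layers, 8 KV heads, head_dim = 5120 / 40 = 128.
print(estimate_vram_gib(14.7, 48, 8, 128, 1024))    # FP16, 1k context   (~27.6 GiB)
print(estimate_vram_gib(14.7, 48, 8, 128, 131072))  # FP16, full context (~51.4 GiB)
print(estimate_vram_gib(14.7, 48, 8, 128, 131072,
                        weight_bytes=0.5))          # ~4-bit weights     (~30.9 GiB)
```

Note how the GQA design keeps the full-context KV cache to roughly 24 GiB; with 40 KV heads instead of 8, it would be five times larger.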

Qwen2.5-14B

Qwen2.5-14B is a large language model developed by the Qwen Team at Alibaba Cloud as part of the Qwen2.5 model series. It is a dense, decoder-only transformer designed for a broad range of natural language processing tasks. The model serves as a foundation for developers and researchers, providing a base that can be fine-tuned for specific applications. Qwen2.5-14B is multilingual, able to understand and generate text in over 29 languages.

The Qwen2.5-14B architecture is built upon a transformer backbone, incorporating several advanced components to enhance its capabilities. It utilizes Rotary Position Embeddings (RoPE) for effective handling of sequence length, the SwiGLU activation function for improved non-linearity, and RMSNorm for efficient layer normalization. The model employs Grouped Query Attention (GQA) with a configuration of 40 query heads and 8 key/value heads, optimizing attention mechanisms for reduced memory bandwidth during inference. Comprising 48 layers, the model is architecturally designed for computational efficiency and performance across diverse tasks.
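To make the attention configuration concrete, here is a minimal grouped-query attention sketch in PyTorch using the published shapes (hidden size 5120, 40 query heads, 8 key/value heads, head dimension 128). It illustrates the mechanism only and is not the actual Qwen implementation; RoPE and the causal mask are omitted for brevity.

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_q_heads=40, n_kv_heads=8):
    """Illustrative GQA forward pass (no RoPE, no causal mask)."""
    b, t, d = x.shape
    head_dim = d // n_q_heads                      # 5120 / 40 = 128
    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Each of the 8 KV heads is shared by 40 / 8 = 5 query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    att = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    return (att @ v).transpose(1, 2).reshape(b, t, d)

d, kv_dim = 5120, 8 * 128
x = torch.randn(1, 4, d)                           # batch 1, 4 tokens
wq, wk, wv = (torch.randn(d, n) * 0.02 for n in (d, kv_dim, kv_dim))
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 4, 5120])
```

Only the 8 key/value heads need to be projected and cached during generation, which is where GQA's memory-bandwidth savings come from.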

Qwen2.5-14B is pretrained on an extensive dataset of up to 18 trillion tokens, enabling it to demonstrate proficiency in areas such as logical reasoning, coding, and mathematical tasks. The model supports an extended context window of up to 131,072 tokens, facilitating the processing of long documents and complex inputs. While the base Qwen2.5-14B model is intended for pre-training and subsequent fine-tuning, its instruction-tuned variants are optimized for direct application in conversational AI, instruction following, and generating structured outputs like JSON. Its design accommodates applications requiring significant context and precise text generation.
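For reference, a minimal sketch of loading the base model with the Hugging Face transformers library, assuming the repository id Qwen/Qwen2.5-14B and an installed accelerate package for device_map="auto". As a base model it continues text rather than following chat-style instructions; use an instruction-tuned variant for conversational workloads.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B",
    torch_dtype="auto",    # use the checkpoint's native precision
    device_map="auto",     # spread layers across available devices
)

inputs = tokenizer("Rotary position embeddings work by",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```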

About Qwen2.5

Qwen2.5 by Alibaba is a family of decoder-only language models available in various sizes, mostly dense, with some variants using Mixture-of-Experts. These models are pretrained on large-scale datasets and support extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks such as vision and audio processing.



Evaluation Benchmarks

Rankings apply to local LLMs.

Overall Rank: #20
Coding Rank: #5

Benchmark Score Rankings:

Category                 Benchmark   Score   Rank
-                        -           0.69    🥈 2
-                        -           0.69    5
Professional Knowledge   MMLU Pro    0.64    18
Graduate-Level QA        GPQA        0.46    19
General Knowledge        MMLU        0.46    27
