| Attribute | Value |
|---|---|
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 5120 |
| Number of Layers | 48 |
| Attention Heads (Query) | 40 |
| Key-Value Heads | 8 |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | RoPE |
VRAM Requirements for Different Quantization Methods and Context Sizes
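The original page presents these requirements interactively; as a rough back-of-envelope sketch (not data from this page), VRAM usage can be estimated from the parameter count and quantization bit width plus the KV cache, which grows with context length. The snippet below uses the architecture values from the table above (48 layers, 8 key-value heads, head dimension 128) and an approximate 14.7B parameter count; all figures are assumptions and real usage depends on the runtime and its overheads.

```python
# Back-of-envelope VRAM estimate for Qwen2.5-14B; all numbers are assumptions,
# not measurements, and ignore activation buffers and runtime overhead.

NUM_PARAMS = 14.7e9      # approximate total parameter count
NUM_LAYERS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128           # hidden size 5120 / 40 query heads

def weight_gib(bits_per_weight: float) -> float:
    """Memory for the weights alone at a given quantization bit width."""
    return NUM_PARAMS * bits_per_weight / 8 / 2**30

def kv_cache_gib(context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_len * bytes_per_elem / 2**30

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    for ctx in (8192, 32768, 131072):
        total = weight_gib(bits) + kv_cache_gib(ctx)
        print(f"{name:>4} weights + {ctx:>6}-token context ≈ {total:5.1f} GiB")
```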
Qwen2.5-14B is a large language model developed by the Qwen Team at Alibaba Cloud as part of the Qwen2.5 model series. It is a dense, decoder-only transformer designed for a broad range of natural language processing tasks. The model serves as a foundation for developers and researchers, providing a scalable base that can be fine-tuned for specific applications. Qwen2.5-14B offers multilingual support, understanding and generating text in over 29 languages.
The Qwen2.5-14B architecture is built upon a transformer backbone, incorporating several advanced components to enhance its capabilities. It utilizes Rotary Position Embeddings (RoPE) for effective handling of sequence length, the SwiGLU activation function for improved non-linearity, and RMSNorm for efficient layer normalization. The model employs Grouped Query Attention (GQA) with a configuration of 40 query heads and 8 key/value heads, optimizing attention mechanisms for reduced memory bandwidth during inference. Comprising 48 layers, the model is architecturally designed for computational efficiency and performance across diverse tasks.
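As an illustration (not the model's actual implementation), the grouped-query attention layout described above can be sketched as follows: the 40 query heads are split into 8 groups, and each group of 5 query heads shares a single key/value head, which is what shrinks the KV cache and reduces memory bandwidth. Shapes follow the table above (hidden size 5120, head dimension 128).

```python
import torch

# Minimal grouped-query attention sketch with Qwen2.5-14B-like shapes.
# Illustrative only; not the model's actual code.
hidden_size, n_q_heads, n_kv_heads = 5120, 40, 8
head_dim = hidden_size // n_q_heads          # 128
group_size = n_q_heads // n_kv_heads         # 5 query heads per KV head

batch, seq = 1, 16
q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head so that 5 consecutive query heads attend to it.
k = k.repeat_interleave(group_size, dim=1)   # (1, 40, seq, 128)
v = v.repeat_interleave(group_size, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = attn @ v                               # (1, 40, seq, 128)
print(out.shape)
```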
Qwen2.5-14B is pretrained on an extensive dataset of up to 18 trillion tokens, enabling it to demonstrate proficiency in areas such as logical reasoning, coding, and mathematical tasks. The model supports an extended context window of up to 131,072 tokens, facilitating the processing of long documents and complex inputs. While the base Qwen2.5-14B model is intended for pre-training and subsequent fine-tuning, its instruction-tuned variants are optimized for direct application in conversational AI, instruction following, and generating structured outputs like JSON. Its design accommodates applications requiring significant context and precise text generation.
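For instance, the instruction-tuned variant can be run with the Hugging Face transformers library. The snippet below is a minimal sketch that assumes the Qwen/Qwen2.5-14B-Instruct checkpoint and a GPU with enough memory; it follows the standard transformers chat-template workflow rather than anything specific to this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal generation sketch; model id and device settings are assumptions.
model_id = "Qwen/Qwen2.5-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Return a JSON object with keys 'city' and 'country' for Paris."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```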
Qwen2.5 by Alibaba is a family of dense, decoder-only language models available in various sizes, with some variants utilizing Mixture-of-Experts. These models are pretrained on large-scale datasets, supporting extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks, such as vision and audio processing.
Rankings apply to local LLMs.
Ranking: #20
| Benchmark | Score | Rank |
|---|---|---|
| Refactoring (Aider Refactoring) | 0.69 | 🥈 2 |
| Coding (Aider Coding) | 0.69 | 5 |
| Professional Knowledge (MMLU Pro) | 0.64 | 18 |
| Graduate-Level QA (GPQA) | 0.46 | 19 |
| General Knowledge (MMLU) | 0.46 | 27 |