Parameters
14B
Context Length
131,072 tokens
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
19 Sept 2024
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
5120
Number of Layers
48
Attention Heads
40
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
Qwen2.5-14B is a large language model developed by the Qwen Team at Alibaba Cloud, part of the Qwen2.5 model series. It is a dense, decoder-only transformer designed for a broad range of natural language processing tasks. The model serves as a foundation for developers and researchers, providing a scalable base that can be fine-tuned for specific applications. Qwen2.5-14B supports multilingual use, understanding and generating text in over 29 languages.
The Qwen2.5-14B architecture is built upon a transformer backbone, incorporating several advanced components to enhance its capabilities. It utilizes Rotary Position Embeddings (RoPE) for effective handling of sequence length, the SwiGLU activation function for improved non-linearity, and RMSNorm for efficient layer normalization. The model employs Grouped Query Attention (GQA) with a configuration of 40 query heads and 8 key/value heads, optimizing attention mechanisms for reduced memory bandwidth during inference. Comprising 48 layers, the model is architecturally designed for computational efficiency and performance across diverse tasks.
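The memory saving from GQA can be illustrated with a quick back-of-the-envelope calculation using the published dimensions (a sketch; it assumes an FP16 KV cache and head_dim = hidden_size / query_heads):

```python
# Rough KV-cache size per token for Qwen2.5-14B's GQA layout (FP16 assumed).
hidden_size = 5120
num_layers = 48
num_query_heads = 40
num_kv_heads = 8
head_dim = hidden_size // num_query_heads  # 128
bytes_per_value = 2  # FP16

# K and V are each cached per layer for every KV head.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
mha_bytes_per_token = 2 * num_layers * num_query_heads * head_dim * bytes_per_value

print(kv_bytes_per_token)                         # 196608 bytes (~192 KiB) per token
print(mha_bytes_per_token // kv_bytes_per_token)  # 5x smaller than full multi-head attention
```

At the full 131,072-token context this is the difference between roughly 24 GiB and 120 GiB of KV cache, which is why GQA matters for long-context inference.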
Qwen2.5-14B is pretrained on an extensive dataset of up to 18 trillion tokens, enabling it to demonstrate proficiency in areas such as logical reasoning, coding, and mathematical tasks. The model supports an extended context window of up to 131,072 tokens, facilitating the processing of long documents and complex inputs. While the base Qwen2.5-14B model is intended for pre-training and subsequent fine-tuning, its instruction-tuned variants are optimized for direct application in conversational AI, instruction following, and generating structured outputs like JSON. Its design accommodates applications requiring significant context and precise text generation.
Qwen2.5 by Alibaba is a family of dense, decoder-only language models available in various sizes, with some variants utilizing Mixture-of-Experts. These models are pretrained on large-scale datasets, supporting extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks, such as vision and audio processing.
No evaluation benchmarks for Qwen2.5-14B available.
Overall Rank
-
Coding Rank
-
Total Score
68
/ 100
Qwen2.5-14B demonstrates high transparency in its architectural specifications and licensing, utilizing a standard Apache 2.0 license and providing clear structural details. However, it remains opaque regarding its training data composition and the specific compute resources utilized for its 18-trillion-token pretraining. While benchmark performance is heavily marketed, the lack of reproducible evaluation code and detailed data provenance limits its overall transparency profile.
Architectural Provenance
The Qwen2.5-14B model is explicitly documented as a dense, decoder-only transformer. Detailed architectural specifications are provided in the official technical report and Hugging Face model cards, including the use of Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Grouped Query Attention (GQA) with 40 query heads and 8 KV heads. The model consists of 48 layers with a hidden size of 5120. While the pretraining methodology is described as a multi-stage process involving context length scaling (from 4k to 32k/128k), the specific hyperparameters for every training stage are not fully disclosed in a single reproducible document.
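The two core components named above, RMSNorm and the SwiGLU feed-forward block, can be sketched in a few lines of NumPy (shapes and the epsilon value here are illustrative, not the exact training configuration):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the activations (no mean-centering).
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU-gated linear unit followed by a down-projection.
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU ("swish") activation
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5120))
y = rms_norm(x, np.ones(5120))
print(y.shape)  # (4, 5120); each row now has unit root-mean-square
```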
Dataset Composition
Alibaba discloses that the model was trained on 18 trillion tokens, a significant increase from previous versions. However, the exact composition breakdown (e.g., specific percentages of web, code, and academic data) is only provided in general terms. While they mention sourcing from 'high-quality' web data, code (5.5T for specialized variants), and math data, they do not provide a public list of sources or a detailed data-cleaning pipeline for the general 14B variant. The use of synthetic data is acknowledged but not quantified for the base model.
Tokenizer Integrity
The tokenizer is publicly available via the Hugging Face repository and is well-documented. It uses Byte-level Byte Pair Encoding (BBPE) with a large vocabulary size of 151,646 tokens, which is optimized for multilingual support across 29+ languages. The technical report provides compression rate comparisons against other tokenizers (like Llama), and the vocabulary files are directly inspectable in the 'tokenizer.json' and 'vocab.json' files on the official repo.
Parameter Density
The model's parameter counts are clearly stated: 14.7 billion total parameters and 13.1 billion non-embedding parameters. As a dense model, all parameters are active during inference, which is explicitly confirmed in the documentation. The architectural breakdown (layers, heads, hidden dimensions) is fully transparent in the config.json file.
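The ~1.6B gap between total and non-embedding parameters is consistent with untied input and output embedding matrices over the 151,646-token vocabulary (a rough sanity check, not an official breakdown):

```python
# Hypothetical decomposition: input embedding plus an untied output projection,
# each of shape (vocab_size, hidden_size).
vocab_size = 151_646
hidden_size = 5120

embedding_params = 2 * vocab_size * hidden_size
print(embedding_params / 1e9)  # ~1.55B, close to the stated 14.7B - 13.1B = 1.6B gap
```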
Training Compute
Information regarding the specific training compute is extremely limited. While the scale of the data (18T tokens) implies massive compute requirements, Alibaba has not publicly disclosed the total GPU hours, the specific hardware clusters used (e.g., number of H100s), the training duration, or the estimated carbon footprint. This lack of environmental and resource transparency is a significant gap.
Benchmark Reproducibility
While Alibaba provides extensive benchmark results across MMLU, MATH, and HumanEval in their technical reports and blog posts, they do not provide a unified evaluation repository with the exact prompts and few-shot examples used for every score. Third-party researchers have raised concerns about the reliability of some scores due to potential data overlap with common benchmarks, and the lack of a public 'eval' suite makes independent verification difficult.
Identity Consistency
The model consistently identifies itself as part of the Qwen series in its system prompts and documentation. It maintains clear versioning (2.5) and distinguishes between its base and instruction-tuned variants. There are no widespread reports of the model claiming to be a competitor's product (e.g., GPT-4) in its default configuration.
License Clarity
The Qwen2.5-14B model is released under the Apache 2.0 license, which is a highly permissive, standard open-source license. The terms are clearly stated in the repository, allowing for commercial use, modification, and distribution without the restrictive revenue-based clauses found in some other 'open' models (like the 3B/72B variants of the same family).
Hardware Footprint
VRAM requirements for various precisions (FP16, INT8, INT4) are well-documented by both the official team and the community. The model card provides guidance on context length scaling and the memory impact of the 128K window. Quantization tradeoffs are discussed in the context of GGUF and AWQ versions available on Hugging Face, though official 'accuracy vs. bit-rate' curves are not provided by the primary developer.
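A minimal weight-memory estimate across the precisions mentioned above (a sketch only; real usage adds KV cache, activations, and runtime overhead on top of the weights):

```python
# Approximate weight memory for Qwen2.5-14B at different quantization levels.
total_params = 14.7e9

bytes_per_param = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gib = total_params * nbytes / 2**30
    print(f"{precision}: ~{gib:.1f} GiB for weights alone")
# FP16: ~27.4 GiB, INT8: ~13.7 GiB, INT4: ~6.8 GiB
```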
Versioning Drift
The model uses a clear versioning scheme (Qwen2 to Qwen2.5). However, there is no public changelog for minor weight updates, nor a transparent policy for documenting silent alignment updates. While previous versions (Qwen1.5, Qwen2) remain accessible, there is no formalized public process for tracking behavioral drift over time.