Qwen2.5-1.5B

Open Source

Open Weights

Parameters

1.5B

Context Length

128K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

19 Sept 2024

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens: 4.76 GB VRAM

Consumer: 1x RTX 4090 (24GB VRAM)

Datacenter: 1x NVIDIA A100 (80GB VRAM)

Apple Silicon: 1x Apple M3 Max (128GB VRAM)

128,000 tokens: 17.86 GB VRAM

Consumer: 1x RTX 4090 (24GB VRAM)

Datacenter: 1x NVIDIA A100 (80GB VRAM)

Apple Silicon: 1x Apple M3 Max (128GB VRAM)

Evaluation Benchmarks

No evaluation benchmarks for Qwen2.5-1.5B available.

About Qwen2.5-1.5B

Qwen2.5-1.5B is a foundational large language model developed by Alibaba Cloud as part of the Qwen2.5 series. With 1.54 billion parameters, it is built for efficient processing and generation of text across a range of applications. It was pre-trained on a large-scale dataset of up to 18 trillion tokens, and the series also includes variants fine-tuned for instruction following, coding, and mathematical problem-solving. Its design emphasizes handling long contexts and generating coherent, accurate responses.

Qwen2.5-1.5B is a dense, decoder-only Transformer. It uses Rotary Position Embeddings (RoPE) to encode positional information, SwiGLU as the activation function, and RMSNorm for normalization, choices that contribute to stable training. Attention is Grouped Query Attention (GQA) with 12 query heads and 2 key-value heads, so each key-value head is shared by 6 query heads, which reduces the size of the key-value cache compared with standard multi-head attention. The model has 28 layers and a hidden dimension of 1536.
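The head layout described above can be sketched as simple shape arithmetic. This is a minimal illustration, not the model's actual implementation: the head dimension is assumed to be the hidden size divided by the number of query heads, and the contiguous grouping of query heads onto key-value heads is also an assumption. The name `kv_head_for` is hypothetical.

```python
# Shape arithmetic for grouped-query attention, using the figures quoted above.
HIDDEN_SIZE = 1536
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 2

# Assumption: head dimension = hidden size / number of query heads.
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS
# Number of query heads that share one key-value head.
GROUP_SIZE = NUM_QUERY_HEADS // NUM_KV_HEADS


def kv_head_for(query_head: int) -> int:
    """Return the key-value head a query head reads from.

    Assumption: query heads are grouped contiguously, so heads 0..5 share
    key-value head 0 and heads 6..11 share key-value head 1.
    """
    if not 0 <= query_head < NUM_QUERY_HEADS:
        raise ValueError("query head index out of range")
    return query_head // GROUP_SIZE


mapping = {q: kv_head_for(q) for q in range(NUM_QUERY_HEADS)}

if __name__ == "__main__":
    print(f"head_dim={HEAD_DIM}, queries per KV head={GROUP_SIZE}")
    print(mapping)
```

Because only 2 of the 12 heads carry their own keys and values, the cached key-value tensors are one sixth the size they would be if every query head had a dedicated pair.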

Qwen2.5-1.5B is listed here with a maximum context length of 128,000 tokens; common configurations support a full context of 32,768 tokens and generation of up to 8,192 tokens. It supports multilingual understanding and generation across more than 29 languages and can process structured data formats such as tables and JSON. Practical use cases include conversational agents, virtual assistants, automated code generation tools, mathematical problem-solving platforms, and content creation and summarization applications.
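To get a feel for what these context lengths cost in memory, the key-value cache can be estimated from the layer and head counts. The sketch below assumes a head dimension of 128 (1536 / 12), 16-bit cache values, and one key and one value vector per key-value head per layer. It covers the cache only, not weights, activations, or framework overhead, so it is not expected to reproduce the VRAM totals listed under System Requirements.

```python
# Rough key-value cache size for a single sequence.
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128        # assumption: hidden size 1536 / 12 query heads
BYTES_PER_VALUE = 2   # assumption: 16-bit (FP16/BF16) cache entries


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    # Factor 2 accounts for storing both a key and a value vector.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens


if __name__ == "__main__":
    for n in (1_024, 32_768, 128_000):
        gib = kv_cache_bytes(n) / 2**30
        print(f"{n:>7} tokens: {gib:.3f} GiB of KV cache")
```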

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

12

Key-Value Heads

2

Attention Head Dimension

128

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

1,536

Number of Layers

28

FFN Intermediate Size (Dense)

8,960

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

151,936
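As an illustration of how the RoPE theta listed above is used, the sketch below computes the per-pair rotation frequencies with the standard RoPE formula, inv_freq[i] = theta^(-2i/d). The head dimension d is assumed to be 1536 / 12 = 128, and the function names are hypothetical; this is not taken from the model's source code.

```python
import math

ROPE_THETA = 1_000_000.0
HEAD_DIM = 1536 // 12  # assumption: hidden size / number of query heads


def rope_inverse_frequencies(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """One rotation frequency per pair of dimensions: theta ** (-2i / d)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, inv_freq: list[float]) -> list[float]:
    """Rotation angle (radians) applied to each dimension pair at a position."""
    return [position * f for f in inv_freq]


inv_freq = rope_inverse_frequencies()
# The slowest-rotating pair needs this many positions for one full turn.
longest_wavelength = 2 * math.pi / inv_freq[-1]

if __name__ == "__main__":
    print(f"{len(inv_freq)} frequency pairs")
    print(f"fastest: {inv_freq[0]:.3g} rad/token, slowest: {inv_freq[-1]:.3g} rad/token")
    print(f"longest wavelength: {longest_wavelength:,.0f} positions")
```

A larger theta stretches the slowest wavelengths, which is one common way to keep positions distinguishable over long contexts.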

Model Integrity

Total Score

69 / 100

Upstream

21.5 / 30

Model

24.5 / 40

Downstream

22.5 / 30

Qwen2.5-1.5B Model Integrity Report

Total Score

69 / 100

Audit Note

Qwen2.5-1.5B exhibits strong transparency in its architectural specifications and licensing, utilizing a standard Apache 2.0 license and providing detailed structural data. However, it remains opaque regarding its specific training data sources and the total compute resources used for development. While benchmark performance is high, the lack of fully reproducible evaluation pipelines and emerging concerns over data overlap in specific domains suggest a need for more rigorous independent verification.

Upstream

21.5 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in the Qwen2.5 Technical Report and official Hugging Face model cards. It is a dense, decoder-only Transformer utilizing Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Grouped Query Attention (GQA). Specific configurations for the 1.5B variant are provided, including 28 layers, a hidden dimension of 1536, and a GQA setup with 12 query heads and 2 key-value heads. The transition from the Qwen2 base is clearly stated, though the exact pre-training recipe (e.g., specific learning rate schedules or optimizer hyperparameters for this specific variant) is less detailed than the general series documentation.

Dataset Composition

4.5 / 10

While Alibaba discloses that the model was trained on a massive 18 trillion token dataset (an increase from 7 trillion in Qwen2), the specific composition breakdown is vague. Documentation mentions general categories like 'web data', 'code', and 'mathematics' and notes the inclusion of 29+ languages. However, it lacks a precise percentage breakdown (e.g., web: X%, code: Y%) or a list of specific data sources. The use of synthetic data is acknowledged, particularly for the Coder and Math variants, but the exact ratio for the base 1.5B model remains undisclosed.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the 'qwen.tiktoken' file and Hugging Face's 'tokenization_qwen2.py'. It uses Byte Pair Encoding (BPE) with a large vocabulary of 151,646 tokens (the 151,936 listed under Technical Specifications is presumably the padded embedding-table size), which is well-documented for its efficiency across multiple languages (English, Chinese, etc.). The vocabulary size and special tokens (like <|endoftext|>) are clearly defined, and the alignment with the model's multilingual claims is verifiable through the public code and configuration files.

Model

24.5 / 40

Parameter Density

8.5 / 10

The parameter count is precisely stated as 1.54 billion total and 1.31 billion non-embedding parameters. As a dense model, all parameters are active during inference, avoiding the 'active vs total' ambiguity found in MoE models. The architectural breakdown (layers, heads, hidden dims) is fully transparent in the technical report and model config files, allowing for a clear understanding of parameter distribution.
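The stated totals can be sanity-checked from the architecture figures on this page (28 layers, hidden size 1536, 12 query heads, 2 key-value heads, FFN size 8,960, vocabulary size 151,936). The sketch below rests on several assumptions that this page does not state: biases on the query, key, and value projections only, a three-matrix SwiGLU feed-forward block, two RMSNorm weight vectors per layer plus a final one, and input/output embeddings that share one matrix.

```python
HIDDEN = 1536
LAYERS = 28
Q_HEADS = 12
KV_HEADS = 2
FFN = 8960
VOCAB = 151_936

HEAD_DIM = HIDDEN // Q_HEADS   # assumption: 128
KV_DIM = KV_HEADS * HEAD_DIM   # width of the key and value projections


def attention_params() -> int:
    q = HIDDEN * HIDDEN + HIDDEN   # assumption: bias on the query projection
    k = HIDDEN * KV_DIM + KV_DIM   # assumption: bias on the key projection
    v = HIDDEN * KV_DIM + KV_DIM   # assumption: bias on the value projection
    o = HIDDEN * HIDDEN            # assumption: no bias on the output projection
    return q + k + v + o


def mlp_params() -> int:
    # SwiGLU block: gate, up, and down projections, assumed bias-free.
    return 3 * HIDDEN * FFN


def layer_params() -> int:
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    return attention_params() + mlp_params() + 2 * HIDDEN


non_embedding = LAYERS * layer_params() + HIDDEN  # plus the final RMSNorm
embedding = VOCAB * HIDDEN                        # assumption: tied with the output head
total = non_embedding + embedding

if __name__ == "__main__":
    print(f"non-embedding: {non_embedding / 1e9:.2f}B")
    print(f"embedding:     {embedding / 1e9:.2f}B")
    print(f"total:         {total / 1e9:.2f}B")
```

Under these assumptions the counts round to the 1.31 billion non-embedding and 1.54 billion total parameters quoted above.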

Training Compute

2.5 / 10

Information regarding the training compute is extremely limited. While the technical report mentions the scale of the data (18T tokens), it does not disclose the specific hardware used (e.g., number of H100/A100 GPUs), the total GPU hours consumed, or the estimated carbon footprint. Some third-party research has attempted to estimate the energy footprint for inference, but official training compute metrics are conspicuously absent for 'competitive reasons'.

Benchmark Reproducibility

4.0 / 10

Alibaba provides comprehensive benchmark results across standard sets like MMLU, HumanEval, and MATH in their technical report. However, the score is moderated because the exact evaluation prompts, few-shot examples, and specific code used to generate these scores are not fully public in a single reproducible repository. While some evaluation scripts are available in the QwenLM GitHub, they do not cover the full breadth of the reported results, making exact third-party reproduction difficult.

Identity Consistency

9.5 / 10

The model demonstrates high identity consistency, correctly identifying itself as part of the Qwen series in most standard deployments. It maintains clear versioning (2.5) and distinguishes between its base and 'Instruct' variants. There are no significant reports of the model claiming to be a competitor's product (e.g., GPT-4) or denying its nature as an AI developed by Alibaba.

Downstream

22.5 / 30

License Clarity

9.0 / 10

The Qwen2.5-1.5B model is explicitly released under the Apache 2.0 license, which is a highly permissive, standard open-source license. This is clearly stated on the Hugging Face repository and the official blog. This marks a transparent shift from earlier versions that used the more restrictive 'Tongyi Qianwen License', providing clear terms for both commercial and research use.

Hardware Footprint

7.5 / 10

Hardware requirements are well-documented by both the official team and the community. Official documentation provides VRAM estimates for different context lengths and quantization levels (FP16, INT8, INT4). For example, the 1.5B model is noted to require ~3-4GB VRAM for FP16 inference. Third-party tools like vLLM and Ollama further validate these requirements, though official documentation on the specific accuracy-performance trade-offs of quantization is less detailed.
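A back-of-the-envelope check of the weight footprint at each precision can be written as follows. This is only a rough sketch: it counts weight storage alone, assumes the 1.54 billion parameter figure from this page, and ignores quantization metadata (scales, zero points), the KV cache, and runtime overhead, which is why real usage sits above these numbers.

```python
NUM_PARAMS = 1.54e9  # total parameter count stated on this page

# Bits per stored weight for each precision mentioned above.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


estimates = {name: weight_memory_gb(NUM_PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}

if __name__ == "__main__":
    for name, gb in estimates.items():
        print(f"{name}: ~{gb:.2f} GB for weights alone")
```

The FP16 estimate of about 3.08 GB for weights is consistent with the ~3-4GB figure once some working memory is added on top.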

Versioning Drift

6.0 / 10

Alibaba uses a clear versioning system (Qwen -> Qwen1.5 -> Qwen2 -> Qwen2.5). However, the score is limited because detailed changelogs for minor weight updates or 'silent' refinements are not always provided. While major releases are well-documented, the community has noted occasional issues with EOS tokens and chat templates in base models that required manual fixes, indicating some gaps in the official versioning and release verification process.
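The category totals in this report can be checked by summing the sub-scores. The sketch below does that with the numbers given above; the assumption that the headline score is the sum rounded half up to a whole number is a guess about how the page derives it, and the names are hypothetical.

```python
# Sub-scores copied from the report above, grouped by category.
SCORES = {
    "Upstream": {
        "Architectural Provenance": 8.0,
        "Dataset Composition": 4.5,
        "Tokenizer Integrity": 9.0,
    },
    "Model": {
        "Parameter Density": 8.5,
        "Training Compute": 2.5,
        "Benchmark Reproducibility": 4.0,
        "Identity Consistency": 9.5,
    },
    "Downstream": {
        "License Clarity": 9.0,
        "Hardware Footprint": 7.5,
        "Versioning Drift": 6.0,
    },
}


def category_totals(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum the sub-scores inside each category."""
    return {category: sum(items.values()) for category, items in scores.items()}


totals = category_totals(SCORES)
overall = sum(totals.values())
# Assumption: the headline figure is the overall sum rounded half up.
displayed_total = int(overall + 0.5)

if __name__ == "__main__":
    for category, value in totals.items():
        print(f"{category}: {value}")
    print(f"overall: {overall} (displayed as {displayed_total} / 100)")
```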

About Qwen2.5

Qwen2.5 by Alibaba is a family of decoder-only language models available in various sizes. Most are dense, with some variants utilizing Mixture-of-Experts. These models are pretrained on large-scale datasets and support extended context lengths and multilingual communication. The family includes specialized models for coding, mathematics, and multimodal tasks, such as vision and audio processing.
