Parameters
72B
Context Length
32,768 tokens
Modality
Text
Architecture
Dense
License
Tongyi Qianwen LICENSE AGREEMENT
Release Date
7 Jun 2024
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
8192
Number of Layers
80
Attention Heads
64
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
VRAM requirements for different quantization methods and context sizes
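As a rough illustration of how quantization changes the memory needed for the weights alone, the sketch below multiplies the nominal 72B parameter count by the bits stored per weight. The precision names and bit widths are assumptions chosen for illustration; real quantization formats store extra scale data, and the KV cache, activations, and runtime overhead come on top of these figures.

```python
# Weight-only memory estimate: ignores KV cache, activations, and runtime overhead.
PARAMS = 72e9  # nominal "72B" parameter count from the spec above

# Idealized bits per weight (assumed values; real formats add per-block
# scales and therefore use slightly more than this).
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}

def weight_memory_gb(n_params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

estimates = {name: weight_memory_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}

for name, gb in estimates.items():
    print(f"{name}: {gb:.0f} GB")
```

Under these assumptions the weights take about 144 GB at 16 bits, 72 GB at 8 bits, and 36 GB at 4 bits, so halving the bit width halves the weight footprint.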
Qwen2-72B is the 72-billion-parameter dense model in Alibaba's Qwen2 large language model series. It targets a broad range of natural language processing tasks, covering both comprehension and generation, along with coding and mathematical problem-solving. It is a base (pretrained) model, intended for further fine-tuning on specific application domains.
Qwen2-72B is built on the Transformer architecture, with several changes aimed at computational efficiency and model quality. These include the SwiGLU activation function and Grouped-Query Attention (GQA), in which each key-value head is shared by a group of query heads, reducing the memory footprint of the key-value cache and speeding up inference. The model also uses an improved tokenizer designed to handle a wide range of natural languages and programming code. Qwen2-72B is a dense model, which distinguishes it from the Mixture-of-Experts (MoE) variants elsewhere in the Qwen2 family.
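The head sharing behind Grouped-Query Attention can be sketched in a few lines of NumPy. This is a minimal illustration with toy sizes, not Qwen2's actual implementation: the function name, the tensor shapes, and the causal mask are assumptions made for the example.

```python
import numpy as np

def grouped_query_attention(queries, keys, values):
    """Causal attention where several query heads share one key-value head.

    queries: (n_q_heads, seq_len, head_dim)
    keys, values: (n_kv_heads, seq_len, head_dim)
    """
    n_q_heads, seq_len, head_dim = queries.shape
    n_kv_heads = keys.shape[0]
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads

    # Each KV head serves `group_size` consecutive query heads. Only the
    # n_kv_heads originals need to be cached; the repeat here is for clarity.
    keys = np.repeat(keys, group_size, axis=0)
    values = np.repeat(values, group_size, axis=0)

    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(head_dim)

    # Causal mask: a position may not attend to later positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Toy sizes (hypothetical): 8 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
queries = rng.standard_normal((8, 4, 16))
keys = rng.standard_normal((2, 4, 16))
values = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(queries, keys, values)
print(out.shape)
```

The output keeps one result per query head, while keys and values are stored for only the smaller number of KV heads, which is where the cache saving comes from.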
Functionally, Qwen2-72B targets natural language understanding, language generation, coding, and mathematical reasoning. As a base model, it provides a pretrained foundation for post-training methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This makes it a fit for applications that need broad multilingual understanding, complex code manipulation, or advanced mathematical computation.
The Alibaba Qwen2 model family comprises large language models built on the Transformer architecture, in both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped-Query Attention, which reduces the memory footprint at inference time, and support for context lengths of up to 131,072 tokens.
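To see how context length drives inference memory, the sketch below estimates the key-value cache size from the layer count and KV-head count listed in the spec. The head dimension of 128 and the 2-byte (16-bit) cache entries are assumptions not stated on this page, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for storing both a key and a value per position.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

N_LAYERS = 80     # "Number of Layers" from the spec
N_KV_HEADS = 8    # "Key-Value Heads" from the spec
HEAD_DIM = 128    # assumption: not listed on this page

for context_len in (32_768, 131_072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context_len) / 2**30
    print(f"{context_len:>7} tokens: {gib:.0f} GiB")
```

Under these assumptions the cache grows linearly with context: about 10 GiB for a single 32,768-token sequence and about 40 GiB at 131,072 tokens, in addition to the weights.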
Ranking is for Local LLMs.
Category | Benchmark | Score | Rank |
---|---|---|---|
Refactoring | Aider Refactoring | 0.56 | 9 |
Coding | Aider Coding | 0.56 | 12 |
Professional Knowledge | MMLU Pro | 0.64 | 17 |
Graduate-Level QA | GPQA | 0.42 | 21 |
General Knowledge | MMLU | 0.42 | 28 |
Overall Rank
#30
Coding Rank
#21