Parameters: 7B
Context Length: 131,072 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 7 Jun 2024
Knowledge Cutoff: Dec 2023
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 3584
Number of Layers: 28
Attention Heads: 28
Key-Value Heads: 4
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
Qwen2-7B is a decoder-only Transformer model developed by Alibaba Cloud as part of the Qwen2 series of large language models. It is a foundational (base) model intended for a broad range of natural language processing applications, including language understanding and generation. The base model is suited to further post-training such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), while instruction-tuned variants are available for direct deployment in instruction-following, conversational, and task-oriented applications. The training data spans English, Chinese, and 27 additional languages, giving the model robust multilingual capabilities.
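As a rough illustration of the instruction-following deployment path, the sketch below loads the instruction-tuned variant with Hugging Face transformers; the model ID, prompt, and generation settings are illustrative assumptions rather than prescribed usage.

```python
# Minimal sketch: chatting with the instruction-tuned variant via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # illustrative; check the upstream repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the benefits of grouped-query attention."},
]
# Build the chat-formatted prompt and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```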
The architectural design of Qwen2-7B integrates several features aimed at performance and efficiency. It uses SwiGLU activation functions in its feed-forward networks and adds a bias term to the attention QKV projections. Like the rest of the Qwen2 suite, it implements Grouped-Query Attention (GQA), which improves inference speed and reduces the memory needed for the key-value cache. Positional information is encoded with Rotary Position Embedding (RoPE), and YARN-style RoPE scaling is used to extrapolate to longer context lengths. Normalization layers use RMSNorm. The model also ships with an improved tokenizer designed to handle a wide range of natural languages and programming code.
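The hyperparameters behind these design choices can be read directly from the published model configuration. The sketch below assumes the field names of the transformers Qwen2 config class; the values it prints should be treated as expectations to verify, not guarantees.

```python
# Sketch: inspecting the architectural hyperparameters described above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-7B")

print(cfg.hidden_size)               # hidden dimension
print(cfg.num_attention_heads)       # query heads
print(cfg.num_key_value_heads)       # shared K/V heads -> grouped-query attention
print(cfg.num_hidden_layers)         # decoder layers
print(cfg.hidden_act)                # activation used inside SwiGLU ("silu")
print(cfg.rms_norm_eps)              # RMSNorm epsilon
print(cfg.rope_theta)                # RoPE base frequency
print(cfg.max_position_embeddings)   # maximum supported position
```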
Qwen2-7B can process long input sequences. The base model is pretrained with a 32,000-token context length and can extrapolate up to 128,000 tokens, while the instruction-tuned variant supports contexts of up to 131,072 tokens, allowing the model to reason over extensive texts. It performs well across natural language understanding, general question answering, text summarization, content creation, coding assistance, and mathematical problem-solving. The 7B model is widely used because it can run on accelerators with 16 GB of memory at 16-bit floating-point (fp16/bf16) precision. The Qwen2 series is released under the Apache 2.0 license, supporting open research, development, and commercial use.
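The extrapolation from the 32K pretraining window toward 128K is described in the Qwen2 model cards as a YARN-style rope_scaling entry added to config.json. The sketch below reproduces that idea under the assumption of a locally downloaded model directory; the exact keys and scaling factor are taken as assumptions and should be checked against the upstream documentation before use.

```python
# Sketch: enabling YARN-style context extension by editing a local config.json.
import json

config_path = "Qwen2-7B-Instruct/config.json"  # assumed local model directory

with open(config_path) as f:
    cfg = json.load(f)

# Assumed keys/values following the Qwen2 model card description:
# a factor of 4.0 stretches the 32,768-token window toward 131,072 tokens.
cfg["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```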
The Alibaba Qwen2 model family comprises large language models built on the Transformer architecture, including both dense and Mixture-of-Experts (MoE) variants designed for diverse language tasks. Shared technical features include Grouped-Query Attention and support for context lengths of up to 131,072 tokens, which keeps the inference memory footprint manageable.
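As a rough guide to VRAM requirements under different quantization methods and context sizes, the estimator below sketches how weight precision and the GQA key-value cache determine memory use. The parameter count, layer and head dimensions, and the omission of activation overhead are simplifying assumptions, not measured values.

```python
# Back-of-the-envelope VRAM estimate: quantized weights plus fp16 KV cache.
def estimate_vram_gb(
    n_params: float = 7.6e9,   # total parameters (assumed)
    weight_bits: int = 16,     # 16 = fp16/bf16, 8 = int8, 4 = int4
    context_tokens: int = 1024,
    n_layers: int = 28,        # decoder layers (assumed)
    n_kv_heads: int = 4,       # K/V heads under GQA (assumed)
    head_dim: int = 128,       # per-head dimension (assumed)
    kv_bytes: int = 2,         # fp16 KV cache entries
) -> float:
    weights = n_params * weight_bits / 8
    # K and V caches: 2 tensors x layers x kv_heads x head_dim bytes per token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_tokens
    return (weights + kv_cache) / 1e9

for bits in (16, 8, 4):
    gb = estimate_vram_gb(weight_bits=bits, context_tokens=32768)
    print(f"{bits}-bit weights, 32K context: ~{gb:.1f} GB")
```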
Overall Rank: #47
Coding Rank: -