Qwen2-7B: Specifications and GPU VRAM Requirements

Qwen2-7B

Open source

Open weights

Parameters

Context length

131,072 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release date

7 Jun 2024

Training data cutoff

Dec 2023

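Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

3584

Layers

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

RoPE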
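
As a rough illustration of two of the components listed above, the following pure-Python sketch shows RMS normalization and a SwiGLU feed-forward block on plain lists. The function names, the `eps` default, and the weight layout (one row per output unit) are illustrative assumptions, not the model's actual implementation.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a learned per-channel gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps value is an assumption
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

Unlike LayerNorm, RMS normalization does not subtract the mean, so it needs only one reduction over the hidden dimension.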

System requirements

VRAM requirements for different quantization methods and context sizes

Qwen2-7B

Qwen2-7B is a decoder-only Transformer model developed by Alibaba Cloud as part of the Qwen2 series of large language models. It is a base (foundation) model intended for general natural language understanding and generation. The base model is suited to further post-training such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF); instruction-tuned variants are available for direct deployment in conversational and task-oriented applications. The training data covers English, Chinese, and 27 additional languages, which gives the model multilingual capability.

Qwen2-7B uses SwiGLU activation functions in its feed-forward networks and adds a bias term to the attention QKV projections. Grouped Query Attention (GQA) is used across the Qwen2 suite; it speeds up inference and reduces memory consumption by letting several query heads share one key-value head. Positions are encoded with Rotary Position Embedding (RoPE), and techniques such as YaRN (Yet another RoPE extensioN) are used to extrapolate to longer context lengths. Normalization layers use RMSNorm. The tokenizer is designed to handle a wide range of natural languages and programming languages.
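
To make the head-sharing idea behind Grouped Query Attention concrete, here is a minimal pure-Python sketch with causal masking. It is a toy under stated assumptions (nested lists, no batching, no RoPE, no bias terms, and a hypothetical function name), not the model's actual kernel.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grouped_query_attention(q, k, v):
    """Causal attention where groups of query heads share one K/V head.

    q: [n_q_heads][seq][d]; k and v: [n_kv_heads][seq][d].
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group = n_q_heads // n_kv_heads
    d = len(q[0][0])
    out = []
    for h in range(n_q_heads):
        kv = h // group  # index of the K/V head shared by this query head's group
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t attends only to positions 0..t.
            scores = [dot(q_t, k[kv][s]) / math.sqrt(d) for s in range(t + 1)]
            weights = softmax(scores)
            head_out.append([
                sum(w * v[kv][s][i] for s, w in enumerate(weights))
                for i in range(d)
            ])
        out.append(head_out)
    return out
```

Because keys and values are stored once per group rather than once per query head, the key-value cache shrinks by the factor `n_q_heads / n_kv_heads`.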

Qwen2-7B can process long input sequences. The base model was pretrained with a context length of 32,000 tokens and can extrapolate up to 128,000 tokens. The instruction-tuned variant supports a context length of up to 131,072 tokens. The model targets natural language understanding, general question answering, text summarization, content creation, coding assistance, and mathematical problem-solving. The 7B size is widely used because it can run on accelerators with 16 GB of memory at 16-bit floating-point precision. Qwen2-7B is released under the Apache 2.0 license, which permits open research, development, and commercial use.
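
The 16 GB figure can be sanity-checked with a back-of-the-envelope estimate of weight memory. The sketch below assumes roughly 7.6 billion parameters (this page leaves the parameter count blank, so that number is an assumption) and idealized bytes-per-parameter values; real quantized formats add overhead for scales and metadata, and the estimate excludes the key-value cache and activations.

```python
# Idealized storage cost per parameter; actual quantization formats vary.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(n_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

ASSUMED_PARAMS = 7.6e9  # assumption: approximate parameter count of Qwen2-7B

for dtype in ("fp16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the 16-bit weights come to about 14.2 GiB, which leaves only a small margin on a 16 GB accelerator for the cache and activations.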

About Qwen2

The Alibaba Qwen2 model family consists of large language models built on the Transformer architecture, in both dense and Mixture-of-Experts (MoE) variants designed for diverse language tasks. Technical features include Grouped Query Attention, which reduces the memory footprint at inference time, and support for context lengths of up to 131,072 tokens.
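
The memory effect of Grouped Query Attention at long context can be estimated from the size of the key-value cache. In the sketch below, the hidden size (3584) and context length (131,072) come from this page; the layer count (28), attention head count (28), and key-value head count (4) are assumptions, since the page leaves those fields blank. The cache is assumed to hold 16-bit values.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 accounts for storing both a key and a value per position.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value

HIDDEN = 3584       # from the specification on this page
N_LAYERS = 28       # assumption
N_HEADS = 28        # assumption
N_KV_HEADS = 4      # assumption
HEAD_DIM = HIDDEN // N_HEADS
CONTEXT = 131072    # maximum context length from this page

gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
mha = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, CONTEXT)
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"Full multi-head cache: {mha / 2**30:.1f} GiB")
```

With these assumed values the cache at full context is 7 GiB with grouped key-value heads, against 49 GiB if every attention head kept its own keys and values.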

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#38

Benchmark	Score	Rank
Professional Knowledge (MMLU Pro)	0.44	⭐ 4
Graduate-Level QA (GPQA)	0.25	27
General Knowledge (MMLU)	0.25	39

Rank

#38

Coding rank

GPU requirements


Resources

Official documentation, release notes, paper, model weights, source code
