Parameters
1.5B
Context Length
32,768 tokens (32K)
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
7 Jun 2024
Knowledge Cutoff
Sep 2024
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
1536
Number of Layers
24
Attention Heads
32
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
VRAM requirements for different quantization methods and context sizes
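As a rough guide, the sketch below estimates weight memory plus KV-cache memory from the architecture figures listed above. The formulas are generic back-of-the-envelope estimates that ignore activations and framework overhead; they are not measured or vendor-published values.

```python
# Rough VRAM estimate for Qwen2-1.5B under different quantization levels and
# context sizes. Architecture numbers are taken from the table above.

NUM_PARAMS   = 1.5e9   # total parameters
NUM_LAYERS   = 24      # from the table above
NUM_KV_HEADS = 8       # grouped-query attention: key/value heads only
HIDDEN_SIZE  = 1536
NUM_HEADS    = 32
HEAD_DIM     = HIDDEN_SIZE // NUM_HEADS

def weight_bytes(bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization width."""
    return NUM_PARAMS * bits_per_weight / 8

def kv_cache_bytes(context_len: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: two tensors (K and V) per layer,
    num_kv_heads * head_dim values per token (fp16 cache by default)."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_len * bytes_per_value

GIB = 1024 ** 3
for bits, label in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    for ctx in (2_048, 8_192, 32_768):
        total = weight_bytes(bits) + kv_cache_bytes(ctx)
        print(f"{label:>4} weights, {ctx:>6}-token context: ~{total / GIB:.2f} GiB")
```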
Qwen2-1.5B is a compact, decoder-only language model developed by the Qwen team at Alibaba Group. It is designed for efficient natural language processing, balancing performance against resource requirements. The model is part of the broader Qwen2 series, which spans multiple model sizes and includes both base and instruction-tuned variants, and it targets applications such as text generation, question answering, and general language understanding.
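A minimal text-generation sketch using the Hugging Face transformers library is shown below. It assumes the base checkpoint is published on the Hub as Qwen/Qwen2-1.5B; for chat-style use, the instruction-tuned Qwen/Qwen2-1.5B-Instruct variant would be the usual choice.

```python
# Minimal generation sketch with Hugging Face transformers.
# Assumes the checkpoint id "Qwen/Qwen2-1.5B"; device_map="auto" requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Explain grouped-query attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```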
The architectural foundation of Qwen2-1.5B is the Transformer, with several technical refinements. Key features include the SwiGLU activation function, attention QKV bias, and Grouped-Query Attention (GQA), which speeds up inference and reduces the memory footprint of the key/value cache. The model uses Rotary Positional Embeddings (RoPE) for positional information and RMSNorm for normalization. Its tokenizer has been refined for adaptive handling of multiple natural languages and programming code, broadening its multilingual capabilities, and tied input/output embeddings improve parameter efficiency.
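To illustrate how GQA trims the KV cache, the sketch below shares a small set of key/value heads across groups of query heads. The head counts and dimensions here are illustrative placeholders, not necessarily the model's exact configuration.

```python
# Illustrative grouped-query attention: many query heads share fewer K/V heads,
# so the KV cache stores num_kv_heads (not num_q_heads) heads per layer.
import torch

num_q_heads, num_kv_heads, head_dim, seq_len = 32, 8, 48, 16
group = num_q_heads // num_kv_heads   # query heads per shared K/V head

q = torch.randn(num_q_heads, seq_len, head_dim)
k = torch.randn(num_kv_heads, seq_len, head_dim)
v = torch.randn(num_kv_heads, seq_len, head_dim)

# Expand K/V so each query head attends against its group's shared K/V head.
k_exp = k.repeat_interleave(group, dim=0)   # (num_q_heads, seq_len, head_dim)
v_exp = v.repeat_interleave(group, dim=0)

scores = q @ k_exp.transpose(-2, -1) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1)
out = attn @ v_exp                          # (num_q_heads, seq_len, head_dim)
print(out.shape)
```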
In terms of performance, Qwen2-1.5B shows robust capability across diverse language-centric tasks. It supports a context length of up to 32,768 tokens, allowing effective processing of long inputs. Its functionality spans language understanding, text generation, code interpretation, mathematical problem-solving, and reasoning, and its emphasis on efficiency and responsiveness makes it a suitable choice for applications that need fast, reliable language processing across many languages.
The Alibaba Qwen2 model family comprises large language models built upon the Transformer architecture. It includes both dense and Mixture-of-Experts (MoE) variants, designed for diverse language tasks. Technical features include Grouped Query Attention and support for extended context lengths up to 131,072 tokens, optimizing memory footprint for inference.
No evaluation benchmarks are available for Qwen2-1.5B.