Parameters: 2B
Context Length: 8,192 tokens
Modality: Text
Architecture: Dense
License: Gemma License
Release Date: 27 Jun 2024
Knowledge Cutoff: Jun 2024
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 2048
Number of Layers: 26
Attention Heads: 16
Key-Value Heads: 4
Activation Function: GELU
Normalization: RMS Normalization
Position Embedding: RoPE
VRAM Requirements for Different Quantization Methods and Context Sizes
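The detailed table is not reproduced here. As a rough, back-of-the-envelope sketch only, memory can be approximated from the hyperparameters listed above (quantized weights plus an FP16 key-value cache); the function `estimate_vram_gb`, the ~2.6B effective parameter count, and all resulting figures are illustrative assumptions, not measured values:

```python
# Rough VRAM estimate: quantized weights plus an FP16 KV cache.
# Real usage also includes activations, framework overhead and fragmentation,
# so treat these numbers as lower-bound approximations only.

def estimate_vram_gb(params=2.6e9,          # nominally "2B"; ~2.6B with embeddings (assumption)
                     bits_per_weight=16,    # 16 = FP16, 8 = INT8, 4 = INT4
                     context_len=8192,
                     layers=26,
                     kv_heads=4,
                     head_dim=128,          # derived from hidden size 2048 / 16 heads above
                     kv_bytes=2):           # FP16 KV cache entries
    weights_gb = params * (bits_per_weight / 8) / 1e9
    # KV cache: 2 tensors (K and V) per layer, each of shape (kv_heads, context_len, head_dim)
    kv_cache_gb = 2 * layers * kv_heads * head_dim * context_len * kv_bytes / 1e9
    return weights_gb + kv_cache_gb

for bits in (16, 8, 4):              # FP16, INT8, INT4 weight quantization
    for ctx in (2048, 8192):         # short vs. full 8,192-token context
        gb = estimate_vram_gb(bits_per_weight=bits, context_len=ctx)
        print(f"{bits}-bit weights, {ctx:>5}-token context: ~{gb:.1f} GB")
```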
Gemma 2 2B is a compact, state-of-the-art open language model developed by Google, drawing upon the same foundational research and technology employed in the Gemini model series. It is a text-to-text, decoder-only transformer available in English, offered in both pre-trained and instruction-tuned variants with openly accessible weights. Its design prioritizes efficiency, enabling deployment across a spectrum of computing environments, from resource-constrained edge devices and consumer-grade laptops to more robust cloud infrastructure. This accessibility fosters broader participation in the development and application of advanced artificial intelligence systems.
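As a concrete starting point, here is a minimal sketch of running the instruction-tuned variant with the Hugging Face transformers library, assuming access to the gated google/gemma-2-2b-it checkpoint and a transformers release recent enough to support Gemma 2 (roughly 4.42 or later):

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumes the gated "google/gemma-2-2b-it" checkpoint has been accepted on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 2B model small enough for most GPUs
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what Grouped-Query Attention does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```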
The architectural framework of Gemma 2 2B is rooted in a decoder-only transformer design, incorporating several established and innovative components. Key architectural elements, consistent with the predecessor Gemma models, include a standard context length of 8,192 tokens and the use of Rotary Position Embeddings (RoPE) for positional information. The model employs an approximated GeGLU non-linearity for its activation functions. Notable enhancements in Gemma 2 include a hybrid normalization approach, applying RMSNorm both before and after each transformer sub-layer (pre-normalization and post-normalization) to improve training stability and overall performance. Furthermore, Gemma 2 2B utilizes Grouped-Query Attention (GQA), an optimized attention mechanism in which the query heads are divided into groups that each share a single key and value head, shrinking the key-value cache and improving computational efficiency during inference; in the 2B variant, 16 query heads share 4 key-value heads, a configuration that remains effective at smaller model scales. The training methodology for the 2B model also incorporates knowledge distillation from larger models, facilitating superior performance relative to its parameter count. Additionally, the model alternates between local sliding-window attention and global attention across its layers to capture both short-range dependencies and broader contextual relationships. Logit soft-capping is applied to the attention scores and the final layer to further stabilize training.
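To make two of these mechanisms concrete, the following toy sketch illustrates grouped-query attention via key/value-head repetition and tanh-based logit soft-capping. The tensor shapes follow the head counts listed above, and the cap of 50.0 matches the attention soft-cap reported for Gemma 2, but this is an illustration rather than the production implementation:

```python
# Toy illustration of two Gemma 2 mechanisms: GQA (query heads grouped over
# shared key/value heads) and tanh-based logit soft-capping.
import torch
import torch.nn.functional as F

def soft_cap(x, cap):
    # Smoothly bounds values to the open interval (-cap, cap) to stabilize training.
    return cap * torch.tanh(x / cap)

def grouped_query_attention(q, k, v, attn_cap=50.0):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    n_q, n_kv = q.shape[1], k.shape[1]
    group = n_q // n_kv                       # query heads per shared KV head
    k = k.repeat_interleave(group, dim=1)     # expand KV heads to match the query heads
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = soft_cap(scores, attn_cap)       # attention logit soft-capping
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 16, 8, 128)   # 16 query heads
k = torch.randn(1, 4, 8, 128)    # 4 key-value heads
v = torch.randn(1, 4, 8, 128)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 128])
```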
The design of Gemma 2 2B emphasizes efficient operation, making it particularly well-suited for deployment in environments with limited computational resources. Its capabilities extend to a variety of text-generation applications, including question answering, text summarization, and logical reasoning. The model's compact footprint makes it a viable choice for mobile AI applications and edge-computing scenarios. To promote responsible AI development, Gemma 2 2B is released alongside safety tooling, including the ShieldGemma classifiers, designed to detect and mitigate harmful content, and Gemma Scope, a tool for improving transparency into the model's decision-making processes.
Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.
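For intuition on the distillation step, here is a simplified sketch of the objective: the student is trained toward the teacher's full next-token distribution rather than only the observed token. This is an illustrative loss function, not the exact Gemma 2 training recipe:

```python
# Simplified knowledge-distillation loss: KL divergence between the teacher's
# and the student's next-token distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Both logit tensors have shape (batch, seq_len, vocab_size).
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), averaged over the batch; scaled by t^2 as is standard.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Example with random logits over a toy vocabulary
student = torch.randn(2, 16, 256)
teacher = torch.randn(2, 16, 256)
print(distillation_loss(student, teacher).item())
```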
Rankings apply to local LLMs.
No evaluation benchmarks are available for Gemma 2 2B.