Gemma 2 2B: Specifications and GPU VRAM Requirements

Gemma 2 2B

Open weights

Parameters

Context length

8,192 tokens

Modality

Text

Architecture

Dense

License

Gemma License

Release date

27 Jun 2024

Training data cutoff

Jun 2024

Technical Specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

2304

Layers

Attention heads

Key-value heads

Activation function

GELU

Normalization

RMS Normalization

Position embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
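As a rough guide to how quantization and context size drive VRAM, the following Python sketch adds the memory for the weights to the memory for the key-value cache. The function name `estimate_vram_gib` is hypothetical, and the example figures (about 2.6 billion parameters, 26 layers, 4 key-value heads, head dimension 256) are assumptions taken from the publicly released model configuration rather than from this page. The estimate ignores activations, framework overhead, and any cache savings from sliding-window layers, so real usage will be higher.

```python
def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_tokens, kv_bytes_per_value=2):
    """Lower-bound VRAM estimate in GiB: model weights plus KV cache."""
    # Weights: one value per parameter at the chosen quantization width.
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache: a key and a value vector (hence the factor 2) per layer,
    # per key-value head, per token in the context.
    kv_cache_bytes = (2 * n_layers * n_kv_heads * head_dim
                      * context_tokens * kv_bytes_per_value)
    return (weight_bytes + kv_cache_bytes) / 2**30


# Assumed configuration values (not stated on this page).
ASSUMED = dict(n_params=2.6e9, n_layers=26, n_kv_heads=4, head_dim=256)

for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    gib = estimate_vram_gib(bits_per_weight=bits, context_tokens=1024, **ASSUMED)
    print(f"{label}: about {gib:.2f} GiB at 1024 tokens of context")
```

Under these assumptions the weights dominate at short contexts, which is why the quantization method matters more than the context size for a model of this scale.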

Gemma 2 2B

Gemma 2 2B is a compact, state-of-the-art open language model developed by Google, drawing upon the same foundational research and technology employed in the Gemini model series. This model is engineered as a text-to-text, decoder-only transformer, and is provided in English, with both pre-trained and instruction-tuned variants featuring openly accessible weights. Its design prioritizes efficiency, enabling deployment across a spectrum of computing environments, from resource-constrained edge devices and consumer-grade laptops to more robust cloud infrastructures. This accessibility fosters broader participation in the development and application of advanced artificial intelligence systems.

Gemma 2 2B is a decoder-only transformer that combines components carried over from the first Gemma models with several changes introduced in Gemma 2. Carried over are the 8192-token context length, Rotary Position Embeddings (RoPE) for positional information, and an approximated GeGLU non-linearity as the activation function. New in Gemma 2 is a hybrid normalization scheme that applies RMSNorm both before and after each sub-layer (pre-normalization and post-normalization) to improve training stability. The model uses Grouped-Query Attention (GQA), in which each group of query heads shares one key-value head; this shrinks the key-value cache and reduces memory traffic during inference. Unlike the first-generation Gemma 2B, which used Multi-Query Attention (MQA) with a single key-value head, Gemma 2 2B keeps multiple key-value heads. The 2B model is trained with knowledge distillation from a larger model, which improves its quality relative to its parameter count. Its layers alternate between local sliding-window attention and global attention, so the model handles short-range dependencies cheaply while still attending over the full context. Logit soft-capping is applied in the attention layers and the final layer to stabilize training.
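Two of the attention mechanisms described above can be illustrated with a small Python sketch: how query heads map onto shared key-value heads, and how a causal sliding-window mask differs from a global causal mask. All function names here are hypothetical, the head counts (8 query heads, 4 key-value heads) are assumptions based on the published model configuration rather than this page, and assigning local attention to even-numbered layers is an illustrative choice.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key-value head shared by query head `q_head`."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def causal_mask(seq_len, window=None):
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives global causal attention; an integer restricts each
    query to the most recent `window` positions (sliding-window attention).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_windows(n_layers, window):
    """Alternate local and global layers (assumed: even layers are local)."""
    return [window if layer % 2 == 0 else None for layer in range(n_layers)]


# Grouped-query attention with the assumed 8 query heads and 4 KV heads.
gqa_map = [kv_head_for_query_head(h, 8, 4) for h in range(8)]
# Multi-query attention is the special case with a single KV head.
mqa_map = [kv_head_for_query_head(h, 8, 1) for h in range(8)]

print("GQA mapping:", gqa_map)
print("MQA mapping:", mqa_map)
print("Local mask, last row:", causal_mask(4, window=2)[3])
print("Global mask, last row:", causal_mask(4)[3])
```

With fewer key-value heads than query heads, the key-value cache shrinks by the group size, and local layers only ever need the most recent `window` positions.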

Gemma 2 2B is designed to run efficiently, which makes it well suited to environments with limited computational resources. It handles a range of text generation tasks, including question answering, summarization, and reasoning, and its compact footprint makes it practical for mobile AI applications and edge computing. To support responsible AI development, Google released two companion tools alongside the model: ShieldGemma, a set of classifiers designed to detect and mitigate harmful content, and Gemma Scope, a tool for making the model's internal decision-making more transparent.

About Gemma 2

Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.
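Logit soft-capping can be written as cap * tanh(logit / cap), which leaves small logits nearly unchanged while bounding every value by the cap. The sketch below is a plain-Python illustration; the function name `soft_cap` is hypothetical, and the cap values of 50.0 for attention logits and 30.0 for final logits are assumptions taken from the published Gemma 2 configuration, not from this page.

```python
import math


def soft_cap(logits, cap):
    """Smoothly squash each logit into the range [-cap, cap]."""
    return [cap * math.tanh(x / cap) for x in logits]


# Assumed cap values (not stated on this page).
ATTENTION_CAP = 50.0
FINAL_CAP = 30.0

raw = [-200.0, -1.0, 0.0, 1.0, 200.0]
print("attention logits:", soft_cap(raw, ATTENTION_CAP))
print("final logits:    ", soft_cap(raw, FINAL_CAP))
```

Unlike hard clipping, the tanh form stays differentiable everywhere, so gradients shrink gradually as logits grow instead of dropping to zero at a threshold.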

Benchmarks

Rankings apply to local LLMs.

No benchmarks are available for Gemma 2 2B.
