Parameters: 2B
Context Length: 8,192 tokens
Modality: Text
Architecture: Dense
License: Gemma License
Release Date: 27 Jun 2024
Knowledge Cutoff: Jun 2024
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 2048
Number of Layers: 26
Attention Heads: 16
Key-Value Heads: 4
Activation Function: GELU
Normalization: RMS Normalization
Position Embedding: RoPE
Gemma 2 2B is a compact, state-of-the-art open language model developed by Google, drawing upon the same foundational research and technology employed in the Gemini model series. This model is engineered as a text-to-text, decoder-only transformer, and is provided in English, with both pre-trained and instruction-tuned variants featuring openly accessible weights. Its design prioritizes efficiency, enabling deployment across a spectrum of computing environments, from resource-constrained edge devices and consumer-grade laptops to more robust cloud infrastructures. This accessibility fosters broader participation in the development and application of advanced artificial intelligence systems.
The architectural framework of Gemma 2 2B is a decoder-only transformer that combines established and newer components. Consistent with the earlier Gemma models, it uses a context length of 8,192 tokens and Rotary Position Embeddings (RoPE) for positional information, and it employs an approximated GeGLU non-linearity as its activation function. Gemma 2 introduces a hybrid normalization scheme, applying RMSNorm both before and after each sublayer (pre- and post-normalization) to improve training stability and overall performance. The model also uses Grouped-Query Attention (GQA), an optimized attention mechanism in which several query heads share each key and value head; in the 2B variant, the query heads share four key-value heads, which shrinks the key-value cache and improves computational efficiency during inference. The 2B model is additionally trained with knowledge distillation from larger teacher models, which lifts its performance relative to its parameter count. Across its layers, the model alternates between local sliding-window attention and global attention to capture both short-range dependencies and broader contextual relationships, and logit soft-capping is applied to the attention logits and the final output logits to further stabilize training.
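Two of these mechanisms, logit soft-capping and the alternation between local sliding-window and global attention, can be sketched in a few lines. The snippet below is a minimal PyTorch illustration, not Google's implementation; the cap value, window size, and even/odd layer split are illustrative assumptions.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Tanh soft-capping: squashes logits smoothly into (-cap, cap)
    # instead of letting them grow unbounded, which stabilizes training.
    return cap * torch.tanh(logits / cap)

def layer_attention_mask(layer_idx: int, seq_len: int, window: int) -> torch.Tensor:
    # Alternate between local sliding-window attention and global causal
    # attention across layers. True = this key position may be attended to.
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]                 # standard causal mask
    if layer_idx % 2 == 0:                                # local layer (assumed even)
        local = (pos[:, None] - pos[None, :]) < window    # keep only nearby keys
        return causal & local
    return causal                                         # global layer

# Example: cap and mask a matrix of raw attention scores for one local layer.
scores = soft_cap(torch.randn(8, 8) * 100.0, cap=50.0)    # now bounded in (-50, 50)
mask = layer_attention_mask(layer_idx=0, seq_len=8, window=4)
weights = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
```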
The design of Gemma 2 2B emphasizes efficient operation, making it particularly well suited for deployment in environments with limited computational resources. It handles a variety of text generation tasks, including question answering, text summarization, and logical reasoning, and its compact footprint makes it a practical choice for mobile AI applications and edge computing scenarios. To support responsible AI development, Gemma 2 2B is released alongside the ShieldGemma safety classifiers, which detect and filter harmful content, and Gemma Scope, a tool for improving transparency into the model's decision-making processes.
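As a concrete example of these text generation use cases, the sketch below loads the instruction-tuned checkpoint with the Hugging Face transformers library. It assumes the transformers and torch packages are installed and that access to the gated google/gemma-2-2b-it weights has been granted under the Gemma license; the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"  # instruction-tuned 2B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # small enough for a single consumer GPU
    device_map="auto",
)

prompt = "Summarize the advantages of small language models in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```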
Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.
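The knowledge distillation mentioned above can be summarized generically: instead of training only against one-hot next-token targets, the smaller student model is trained to match the next-token distribution produced by a larger teacher. The loss below is a schematic PyTorch sketch of that idea, not the actual Gemma 2 training code; the temperature, batch shapes, and vocabulary size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    # KL divergence between the teacher's and the student's next-token
    # distributions, averaged over every token position in the batch.
    # Expected shapes: (batch, seq_len, vocab_size).
    vocab = student_logits.size(-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    # Standard distillation scales the loss by temperature**2.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Illustrative call with random logits standing in for real model outputs.
student = torch.randn(2, 16, 32_000)   # student (a small model)
teacher = torch.randn(2, 16, 32_000)   # teacher (a larger, frozen model)
loss = distillation_loss(student, teacher)
```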
Rankings are relative to other local LLMs. No evaluation benchmarks for Gemma 2 2B are available.
Overall Rank: -
Coding Rank: -