Parameters
2B
Context Length
8,192 tokens
Modality
Text
Architecture
Dense
License
Gemma Terms of Use
Release Date
21 Feb 2024
Knowledge Cutoff
-
Attention
Attention Structure
Multi-Query Attention
Attention Heads
8
Key-Value Heads
1
Attention Head Dimension
-
Position Embedding
RoPE
RoPE Theta
-
Sliding Window Attention
-
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
-
Dimensions
Hidden Dimension Size
2,048
Number of Layers
18
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
-
Gemma 1 2B is a lightweight, state-of-the-art open language model developed by Google, stemming from the same research and technology that underpins the Gemini family of models. This model is designed as a text-to-text, decoder-only transformer, primarily available in English, with both pre-trained and instruction-tuned variants. Its architectural design focuses on efficiency, making it suitable for deployment in environments with limited computational resources, such as laptops, desktops, or personal cloud infrastructure.
Gemma 1 2B combines several efficiency-oriented components. It uses Multi-Query Attention (MQA) with a single key-value head: all query heads share one set of key and value projections, which shrinks the key-value cache and speeds up inference. Positional information is encoded with Rotary Positional Embeddings (RoPE). The feed-forward layers use the GeGLU activation, a Gated Linear Unit (GLU) variant whose gate is passed through GELU. Normalization within the network is performed using RMSNorm. These choices support the model's performance while keeping its footprint compact.
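To make the key-value sharing in MQA concrete, here is a minimal pure-Python sketch of causal multi-query attention. It is an illustration under stated assumptions, not Gemma's implementation: the function names are hypothetical, the inputs are assumed to be already projected, and RoPE is omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(queries, keys, values):
    """Causal multi-query attention (illustrative sketch).

    queries: [num_heads][seq_len][head_dim], one set of queries per head.
    keys, values: [seq_len][head_dim], a single K/V head shared by all heads.
    Returns outputs shaped [num_heads][seq_len][head_dim].
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_queries in queries:
        head_out = []
        for t, q in enumerate(head_queries):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                scale * sum(a * b for a, b in zip(q, keys[s]))
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append([
                sum(weights[s] * values[s][j] for s in range(t + 1))
                for j in range(head_dim)
            ])
        outputs.append(head_out)
    return outputs


# Toy example: 2 query heads, 2 positions, head_dim 2, one shared K/V head.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
queries = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, -1.0]],
]
out = multi_query_attention(queries, keys, values)
print(len(out), len(out[0]), len(out[0][0]))
```

Because `keys` and `values` carry no head axis, only one key and one value vector per layer need to be cached for each token, regardless of how many query heads there are.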
The 2B variant suits text generation tasks such as question answering, summarization, and reasoning. The instruction-tuned versions of Gemma 1 2B are further refined to follow instructions and hold multi-turn conversations, which makes them a fit for interactive applications like chatbots. Its compact size lets it run on consumer-grade hardware, putting the model within reach of individual developers and researchers.
Gemma 1 is a family of lightweight, decoder-only transformer models from Google, available in 2B and 7B parameter sizes. Designed for a range of text generation tasks, the models use rotary positional embeddings, shared input/output embeddings, GeGLU activation, and RMSNorm. The 2B model uses multi-query attention, while the 7B model uses multi-head attention.
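As a rough illustration of two of these building blocks, the sketch below implements RMSNorm and a GeGLU feed-forward block in plain Python. It assumes the common textbook formulations (tanh-approximated GELU, a per-channel scale in RMSNorm); the function names and toy weights are hypothetical, and the exact variants used in Gemma may differ in detail.

```python
import math


def gelu(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square, then by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def matvec(matrix, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def geglu_ffn(x, w_gate, w_up, w_down):
    """GeGLU feed-forward: down( GELU(gate(x)) * up(x) )."""
    gate = [gelu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy usage with 2-dimensional inputs and identity weights (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
ffn_out = geglu_ffn(normed, identity, identity, identity)
print(normed, ffn_out)
```

The gate branch decides how much of each hidden channel passes through, which is why a GeGLU block needs three weight matrices (gate, up, down) instead of the two in a plain feed-forward layer.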
No evaluation benchmarks are available for Gemma 1 2B.
Overall Rank
-
Coding Rank
-
Total Score
65 / 100
Gemma 1 2B exhibits strong transparency in its architectural design and tokenizer implementation, backed by a detailed technical report. However, it suffers from significant opacity regarding its training dataset composition and the specific compute resources consumed during development. While the model is highly consistent in its identity, its custom licensing and initial issues with benchmark reproducibility present hurdles for fully transparent independent verification.
Architectural Provenance
Gemma 1 2B is extensively documented in the official technical report ('Gemma: Open Models Based on Gemini Research and Technology'). The architecture is explicitly defined as a decoder-only transformer with 18 layers, a hidden dimension of 2048, and 8 attention heads. The 2B variant uses Multi-Query Attention (MQA), whereas the 7B variant uses Multi-Head Attention. Key modifications such as RoPE embeddings, GeGLU activations, and RMSNorm are clearly stated and justified. The relationship to the Gemini family is transparent, though any 'distillation' or training recipe details inherited from the larger Gemini models are described only at a high level, not in enough procedural detail to reproduce.
Dataset Composition
While Google discloses the total token count (2 trillion for the 2B model) and general categories (web documents, mathematics, and code), it fails to provide a specific percentage breakdown or name the exact datasets used. The documentation mentions filtering for CSAM, PII, and quality using model-based classifiers, but these methodologies are not public. The lack of specific data sources or a detailed composition breakdown (e.g., 'StackOverflow: 5%') prevents independent verification of the training data's diversity or bias.
Tokenizer Integrity
The tokenizer is a SentencePiece-based model with a large vocabulary of 256,128 tokens, which is publicly accessible via Hugging Face and the official GitHub repository. Documentation specifies technical details such as digit splitting, byte-level encoding for unknown tokens, and the preservation of whitespace. The vocabulary size and tokenization approach are consistent across official documentation and third-party implementations like Transformers and vLLM.
Parameter Density
The model is marketed as '2B', but technical documentation reveals the actual parameter count is approximately 2.5 billion. While the 'active' vs 'total' distinction is not applicable here as it is a dense model, the discrepancy between the marketing name and the actual size is documented in the technical report (Table 1). The architectural breakdown (layers, heads, embedding dimensions) is fully transparent, allowing for precise parameter calculation by researchers.
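The parameter calculation mentioned above can be sketched in a few lines. The layer count, hidden size, query-head count, and single key-value head come from this page, and the vocabulary size is the 256,128 quoted in the tokenizer section. The head dimension (256) and FFN intermediate size (16,384) are not listed on this page and are assumptions here; biases are assumed absent and input/output embeddings are assumed shared.

```python
def gemma_2b_param_count(
    vocab_size=256_128,   # from the tokenizer section of this page
    hidden=2_048,         # hidden dimension size
    layers=18,            # number of layers
    q_heads=8,            # attention (query) heads
    kv_heads=1,           # multi-query attention: one key-value head
    head_dim=256,         # ASSUMPTION: not listed on this page
    ffn_hidden=16_384,    # ASSUMPTION: not listed on this page
):
    """Return (embedding_params, non_embedding_params) for a dense decoder."""
    # Shared input/output embedding matrix, counted once.
    embedding = vocab_size * hidden

    # Attention: Q and output projections span all query heads;
    # K and V projections span only the key-value heads.
    attention = (
        hidden * q_heads * head_dim        # Q projection
        + 2 * hidden * kv_heads * head_dim # K and V projections
        + q_heads * head_dim * hidden      # output projection
    )

    # GeGLU feed-forward: gate, up, and down projections.
    ffn = 3 * hidden * ffn_hidden

    # Two RMSNorm scale vectors per layer, plus one final norm.
    norms_per_layer = 2 * hidden
    non_embedding = layers * (attention + ffn + norms_per_layer) + hidden
    return embedding, non_embedding


embedding, non_embedding = gemma_2b_param_count()
print(f"embedding:     {embedding:,}")                  # 524,550,144
print(f"non-embedding: {non_embedding:,}")              # 1,981,884,416
print(f"total:         {embedding + non_embedding:,}")  # 2,506,434,560
```

Under these assumptions the total comes to roughly 2.5 billion, with about a fifth of it in the embedding table, which is consistent with the gap between the '2B' name and the documented size.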
Training Compute
Google discloses the hardware used (TPUv5e) and the scale (512 TPUv5e chips across 2 pods for the 2B model). However, it does not provide the total training duration in hours, the total energy consumption, or the carbon footprint. While the 'Pathways' approach and sharding techniques are mentioned, the lack of specific compute-time metrics or environmental impact data results in a low score for this pillar.
Benchmark Reproducibility
Google provides a wide array of benchmark results (MMLU, GSM8K, HumanEval, etc.) in the technical report. However, the exact prompts, few-shot examples, and evaluation code were not initially released in a centralized, reproducible format. Third-party audits (e.g., Unsloth) discovered significant discrepancies and bugs in the initial release's implementation of the technical report's specifications (such as BOS token requirements and RoPE precision), which hindered immediate reproducibility. The score is further adjusted due to documented evidence of benchmark contamination in the training data.
Identity Consistency
Gemma 1 2B demonstrates high identity consistency. It correctly identifies itself as a model developed by Google and is transparent about its versioning (e.g., distinguishing between 1.0 and 1.1). There are no significant reports of the model claiming to be a competitor's product (like GPT-4). It maintains a clear boundary regarding its capabilities as a text-only model compared to the multimodal Gemini models.
License Clarity
The model is released under the 'Gemma Terms of Use,' which is a custom 'open weights' license rather than a standard OSI-approved open-source license like Apache 2.0. While it allows for commercial use and redistribution, it includes restrictive clauses regarding 'Model Derivatives' and a 'Prohibited Use Policy' that Google can enforce remotely. The terms are legally clear but create a 'viral' effect where any model trained on Gemma output must also follow these terms, leading to some community ambiguity.
Hardware Footprint
Hardware requirements are well-documented by both Google and the community. Official model cards provide VRAM estimates for different precisions (e.g., ~4.7GB for BF16), and third-party tools like the Hugging Face Model Memory Utility provide granular data for quantization (e.g., ~1.2GB for INT4). The impact of context length (8k tokens) on memory is also publicly verifiable through standard transformer memory scaling formulas.
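The memory figures above can be approximated with the standard scaling formulas. The sketch below is a back-of-the-envelope estimate, not an official calculator: the total parameter count (about 2.5 billion, here 2,506,434,560) and the head dimension of 256 are assumptions not stated in this section, and activation memory, quantization metadata, and framework overhead are ignored.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def weight_memory_gib(n_params, bits_per_weight):
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_bytes(seq_len, layers=18, kv_heads=1, head_dim=256, bytes_per_value=2):
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


N_PARAMS = 2_506_434_560  # ASSUMPTION: approximate total for the '2B' model

bf16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)
mqa_cache = kv_cache_bytes(8_192)              # 1 KV head, as in Gemma 1 2B
mha_cache = kv_cache_bytes(8_192, kv_heads=8)  # hypothetical 8 KV heads

print(f"BF16 weights: {bf16_gib:.2f} GiB")           # about 4.67 GiB
print(f"INT4 weights: {int4_gib:.2f} GiB")           # about 1.17 GiB
print(f"KV cache @ 8,192 tokens (MQA): {mqa_cache / MIB:.0f} MiB")
print(f"KV cache @ 8,192 tokens (8 KV heads): {mha_cache / MIB:.0f} MiB")
```

The cache grows linearly with context length, and with a single key-value head it stays small next to the weights even at the full 8,192-token context.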
Versioning Drift
Google maintains a release log and uses version numbers (1.0, 1.1). However, the transition from 1.0 to 1.1 involved significant 'silent' changes in behavior due to a new RLHF method and bug fixes that were not fully detailed in a technical changelog. While the previous versions remain accessible, the lack of a detailed, line-by-line changelog for weights and alignment updates makes tracking drift difficult for developers.