
Gemma 3 1B

Parameters

1B

Context Length

32K

Modality

Text

Architecture

Dense

License

Gemma License

Release Date

12 Mar 2025

Knowledge Cutoff

Aug 2024

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

4

Key-Value Heads

1

Attention Head Dimension

-

Position Embedding

RoPE

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

-

Dimensions

Hidden Dimension Size

1,152

Number of Layers

26

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

-

Architecture Diagram

Input Tokens → Token Embedding (Position: RoPE · Hidden: 1,152 · Context: 32K) → 26 × [RMSNorm (Pre-Attention) → Grouped-Query Attention (4 Q / 1 KV heads, head dim: 256) → + (residual) → RMSNorm (Pre-FFN) → Feed-Forward Network (Activation) → + (residual)] → Final RMSNorm → Output Logits
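
In the grouped-query attention block of the diagram, several query heads share one key-value head. A minimal Python sketch of that head mapping follows. The function name and the example head counts are illustrative assumptions rather than anything taken from Gemma code, and consecutive grouping of query heads is assumed.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key-value head that a given query head reads from."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    if not 0 <= q_head < num_q_heads:
        raise ValueError("q_head out of range")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


# Illustrative head counts, assumed for this example only.
mapping = [kv_head_for_query_head(q, num_q_heads=8, num_kv_heads=2) for q in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Fewer key-value heads means fewer keys and values to store per token, which is the main reason to group heads this way.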

Gemma 3 1B

Gemma 3 1B is a small language model (SLM) in Google's Gemma 3 family, designed to run efficiently on resource-constrained targets such as mobile phones and web applications. Running the model locally keeps user data on the device and avoids cloud inference costs. It is built on the same research and technology as the Gemini series of models and targets strong performance within a compact footprint.

Architecturally, Gemma 3 1B uses a decoder-only transformer, a design suited to autoregressive tasks such as text generation. A notable feature of Gemma 3 is its interleaved attention mechanism, which alternates local (sliding-window) and global attention layers. Local layers attend only to nearby tokens and preserve fine-grained detail within smaller sections, while global layers attend to the full context and maintain coherence across long documents. The 1B variant has a context window of 32K (32,768) tokens. It uses a SentencePiece tokenizer with a 262,144-entry vocabulary and supports over 140 languages. Unlike the larger Gemma 3 models, the 1B model is text-only and has no multimodal capabilities.
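
The difference between a local and a global attention layer comes down to which earlier positions each token may attend to. The sketch below builds both kinds of causal mask in plain Python. The function name is hypothetical and the window of 3 is a toy value chosen for readability, not the model's actual window size.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives a global causal mask; an integer limits each query to
    the most recent `window` positions (itself included).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window  # local: only recent tokens
            row.append(visible)
        mask.append(row)
    return mask


global_mask = causal_mask(6)
local_mask = causal_mask(6, window=3)  # toy window, assumed for illustration
print(sum(global_mask[5]))  # 6: the last token sees every position
print(sum(local_mask[5]))   # 3: the last token sees only the 3 most recent
```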

Gemma 3 1B is built for high throughput: it can process up to 2,585 tokens per second on prefill, which makes ingesting long inputs fast. It is optimized for several hardware platforms, including NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs, and it can run on devices with as little as 4 GB of RAM. Practical applications include generating descriptions from application data, creating context-aware dialogue for interactive characters, suggesting contextually relevant replies in messaging applications, and answering questions over lengthy documents through integration with technologies like the AI Edge RAG SDK. The model is provided with open weights, so developers can fine-tune and deploy it for their own project requirements.
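
To put the throughput figure in perspective, the arithmetic below estimates how long it would take to ingest a full-context prompt. It treats the quoted 2,585 tokens per second as a prompt-processing rate and takes the context window to be 32,768 tokens; both are assumptions. Because the figure is a peak rate, the result is an optimistic lower bound.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt at a constant token rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Assumed inputs: a 32,768-token prompt at the quoted peak rate.
full_context = prefill_seconds(32_768, 2585.0)
print(f"{full_context:.1f} s")  # 12.7 s
```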

About Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.


Evaluation Benchmarks

No evaluation benchmarks for Gemma 3 1B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

69 / 100

Gemma 3 1B Model Integrity Report

Audit Note

Gemma 3 1B exhibits strong transparency in its architectural design and hardware requirements, supported by a detailed technical report and clear deployment guidelines. However, it remains opaque regarding the specific composition of its 2-trillion-token training dataset and the total compute resources consumed during training. The use of a custom license and limited public evaluation code further prevents it from reaching the highest transparency tier.

Upstream

22.0 / 30

Architectural Provenance

8.5 / 10

Gemma 3 1B is extensively documented in the official technical report (arXiv:2503.19786), which details its decoder-only transformer architecture. It specifies a 5:1 interleaving of local (sliding window) and global attention layers to manage KV-cache efficiency, a 32K context window, and the use of Grouped-Query Attention (GQA) with QK-norm. The training methodology, including distillation from larger teacher models and post-training via RLHF, RLMF, and RLEF, is clearly described.
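
The KV-cache benefit of the 5:1 interleaving can be estimated with a short calculation. The sketch below assumes each group of five local layers is followed by one global layer, and uses a hypothetical sliding window of 512 tokens (the specification above does not list the window size). It counts cached key/value positions only, so head counts and data types cancel out of the ratio.

```python
from typing import List


def layer_kinds(num_layers: int, local_per_global: int = 5) -> List[str]:
    """Label each layer, assuming local_per_global local layers precede each global one."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def cached_positions(seq_len: int, kinds: List[str], window: int) -> int:
    """Total key/value positions kept across all layers for one sequence."""
    return sum(seq_len if kind == "global" else min(seq_len, window) for kind in kinds)


kinds = layer_kinds(26)
print(kinds.count("local"), kinds.count("global"))  # 22 4

window = 512  # hypothetical window size, assumed for illustration
interleaved = cached_positions(32_768, kinds, window)
all_global = cached_positions(32_768, ["global"] * 26, window)
print(f"{interleaved / all_global:.3f}")  # 0.167
```

Under these assumptions, a full 32K context needs roughly one sixth of the cache that 26 global layers would require.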

Dataset Composition

4.5 / 10

Google discloses that the model was trained on 2 trillion tokens and names general data categories (web documents, code, mathematics), but it does not give a percentage breakdown of the dataset composition. The documentation mentions support for 140+ languages and describes filtering for CSAM and sensitive data. The specific data sources and their proportions remain proprietary, which falls short of high-transparency standards.

Tokenizer Integrity

9.0 / 10

The model uses the same SentencePiece tokenizer as Gemini 2.0, which is publicly available and well-documented. It features a vocabulary size of 262,144 entries, optimized for multilingual support across 140+ languages. Technical details such as byte-level encodings, split digits, and preserved whitespace are explicitly stated in the technical report and verifiable via the Hugging Face model repository.

Model

25.5 / 40

Parameter Density

7.5 / 10

The model is explicitly identified as a dense 1.0B parameter model. Technical documentation provides a clear architectural breakdown, including the number of layers (26) and hidden dimensions (1152). Unlike MoE models, there is no ambiguity regarding active vs. total parameters, though detailed weight distribution across specific components (e.g., FFN vs. Attention) requires manual calculation from the provided config files.
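
The manual calculation mentioned above can be sketched as follows. The vocabulary size, hidden size, and layer count come from this report. The head counts, head dimension, and FFN width in the example call are assumptions recalled from the public config and should be replaced with the values in the actual config file. The sketch also assumes tied input/output embeddings and a gated FFN with three weight matrices, and it ignores the small normalization weights.

```python
from typing import Dict


def param_breakdown(vocab_size: int, hidden_size: int, num_layers: int,
                    num_q_heads: int, num_kv_heads: int, head_dim: int,
                    ffn_size: int) -> Dict[str, int]:
    """Approximate weight counts per component (normalization weights ignored)."""
    embedding = vocab_size * hidden_size  # assumed shared with the output projection
    qkv = hidden_size * head_dim * (num_q_heads + 2 * num_kv_heads)  # Q, K, V projections
    out = num_q_heads * head_dim * hidden_size  # attention output projection
    ffn = 3 * hidden_size * ffn_size  # gate, up, and down projections (gated FFN assumed)
    return {
        "embedding": embedding,
        "attention": num_layers * (qkv + out),
        "ffn": num_layers * ffn,
    }


counts = param_breakdown(
    vocab_size=262_144, hidden_size=1152, num_layers=26,  # from this report
    num_q_heads=4, num_kv_heads=1, head_dim=256, ffn_size=6912,  # assumed values
)
total = sum(counts.values())
for name, value in counts.items():
    print(f"{name:>9}: {value:>13,} ({value / total:.1%})")
print(f"    total: {total:>13,}")  # 999,751,680
```

With these inputs the embedding table alone accounts for about 30% of the weights, which is typical for a small model with a large vocabulary.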

Training Compute

4.0 / 10

Google discloses the hardware used (TPUv5e) and the scale of the cluster (512 chips), but it does not provide the total GPU/TPU hours, energy consumption, or a calculated carbon footprint for the training run. While the infrastructure type is known, the lack of specific resource duration or environmental impact data limits the score to the lower end of moderate transparency.

Benchmark Reproducibility

5.0 / 10

The technical report provides scores for standard benchmarks (MMLU, GSM8K, HumanEval, etc.) and specifies some evaluation settings (e.g., 0-shot vs. few-shot). However, the full evaluation code and exact prompts used for all benchmarks are not publicly released in a centralized repository, making exact third-party reproduction difficult without significant reverse-engineering of the described methodology.

Identity Consistency

9.0 / 10

Gemma 3 1B demonstrates strong identity consistency, correctly identifying its version and capabilities in official documentation and model cards. It is transparent about its text-only nature compared to the multimodal larger variants in the same family. There are no documented cases of the model misrepresenting itself as a competitor's product or denying its AI nature.

Downstream

21.0 / 30

License Clarity

6.5 / 10

The model is released under the 'Gemma Terms of Use,' which is a custom permissive license rather than a standard OSI-approved open-source license like Apache 2.0. While it allows for commercial use and redistribution, it includes specific 'Prohibited Use' policies and 'viral' clauses regarding model derivatives that create legal complexity for developers, distinguishing it from true open-source software.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. Google provides VRAM estimates for various precision levels (BF16: 1.5GB, SFP8: 1.1GB, Q4: 892MB) and explicitly mentions that the model can run on devices with as little as 4GB of RAM. The impact of Quantization-Aware Training (QAT) on maintaining accuracy while reducing footprint is also detailed in official blog posts and technical guides.

Versioning Drift

6.0 / 10

Google maintains a 'Gemma releases' page that tracks version history and release dates (e.g., March 12, 2025). However, it lacks a detailed, granular changelog for minor weight updates or specific documentation on performance drift over time. While major versions are clear, the transparency regarding incremental behavioral changes is moderate.
