
Gemma 3 1B

Parameters

1B

Context Length

32,768 tokens

Modality

Text

Architecture

Dense

License

Gemma License

Release Date

12 Mar 2025

Knowledge Cutoff

Aug 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

1536

Number of Layers

26

Attention Heads

16

Key-Value Heads

4

Activation Function

-

Normalization

RMS Normalization

Position Embedding

RoPE
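The specifications above imply a concrete memory saving from grouped-query attention: with 16 query heads sharing only 4 key-value heads, the KV cache holds a quarter of the entries that full multi-head attention would need. A rough sketch of the per-token cache size, assuming head_dim = hidden_size / attention_heads and fp16 cache entries (both assumptions, not stated in this spec sheet):

```python
# Back-of-the-envelope KV cache estimate from the specs above.
hidden_size = 1536
num_layers = 26
num_heads = 16
num_kv_heads = 4        # grouped-query attention: 4 query heads share each KV pair
bytes_per_value = 2     # assuming fp16 cache entries

head_dim = hidden_size // num_heads            # 96, under the assumption above
# K and V, for every layer, for every KV head:
kv_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
print(f"KV cache per token: {kv_per_token} bytes")

full_context = 32_768
total = kv_per_token * full_context
print(f"Full 32,768-token context: {total / 2**30:.2f} GiB")

# With 16 KV heads (standard multi-head attention) the cache would be
# num_heads / num_kv_heads = 4x larger.
```

Under these assumptions, GQA cuts KV-cache memory by a factor of four relative to standard multi-head attention, which matters most at long context lengths.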

System Requirements

VRAM requirements for different quantization methods and context sizes

Gemma 3 1B

Gemma 3 1B is a small language model (SLM) within the Gemma 3 family, developed by Google, designed for efficient deployment and operation on resource-constrained devices such as mobile phones and web applications. This model aims to enable local execution of AI capabilities, addressing concerns related to user data privacy and cloud inference costs. Its architecture is derived from the same research and technology that underpins the Gemini series of models, emphasizing state-of-the-art performance within a compact footprint.

Architecturally, Gemma 3 1B employs a decoder-only transformer design optimized for autoregressive tasks such as text generation. A notable innovation in Gemma 3 is its interleaved attention mechanism, which alternates global and local attention layers to sustain contextual comprehension across extended sequences: periodic global layers preserve overall coherence over long documents, while local layers capture fine-grained detail within nearby tokens. The 1B variant offers a context window of 32,768 tokens, enabling it to handle substantial textual inputs. It uses a SentencePiece tokenizer with a 262,144-entry vocabulary and supports over 140 languages, facilitating diverse linguistic applications. Unlike its larger Gemma 3 counterparts, the 1B model is text-only and does not incorporate multimodal capabilities.
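The interleaved scheme can be illustrated with attention masks: global layers use a full causal mask, while local layers restrict each token to a sliding window of recent tokens. A minimal sketch in plain Python (the window size of 3 and the layer interleaving described in the comments are illustrative only; the actual window size and local-to-global ratio are not given in this spec sheet):

```python
def causal_mask(n):
    # Global layer: token q may attend to every earlier token k (and itself).
    return [[q >= k for k in range(n)] for q in range(n)]

def sliding_window_mask(n, window):
    # Local layer: token q may attend only to the `window` most recent tokens.
    return [[q >= k and q - k < window for k in range(n)] for q in range(n)]

n = 8
local = sliding_window_mask(n, window=3)   # illustrative window size
global_ = causal_mask(n)

# Token 7 sees only tokens 5-7 through the local mask, but all of 0-7
# through the global mask. In an interleaved stack, most layers use the
# cheap local mask and only periodic global layers pay full attention
# over the whole context.
```

The design choice is a cost trade-off: local layers keep per-layer attention (and KV cache) bounded by the window size, while the occasional global layer restores long-range information flow.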

Gemma 3 1B is engineered for high throughput, processing up to 2,585 tokens per second, which enables rapid ingestion of content. It is optimized for a range of hardware platforms, including NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs, ensuring broad compatibility, and can run effectively on devices with as little as 4 GB of RAM. Practical applications include generating descriptions from application data, creating context-aware dialogue for interactive characters, suggesting contextually relevant replies in messaging applications, and powering question-answering over lengthy documents through integration with technologies such as the AI Edge RAG SDK. The model ships with open weights, allowing developers to fine-tune and deploy it for specific project requirements.

About Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.



Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#52

Category                 Benchmark    Score    Rank

Professional Knowledge   MMLU Pro     0.15     27
Graduate-Level QA        GPQA         0.19     30
General Knowledge        MMLU         0.19     38

Rankings

Overall Rank

#52

Coding Rank

-

GPU Requirements
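Weight memory scales linearly with bits per parameter, so a rough lower bound on VRAM can be sketched directly from the 1B parameter count. The figures below cover weights only; real deployments add KV cache, activations, and runtime overhead, so treat them as back-of-the-envelope estimates rather than measured requirements:

```python
PARAMS = 1_000_000_000  # 1B parameters, per the spec sheet

def weight_memory_gib(bits_per_weight: float) -> float:
    # Weights only: parameter count x bits per weight, converted to GiB.
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("fp16", 16), ("int8 (Q8)", 8), ("int4 (Q4)", 4)]:
    print(f"{name}: ~{weight_memory_gib(bits):.2f} GiB for weights")
```

At 4-bit quantization the weights fit in roughly half a GiB, which is consistent with the model running on 4 GB-RAM devices once cache and runtime overhead are added.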

