Gemma 1 2B: Specifications and GPU VRAM Requirements

Gemma 1 2B

Closed Source

Open Weights

Parameters

Context Length

8.192K

Modality

Text

Architecture

Dense

License

Gemma Terms of Use

Release Date

21 Feb 2024

Knowledge Cutoff

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

2048

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

RMS Normalization

Position Embedding

ROPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Gemma 1 2B

Gemma 1 2B is a lightweight, state-of-the-art open language model developed by Google, stemming from the same research and technology that underpins the Gemini family of models. This model is designed as a text-to-text, decoder-only transformer, primarily available in English, with both pre-trained and instruction-tuned variants. Its architectural design focuses on efficiency, making it suitable for deployment in environments with limited computational resources, such as laptops, desktops, or personal cloud infrastructure.

Architecturally, Gemma 1 2B incorporates several advanced components. It utilizes Multi-Query Attention (MQA) with a single key-value head, a design choice that optimizes for faster inference by sharing key and value projections across attention heads. Positional encoding is handled through Rotary Positional Embeddings (RoPE). The model's non-linear activation function is GeGLU (Gated Linear Unit), a variant of GLU that enhances expressive power. Normalization within the network is performed using RMSNorm. These elements contribute to the model's performance while maintaining a compact footprint.

The 2B variant is well-suited for a variety of text generation applications, including question answering, summarization, and reasoning tasks. The instruction-tuned versions of Gemma 1 2B are specifically refined to follow instructions effectively and engage in multi-turn conversations, making them adaptable for interactive applications like chatbots. Its compact size ensures it can operate on consumer-grade hardware, democratizing access to advanced AI capabilities for developers and researchers.

About Gemma 1

Gemma 1 is a family of lightweight, decoder-only transformer models from Google, available in 2B and 7B parameter sizes. Designed for various text generation tasks, they incorporate rotary positional embeddings, shared input/output embeddings, GEGLU activation, and RMSNorm. The 2B model uses multi-query attention, while 7B uses multi-head attention.

Other Gemma 1 Models

Gemma 1 7B

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Gemma 1 2B available.

Rankings

Overall Rank

Coding Rank

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Read the Paper Download Weights Source Code