Gemma 4 12B

Open Source

Open Weights

Parameters

11.95B

Context Length

262K

Modality

Multimodal

Architecture

Dense

License

Apache-2.0

Release Date

3 Jun 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

Context: 1,024 tokens (27.02 GB VRAM required)

Consumer: 2x RTX 4090 (24GB VRAM per device)

Datacenter: 1x NVIDIA A100 (80GB VRAM per device)

Apple Silicon: 1x Apple M3 Max (128GB VRAM per device)

Context: 262,144 tokens (134.83 GB VRAM required)

Consumer: 7x RTX 4090 (24GB VRAM per device)

Datacenter: 2x NVIDIA A100 (80GB VRAM per device)

Apple Silicon: 2x Apple M3 Max (128GB VRAM per device)
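The device counts listed above can be approximated with simple arithmetic. The sketch below is not the page's actual method: it assumes that only a fixed share of each device's memory is usable (`usable_fraction = 0.85` is a hypothetical value chosen because it happens to reproduce the listed counts; a plain ceiling division would give 6 rather than 7 RTX 4090s at 262,144 tokens). It also shows that 16-bit weights alone come to about 23.9 GB, which suggests, but does not confirm, that the 27.02 GB figure refers to unquantized 16-bit weights plus runtime overhead.

```python
import math

PARAMS = 11.95e9  # parameter count from the spec sheet

def weight_gb(params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return params * bits_per_param / 8 / 1e9

def gpus_needed(required_gb: float, device_gb: float,
                usable_fraction: float = 0.85) -> int:
    """Devices needed if only `usable_fraction` of each device's memory
    is available to the model (an assumption, not a documented rule)."""
    return math.ceil(required_gb / (device_gb * usable_fraction))

# Per-device memory in GB, as listed in the requirements above.
DEVICES = {"RTX 4090": 24, "NVIDIA A100": 80, "Apple M3 Max": 128}

for context_tokens, required_gb in ((1_024, 27.02), (262_144, 134.83)):
    counts = {name: gpus_needed(required_gb, gb) for name, gb in DEVICES.items()}
    print(context_tokens, counts)

# Weight-only footprint at common precisions (quantization overhead ignored).
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_gb(PARAMS, bits):.2f} GB")
```

The gap between the two context sizes (roughly 108 GB) is memory that grows with context length, mostly the key-value cache; estimating it directly would need the layer and key-value head counts, which this page does not list.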

Evaluation Benchmarks

No evaluation benchmarks for Gemma 4 12B available.

About Gemma 4 12B

Gemma 4 12B is Google DeepMind's 12B dense open-weights model, released June 3, 2026. It bridges the gap between the edge-friendly E4B and the more advanced 26B MoE. It uses an encoder-free unified architecture that projects raw image patches and audio waveforms directly into the LLM embedding space through lightweight linear layers, which avoids the latency and memory overhead of separate encoders. The model supports a 256K-token context (262,144 tokens), native text, image, and audio inputs, and a configurable thinking mode, and it runs on consumer laptops with 16GB of RAM.
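To make the encoder-free idea concrete, the sketch below shows the general technique of turning raw image patches into embedding vectors with a single linear layer. It is a toy illustration, not Gemma 4's implementation: the patch size, image size, weight initialization, and the helper names `patchify` and `linear` are all hypothetical, and the embedding width is shrunk to 8 (the spec sheet lists a hidden size of 3,840).

```python
import random

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ]
            patches.append(flat)
    return patches

def linear(vector, weights):
    """Project one vector with a weight matrix of shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

random.seed(0)
PATCH, CHANNELS, HIDDEN = 4, 3, 8  # toy sizes, assumed for illustration only
image = [[[random.random() for _ in range(CHANNELS)] for _ in range(8)]
         for _ in range(8)]

in_dim = PATCH * PATCH * CHANNELS  # values per flattened patch
weights = [[random.gauss(0.0, 0.02) for _ in range(in_dim)]
           for _ in range(HIDDEN)]

# Each patch becomes one "token" embedding with no separate vision encoder.
tokens = [linear(p, weights) for p in patchify(image, PATCH)]
print(len(tokens), len(tokens[0]))  # 4 8
```

In this scheme the only modality-specific parameters are the projection weights, which is where the claimed savings over a separate encoder would come from.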

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Sliding Window Ratio

83.3%

Linear Attention

Linear Attention Ratio
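The sliding-window figures above (window size 1,024, ratio 83.3%) are consistent with five sliding-window layers for every full-attention layer, since 5/6 ≈ 83.3%. The sketch below illustrates that reading. The interleaving order and the layer count of 48 are assumptions (this page does not list the number of layers), and the window convention used here, where the window includes the current token, is one of several in use.

```python
def layer_types(num_layers, local_per_global=5):
    """Label layers so every (local_per_global + 1)-th layer uses full attention."""
    period = local_per_global + 1
    return ["sliding" if (i + 1) % period else "full" for i in range(num_layers)]

def visible_keys(query_pos, window):
    """Positions a query may attend to under a causal sliding window
    that counts the query's own position as part of the window."""
    return range(max(0, query_pos - window + 1), query_pos + 1)

NUM_LAYERS = 48  # hypothetical; not stated on this page
types = layer_types(NUM_LAYERS)

ratio = types.count("sliding") / NUM_LAYERS
print(f"{ratio:.1%}")                 # 83.3%
print(len(visible_keys(5000, 1024)))  # 1024
```

Under this pattern only the full-attention layers need to cache keys and values for the whole context; the sliding-window layers keep at most 1,024 positions each.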

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

3,840

Number of Layers

FFN Intermediate Size (Dense)

15,360

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

262,144
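The dimensions and vocabulary size listed above allow a rough parameter breakdown. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it counts a single token embedding table (whether the output head is tied to it is not stated here), it assumes the 11.95B total includes that table, and it shows the feed-forward block both with and without a gated design, since the page lists only GELU as the activation.

```python
HIDDEN = 3_840          # hidden dimension size
INTERMEDIATE = 15_360   # FFN intermediate size (dense)
VOCAB = 262_144         # vocabulary size
TOTAL_PARAMS = 11.95e9  # reported parameter count

def embedding_params(vocab, hidden):
    """Parameters in one token embedding table."""
    return vocab * hidden

def ffn_params(hidden, intermediate, gated):
    """Weights in one feed-forward block, ignoring biases.
    A gated block has three projections, a plain one has two."""
    return (3 if gated else 2) * hidden * intermediate

emb = embedding_params(VOCAB, HIDDEN)
print(emb)                                             # 1006632960
print(f"{emb / TOTAL_PARAMS:.1%}")                     # 8.4%
print(INTERMEDIATE // HIDDEN)                          # 4
print(ffn_params(HIDDEN, INTERMEDIATE, gated=False))   # 117964800
print(ffn_params(HIDDEN, INTERMEDIATE, gated=True))    # 176947200
```

The embedding table alone is about one billion parameters, so a large vocabulary accounts for a noticeable share of a model at this scale.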

Resources

Official Documentation

Download Weights

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text and images, with audio supported on the smaller variants, and offer context windows of up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 targets high intelligence-per-parameter on hardware ranging from mobile devices to enterprise servers. The family is released under the Apache 2.0 license.