ApX logoApX logo

Gemma 4 12B

Parameters

11.95B

Context Length

262.144K

Modality

Multimodal

Architecture

Dense

License

Apache-2.0

Release Date

3 Jun 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

16

Key-Value Heads

8

Attention Head Dimension

256

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

3,840

Number of Layers

48

FFN Intermediate Size (Dense)

15,360

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

262,144

Architecture Diagram

Input TokensToken EmbeddingPosition: AbsoluteHidden: 3.8k · Context: 262.1k · Vocab: 262.1kx 48 layersRMSNormPre-AttentionMulti-Head Attention16Q / 8KV heads · SW: 1kHead dim: 256+RMSNormPre-FFNFeed-Forward NetworkGELUIntermediate: 15.4k+Final RMSNormOutput Logits

Gemma 4 12B

Google DeepMind's 12B dense open-weights model released June 3, 2026, bridging the gap between the edge-friendly E4B and the more advanced 26B MoE. Uniquely features an encoder-free unified architecture that projects raw image patches and audio waveforms directly into the LLM embedding space through lightweight linear layers, eliminating the latency and memory overhead of separate encoders. Supports 256K token context, native text/image/audio inputs, configurable thinking mode, and runs on consumer laptops with 16GB of RAM.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. Featuring both Dense and Mixture-of-Experts (MoE) architectures, these multimodal models handle text, images, and audio (on smaller variants), with context windows up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 delivers unprecedented intelligence-per-parameter from mobile devices to enterprise servers. Released under Apache 2.0 license.


Other Gemma 4 Models

Evaluation Benchmarks

No evaluation benchmarks for Gemma 4 12B available.

Rankings

Overall Rank

-

Coding Rank

-

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
128k
256k

VRAM Required:

Recommended GPUs

Gemma 4 12B: Specifications and GPU VRAM Requirements