Parameters
11.95B
Context Length
262,144 tokens (256K)
Modality
Multimodal
Architecture
Dense
License
Apache-2.0
Release Date
3 Jun 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
16
Key-Value Heads
8
Attention Head Dimension
256
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
10,000
Sliding Window Attention
Yes
Sliding Window Size
1,024
Normalization
RMS Normalization
Activation Function
GELU
Dimensions
Hidden Dimension Size
3,840
Number of Layers
48
FFN Intermediate Size (Dense)
15,360
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
262,144
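The listed dimensions can be cross-checked against the 11.95B parameter figure with a rough count. The sketch below is only a sanity check under stated assumptions that the spec does not confirm: a gated feed-forward block with three weight matrices, tied input and output embeddings, and no accounting for normalization weights or the multimodal projection layers.

```python
# Rough parameter count from the dimensions listed above.
# Assumptions (NOT stated in the spec): gated FFN with three weight
# matrices, tied input/output embeddings, norm weights and multimodal
# projection layers ignored.

HIDDEN = 3_840      # Hidden Dimension Size
LAYERS = 48         # Number of Layers
FFN = 15_360        # FFN Intermediate Size (Dense)
VOCAB = 262_144     # Vocabulary Size
Q_HEADS = 16        # Attention Heads
KV_HEADS = 8        # Key-Value Heads
HEAD_DIM = 256      # Attention Head Dimension


def attention_params() -> int:
    """Weights in the Q, K, V, and output projections of one layer."""
    q = HIDDEN * Q_HEADS * HEAD_DIM
    kv = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    out = Q_HEADS * HEAD_DIM * HIDDEN
    return q + kv + out


def ffn_params(num_matrices: int = 3) -> int:
    """Weights in one feed-forward block (3 matrices assumed: gate, up, down)."""
    return num_matrices * HIDDEN * FFN


def estimate_total() -> int:
    """Embedding table plus all transformer layers."""
    per_layer = attention_params() + ffn_params()
    return VOCAB * HIDDEN + LAYERS * per_layer


if __name__ == "__main__":
    print(f"attention per layer: {attention_params():,}")
    print(f"FFN per layer:       {ffn_params():,}")
    print(f"estimated total:     {estimate_total() / 1e9:.2f}B")
```

Under these assumptions the estimate comes to about 11.77B, roughly 0.2B below the listed 11.95B. The difference would plausibly sit in the components the sketch ignores. Note that 16 heads of dimension 256 give a 4,096-wide attention projection, which is wider than the 3,840 hidden size, so the query and output projections are not square.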
Gemma 4 12B is Google DeepMind's 12B dense open-weights model, released June 3, 2026. It bridges the gap between the edge-friendly E4B and the more advanced 26B MoE. Its distinguishing feature is an encoder-free unified architecture: raw image patches and audio waveforms are projected directly into the LLM embedding space through lightweight linear layers, eliminating the latency and memory overhead of separate encoders. The model supports a 256K-token context, native text, image, and audio inputs, and a configurable thinking mode, and it runs on consumer laptops with 16GB of RAM.
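The encoder-free idea of projecting raw image patches into the embedding space through a linear layer can be illustrated with a toy sketch. This is not Gemma's implementation: the patch size, image size, embedding width, random weights, and the function names `patchify`, `linear`, and `embed_image` are all hypothetical, chosen only to show the shape of the computation.

```python
import random

# Toy illustration of "patchify, then one linear projection".
# All sizes and weights are hypothetical; the real model's patch size
# and projection layout are not given in the spec.


def patchify(image, patch):
    """Split an H x W x C nested-list image into flattened patches (row-major)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])
            patches.append(flat)
    return patches


def linear(vector, weight, bias):
    """Compute W x + b, with weight stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def embed_image(image, patch, weight, bias):
    """Map each flattened patch to one embedding vector (one 'token')."""
    return [linear(p, weight, bias) for p in patchify(image, patch)]


if __name__ == "__main__":
    rng = random.Random(0)
    H = W = 8
    C, PATCH, D_MODEL = 3, 4, 16  # toy sizes; the spec lists a 3,840-wide hidden size
    image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    in_dim = PATCH * PATCH * C
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(D_MODEL)]
    bias = [0.0] * D_MODEL
    tokens = embed_image(image, PATCH, weight, bias)
    print(len(tokens), len(tokens[0]))  # prints "4 16"
```

Each patch becomes one vector of the model's embedding width, so the image enters the sequence as ordinary tokens without a separate vision encoder in front of the language model.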
Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text, images, and audio (on smaller variants), with context windows of up to 256K tokens. Gemma 4 is designed for frontier-level performance across reasoning, coding, and agentic workflows on hardware ranging from mobile devices to enterprise servers, with an emphasis on intelligence per parameter. It is released under the Apache 2.0 license.
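Whether a model fits on a given device is largely a question of memory. The following back-of-envelope sketch uses the 12B figures listed on this page (11.95B parameters, 48 layers, 8 key-value heads, head dimension 256, sliding window of 1,024). The bits per weight, the 16-bit cache values, and above all the number of full-attention ("global") layers are assumptions, since the spec does not say how many layers use the sliding window.

```python
# Back-of-envelope memory estimate for weights and KV cache.
# Assumptions (NOT in the spec): bits per weight, 2-byte cache values,
# and how many layers attend globally rather than within the window.

PARAMS = 11.95e9    # Parameters
LAYERS = 48         # Number of Layers
KV_HEADS = 8        # Key-Value Heads
HEAD_DIM = 256      # Attention Head Dimension
WINDOW = 1_024      # Sliding Window Size


def weight_bytes(bits_per_weight: float) -> float:
    """Memory for the weights alone at a given precision."""
    return PARAMS * bits_per_weight / 8


def kv_cache_bytes(context_tokens: int, global_layers: int,
                   bytes_per_value: int = 2) -> int:
    """K and V storage; sliding-window layers hold at most WINDOW tokens."""
    if not 0 <= global_layers <= LAYERS:
        raise ValueError("global_layers must be between 0 and LAYERS")
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * bytes_per_value
    local_layers = LAYERS - global_layers
    local_tokens = min(context_tokens, WINDOW)
    cached = global_layers * context_tokens + local_layers * local_tokens
    return per_token_per_layer * cached


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_bytes(bits) / 1e9:.2f} GB")
    for n_global in (48, 8):  # 8 is a purely hypothetical split
        gib = kv_cache_bytes(262_144, n_global) / 2**30
        print(f"KV cache at 262,144 tokens, {n_global} global layers: {gib:.1f} GiB")
```

At 16 bits per weight the weights alone need about 23.9 GB, and at 4 bits about 6.0 GB, so a 16GB machine presumably implies quantized weights. If every layer attended over the full window, the 16-bit cache at 262,144 tokens would reach 96 GiB. Layers limited to the 1,024-token window reduce that sharply, but the actual split is not published here, so the second figure printed is illustrative only.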
No evaluation benchmarks are available for Gemma 4 12B.
Overall Rank
-
Coding Rank
-