Gemma 4 12B

Open source

Open weights

Parameters

11.95B

Context length

256K (262,144 tokens)

Modality

Multimodal

Architecture

Dense

License

Apache-2.0

Release date

3 Jun 2026

Training data cutoff

System requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens

27.02 GB VRAM

Consumer

2x RTX 4090

24GB VRAM

Data center

1x NVIDIA A100

80GB VRAM

Apple Silicon

1x Apple M3 Max

128GB VRAM

262,144 tokens

134.83 GB VRAM

Consumer

7x RTX 4090

24GB VRAM

Data center

2x NVIDIA A100

80GB VRAM

Apple Silicon

2x Apple M3 Max

128GB VRAM
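
The device counts above can be approximated with simple arithmetic. The following is a minimal sketch, not the page's actual method, which is not stated. It assumes that only about 90% of each device's memory is usable for the model (the `usable_fraction` parameter is an assumption introduced here). A plain division of 134.83 GB by 24 GB would round up to 6 RTX 4090 cards, so some headroom of this kind is one plausible explanation for the listed 7.

```python
import math

def devices_needed(required_gb, device_gb, usable_fraction=0.9):
    """Smallest number of devices whose usable memory covers required_gb.

    usable_fraction is an assumed headroom factor, not a value from the page.
    """
    usable_per_device = device_gb * usable_fraction
    return math.ceil(required_gb / usable_per_device)

# Memory figures (GB) copied from the table above.
REQUIREMENTS = {1024: 27.02, 262144: 134.83}
DEVICES = {"RTX 4090": 24, "NVIDIA A100": 80, "Apple M3 Max": 128}

def plan():
    """Map (context tokens, device name) to an estimated device count."""
    return {
        (tokens, name): devices_needed(need, capacity)
        for tokens, need in REQUIREMENTS.items()
        for name, capacity in DEVICES.items()
    }

if __name__ == "__main__":
    for (tokens, name), count in plan().items():
        print(f"{tokens:>7} tokens: {count}x {name}")
```

Under this assumption the estimate matches every count listed above; a different headroom factor could give different results.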

Benchmarks

No benchmark results are available for Gemma 4 12B.

Rankings

Coding ranking

About Gemma 4 12B

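Gemma 4 12B is a 12B dense open-weight model released by Google DeepMind on 3 June 2026. It fills the gap between the edge-friendly E4B and the more capable 26B MoE. The model uses an encoder-free unified architecture: lightweight linear layers project raw image patches and audio waveforms directly into the LLM embedding space, which removes the latency and memory overhead of separate encoders. It supports a 256K-token context, native text, image, and audio input, and a configurable thinking mode, and it can run on consumer laptops with 16GB of memory.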
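
To illustrate the idea of projecting raw image patches into the embedding space with a single linear layer, here is a minimal pure-Python sketch. It is not Gemma's implementation: the patch size, the random weight initialisation, and the toy dimensions are all assumptions chosen for readability, and it covers images only. In the real model the output width would be the hidden size listed in the specifications (3,840).

```python
import random

def patchify(image, patch):
    """Split an H x W x C nested-list image into flattened patch vectors."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append the C channel values
            patches.append(vec)
    return patches

def make_projection(in_dim, out_dim, seed=0):
    """Random in_dim x out_dim weight matrix (a stand-in for learned weights)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]

def project(patches, weight):
    """Apply one linear layer: each patch vector becomes one embedding vector."""
    out_dim = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(out_dim)]
        for v in patches
    ]

if __name__ == "__main__":
    # Toy 8x8 RGB image, 4x4 patches, 16-dimensional embeddings (all hypothetical).
    image = [[[(y * 8 + x) / 64.0] * 3 for x in range(8)] for y in range(8)]
    patches = patchify(image, patch=4)
    tokens = project(patches, make_projection(len(patches[0]), 16))
    print(len(tokens), "image tokens of width", len(tokens[0]))
```

Each patch becomes one token-like vector that can sit alongside text embeddings, which is the property the description attributes to the encoder-free design.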

Technical specifications

Attention

Attention structure

Multi-Head Attention

Attention heads

Key-value heads

Attention head dimension

256

Position embedding

Absolute Position Embedding

RoPE Theta

10,000

Sliding window attention

Yes

Sliding window size

1,024

Sliding window ratio

83.3%

Linear attention

Linear attention ratio
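
The sliding-window figures above (window size 1,024, ratio 83.3%) can be read as five sliding-window layers for every full-attention layer, since 5/6 ≈ 83.3%. That reading is an assumption, not something the page states. The sketch below shows a causal sliding-window visibility rule and how such an interleave limits the number of cached key/value positions. The layer count used in the example (12) is purely hypothetical, because the page does not list the number of layers.

```python
def visible(query_pos, key_pos, window=None):
    """Causal visibility check; window=None means full (global) attention.

    Convention assumed here: a query sees itself and the previous window-1 tokens.
    """
    if key_pos > query_pos:
        return False
    if window is None:
        return True
    return query_pos - key_pos < window

def layer_schedule(num_layers, local_per_global=5):
    """Assumed interleave: local_per_global sliding layers, then one global layer."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding" for i in range(num_layers)]

def sliding_ratio(schedule):
    """Fraction of layers that use sliding-window attention."""
    return schedule.count("sliding") / len(schedule)

def cached_positions(context_len, schedule, window=1024):
    """Total key/value positions kept across all layers for one sequence."""
    return sum(
        min(context_len, window) if kind == "sliding" else context_len
        for kind in schedule
    )

if __name__ == "__main__":
    schedule = layer_schedule(12)  # 12 layers is a hypothetical value
    print(f"sliding ratio: {sliding_ratio(schedule):.1%}")
    print("cached positions at 262,144 tokens:", cached_positions(262144, schedule))
    print("with full attention everywhere:", 262144 * len(schedule))
```

Because sliding-window layers never keep more than 1,024 positions, the cache for long contexts is dominated by the minority of full-attention layers.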

Normalization

RMS Normalization

Activation function

GELU

Dimensions

Hidden size

3,840

Number of layers

FFN intermediate size (dense layers)

15,360

Multi-token prediction heads

Tokenizer

Vocabulary size

262,144
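
As a rough sanity check on the listed figures, the size of the token embedding table follows from the vocabulary size and hidden size, and the weight footprint follows from the parameter count and the storage precision. The sketch below does only that arithmetic. The 16-, 8-, and 4-bit precisions are common formats chosen here as assumptions, and the page does not say whether the 11.95B figure counts the embedding table once or separately for input and output.

```python
VOCAB_SIZE = 262_144      # from the tokenizer section
HIDDEN_SIZE = 3_840       # from the dimensions section
TOTAL_PARAMS = 11.95e9    # from the header

def embedding_params(vocab=VOCAB_SIZE, hidden=HIDDEN_SIZE):
    """Parameters in one vocab x hidden embedding table."""
    return vocab * hidden

def weight_gb(params, bits_per_param):
    """Decimal gigabytes for weights only (no activations or KV cache)."""
    return params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    emb = embedding_params()
    print(f"embedding table: {emb:,} parameters "
          f"({emb / TOTAL_PARAMS:.1%} of the listed total)")
    for bits in (16, 8, 4):  # assumed storage precisions
        print(f"{bits:>2}-bit weights: {weight_gb(TOTAL_PARAMS, bits):.2f} GB")
```

The 16-bit figure lands a little below the 27.02 GB listed for a 1,024-token context, and the 4-bit figure shows why a quantized copy could fit in 16GB of memory.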

Resources

Official documentation | Download weights

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text and images, plus audio on the smaller variants, with context windows of up to 256K tokens. Designed for frontier-level performance in reasoning, coding, and agentic workflows, Gemma 4 targets high intelligence per parameter on hardware ranging from mobile devices to enterprise servers. It is released under the Apache 2.0 license.