Gemma 4 E2B

Open source: Open weights
Parameters: 5.1B
Context length: 128K
Modality: Multimodal
Architecture: Dense
License: Apache 2.0
Release date: 2 Apr 2026
Training data cutoff: (not listed)

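System requirements

VRAM requirements for different quantization methods and context sizes

Context size: 1,024 tokens
Required memory: 12.25 GB VRAM
Consumer: 1x RTX 4090 (24 GB VRAM)
Data center: 1x NVIDIA A100 (80 GB VRAM)
Apple Silicon: 1x Apple M3 Max (128 GB unified memory)

Context size: 128,000 tokens
Required memory: 17.03 GB VRAM
Consumer: 1x RTX 4090 (24 GB VRAM)
Data center: 1x NVIDIA A100 (80 GB VRAM)
Apple Silicon: 1x Apple M3 Max (128 GB unified memory)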
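
The two memory figures above can be used to roughly estimate memory needs at intermediate context sizes. The sketch below assumes memory grows linearly with context length between the two listed points, which is an approximation and not something the listing states. The function names `estimate_vram_gb` and `fits` are hypothetical helpers introduced for illustration.

```python
# The two data points from the listing: (context tokens, memory in GB).
LISTED_POINTS = [(1024, 12.25), (128000, 17.03)]


def estimate_vram_gb(context_tokens: int) -> float:
    """Linearly interpolate memory between the two listed context sizes.

    Assumption: memory grows linearly with context length. Values outside
    the listed range are rejected instead of extrapolated.
    """
    (t0, v0), (t1, v1) = LISTED_POINTS
    if not t0 <= context_tokens <= t1:
        raise ValueError("context_tokens must be between 1024 and 128000")
    gb_per_token = (v1 - v0) / (t1 - t0)
    return v0 + gb_per_token * (context_tokens - t0)


def fits(context_tokens: int, device_gb: float) -> bool:
    """Return True if the estimated requirement is within device memory."""
    return estimate_vram_gb(context_tokens) <= device_gb


# Device capacities as listed: RTX 4090, NVIDIA A100, Apple M3 Max.
for name, capacity_gb in (("RTX 4090", 24), ("A100", 80), ("M3 Max", 128)):
    print(name, fits(128000, capacity_gb))
```

Under this linear assumption, the growth between the two listed points is about 4.78 GB over roughly 127K additional tokens, so all three listed devices have headroom even at the full context size.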

Evaluation benchmarks

No evaluation benchmarks are available for Gemma 4 E2B.

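About Gemma 4 E2B

Gemma 4 E2B is an ultra-efficient model designed for mobile and IoT devices, with 2.3 billion effective parameters (5.1 billion when per-layer embeddings are counted). It accepts text, image, and audio input and has a 128K context window. It is positioned as bringing frontier-level capabilities to edge devices, running offline with near-zero latency. It also includes a built-in reasoning mode and native function calling to support agentic workflows.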
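
To make the two parameter counts concrete, the sketch below converts them into approximate weight storage at a few common precisions. It is plain arithmetic (parameters times bytes per parameter). It ignores quantization metadata, activations, and the KV cache, and it makes no claim about which parameters a given runtime keeps in accelerator memory. The helper name `weight_gb` and the precision table are assumptions introduced for illustration.

```python
# Parameter counts quoted in the description above.
PARAMS_EFFECTIVE = 2.3e9   # effective parameters
PARAMS_WITH_PLE = 5.1e9    # including per-layer embeddings

# Bytes per parameter for common storage precisions (illustrative table).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(
        dtype,
        round(weight_gb(PARAMS_EFFECTIVE, dtype), 2),
        round(weight_gb(PARAMS_WITH_PLE, dtype), 2),
    )
```

At 16-bit precision the full 5.1B parameters come to roughly 10.2 GB, which is in the same range as the 12.25 GB figure the page lists for a 1,024-token context, although the listing does not say which precision that figure assumes.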

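Technical specifications

Attention

Attention structure: Grouped-Query Attention
Attention heads: (not listed)
Key-value heads: (not listed)
Attention head dimension: 256
Position embedding: RoPE
RoPE theta: 10,000
Sliding window attention: Yes
Sliding window size: 512
Sliding window ratio: 83.3%
Linear attention: (not listed)
Linear attention ratio: (not listed)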
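
The sliding-window figures above can be illustrated with a small sketch. A ratio of 83.3% is 5/6, so one plausible reading is five sliding-window (local) layers for every full-attention (global) layer. The exact ordering of layers, the layer count, and the off-by-one convention for the window are not given in the listing, so they are assumptions here. The names `layer_kinds` and `attention_mask` are hypothetical.

```python
from typing import List, Optional

WINDOW = 512           # sliding window size from the listing
LOCAL_PER_GLOBAL = 5   # assumption: 5 local layers per global layer (5/6 = 83.3%)


def layer_kinds(num_layers: int) -> List[str]:
    """Label each layer 'local' or 'global' under the assumed 5:1 pattern."""
    period = LOCAL_PER_GLOBAL + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives a plain causal mask (global layer). Otherwise each
    query sees itself plus the previous window - 1 positions, which is an
    assumed convention.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Hypothetical 12-layer stack, used only to show the ratio; the real depth
# is not listed.
kinds = layer_kinds(12)
print(kinds.count("local") / len(kinds))
```

In a local layer, each token's attention cost and KV-cache footprint are bounded by the window (512 positions) regardless of context length, which is why a high share of local layers keeps long-context memory growth modest.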

Normalization: RMS Normalization

Activation function: GELU
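
For readers unfamiliar with the two components named above, here is a minimal pure-Python sketch of RMS normalization and GELU. It shows the textbook definitions only. The epsilon value, the way the learned scale is parameterized, and whether the model uses the exact or the tanh-approximated GELU are not stated in the listing and are assumptions here.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float],
             eps: float = 1e-6) -> List[float]:
    """Scale x by the reciprocal of its root mean square, then by weight.

    Unlike layer normalization, no mean is subtracted. The eps default is
    an assumed value.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


normed = rms_norm([3.0, 4.0], [1.0, 1.0])
print(normed, gelu(1.0), gelu_tanh(1.0))
```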

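Dimensions

Hidden dimension size: 6,144
Number of layers: (not listed)
FFN intermediate size (dense layers): 6,144
Multi-token prediction heads: (not listed)

Tokenizer

Vocabulary size: 262,144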

Model completeness

Total score: 66 / 100
Upstream: 19.0 / 30
Model: 24.0 / 40
Downstream: 23.0 / 30

Resources

Official documentation, weight downloads, source code

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. The models are multimodal, handling text and images, with audio supported on the smaller variants, and offer context windows of up to 256K tokens. Gemma 4 targets reasoning, coding, and agentic workflows, and is positioned as delivering high intelligence per parameter on hardware ranging from mobile devices to enterprise servers. The family is released under the Apache 2.0 license.