Gemma 3 4B: Specifications and GPU VRAM Requirements

Gemma 3 4B

闭源

开放权重

参数

上下文长度

131.072K

模态

Multimodal

架构

Dense

许可证

Gemma License

发布日期

12 Mar 2025

训练数据截止日期

Aug 2024

技术规格

注意力结构

Grouped-Query Attention

隐藏维度大小

2048

层数

注意力头

键值头

激活函数

归一化

RMS Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Gemma 3 4B

Gemma 3 4B is a foundational vision-language model developed by Google, designed to process both text and image inputs while generating textual outputs. It is part of the Gemma 3 family of lightweight, state-of-the-art models built upon the same research and technology that powers Google's Gemini models. The 4 billion parameter variant is optimized for efficient performance across diverse hardware environments, ranging from cloud-scale deployments to on-device execution on workstations, laptops, and mobile devices.

Architecturally, Gemma 3 4B employs a decoder-only transformer design. Key innovations include an optimized attention mechanism featuring a 5:1 interleaving ratio of local sliding window self-attention layers with global self-attention layers, coupled with a reduced window size for local attention. This architectural modification aims to decrease KV-cache memory overhead, enabling efficient processing of extended context lengths without degrading perplexity. The model utilizes a custom SigLIP vision encoder, which transforms 896x896 pixel square images into tokens for the language model, with a "Pan&Scan" algorithm employed to handle images of varying aspect ratios or higher resolutions.

Gemma 3 4B is engineered for a wide array of generative AI tasks, including question answering, summarization, and complex reasoning. Its multimodal capabilities allow for comprehensive understanding and analysis of visual data, such as object identification or text extraction from images. The model supports a context window of 128,000 tokens and offers broad multilingual capabilities, handling over 140 languages. Additionally, it integrates function calling, enabling the creation of intelligent agents that can interact with external tools and application programming interfaces.

关于 Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.

其他 Gemma 3 模型

评估基准

排名适用于本地LLM。

排名

#54

基准	分数	排名
Professional Knowledge MMLU Pro	0.44	5
Graduate-Level QA GPQA	0.31	26
Mathematics LiveBench Mathematics	0.31	30
Coding LiveBench Coding	0.16	31
Reasoning LiveBench Reasoning	0.20	31
Data Analysis LiveBench Data Analysis	0.39	31
General Knowledge MMLU	0.31	37

排名

#54

编程排名

#44

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

64k

128k

所需显存:

资源

官方文档发布说明阅读论文下载权重源代码

Gemma 3 4B

技术规格

系统要求

Gemma 3 4B

关于 Gemma 3

其他 Gemma 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源