Specification | Value |
---|---|
Attention Structure | Grouped-Query Attention |
Hidden Dimension Size | 2304 |
Number of Layers | 42 |
Attention Heads | 32 |
Key-Value Heads | 8 |
Activation Function | SwiGLU |
Normalization | RMS Normalization |
Position Embedding | RoPE |
VRAM Requirements for Different Quantization Methods and Context Sizes
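A rough way to estimate these requirements is model weights plus KV cache. The Python sketch below illustrates that arithmetic; the parameter count, head dimension, and bits-per-weight values for the quantization formats are assumptions rather than measured figures, and real deployments add runtime overhead on top.

```python
# Rough VRAM estimate: weights + KV cache. The constants below are assumptions
# (approximate parameter count, head_dim, fp16/bf16 KV cache); framework and
# activation overhead are ignored.

PARAMS = 9.24e9          # approximate parameter count of Gemma 2 9B (assumed)
NUM_LAYERS = 42          # from the specification table above
NUM_KV_HEADS = 8         # from the specification table above
HEAD_DIM = 256           # assumed per-head dimension
KV_BYTES = 2             # 2 bytes per element for an fp16/bf16 KV cache

def weight_gib(bits_per_param: float) -> float:
    # Weight memory scales linearly with bits per parameter.
    return PARAMS * bits_per_param / 8 / 1024**3

def kv_cache_gib(context_tokens: int) -> float:
    # 2 tensors (K and V) per layer, each (num_kv_heads * head_dim) wide per token.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * KV_BYTES
    return per_token * context_tokens / 1024**3

# Bits-per-weight for the quantized formats are rough, commonly cited averages.
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    for ctx in (2048, 8192):
        total = weight_gib(bits) + kv_cache_gib(ctx)
        print(f"{name:7s} ctx={ctx:5d}  ~{total:5.1f} GiB")
```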
Gemma 2 9B is a decoder-only, text-to-text large language model developed by Google, forming part of the Gemma family of models. It is engineered to deliver efficient and high-performance language generation, primarily for English-language applications. This variant is available in both base (pre-trained) and instruction-tuned versions, making it adaptable for various natural language processing tasks. The model is designed to be accessible, enabling deployment in environments with limited computational resources, such as personal computers and local cloud infrastructure.
The architectural design of Gemma 2 9B incorporates several technical enhancements for improved performance and inference efficiency. It uses Rotary Position Embedding (RoPE) for positional encoding and adopts Grouped-Query Attention (GQA), which shrinks the key-value cache and speeds up inference by letting groups of query heads share key-value heads. The model also interleaves attention types across layers, alternating between sliding-window attention with a 4,096-token window and full global attention spanning 8,192 tokens, preserving long-range context while limiting computational cost. For training stability, Gemma 2 9B applies RMSNorm as both pre-normalization and post-normalization within its layers and uses logit soft-capping. The 9B model additionally benefits from knowledge distillation during pre-training, leveraging signal from larger teacher models. Its training corpus consisted of 8 trillion tokens, drawn primarily from web documents, code, and mathematical content.
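To make two of these mechanisms concrete, the following PyTorch sketch implements grouped-query attention with logit soft-capping. It is a minimal illustration rather than the actual Gemma 2 code: the head counts follow the specification table above, and the soft-cap value of 50.0 is assumed from the published Gemma 2 configuration.

```python
import torch
import torch.nn.functional as F

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Logit soft-capping squashes values into (-cap, cap) for training stability:
    # logits = cap * tanh(logits / cap)
    return cap * torch.tanh(logits / cap)

def grouped_query_attention(q, k, v, num_kv_heads: int, cap: float = 50.0):
    """q: (batch, num_q_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim).

    Each group of num_q_heads // num_kv_heads query heads shares one K/V head,
    which shrinks the KV cache relative to full multi-head attention.
    """
    b, num_q_heads, seq, head_dim = q.shape
    group = num_q_heads // num_kv_heads
    # Repeat each K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim**0.5
    scores = soft_cap(scores, cap)  # attention logit soft-capping
    # Causal mask: each position may only attend to itself and earlier tokens.
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy shapes loosely following the table above: 32 query heads sharing 8 KV heads.
q = torch.randn(1, 32, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = grouped_query_attention(q, k, v, num_kv_heads=8)
print(out.shape)  # torch.Size([1, 32, 16, 64])
```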
Gemma 2 9B is suitable for a diverse set of applications, including but not limited to content creation such as poetry, copywriting, and code generation. Its instruction-tuned variants are particularly effective for conversational agents and chatbots, supporting tasks like question answering and summarization. The model's design focuses on enabling efficient inference, allowing its use on a range of hardware, from consumer-grade GPUs to optimized cloud setups. Its open weights and permissive licensing aim to foster broad adoption and innovation within the research and developer communities.
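As a concrete starting point on consumer hardware, the snippet below loads the instruction-tuned variant through Hugging Face Transformers with 4-bit quantization via bitsandbytes. The model id google/gemma-2-9b-it and the generation settings are illustrative assumptions rather than an official recipe, and the bitsandbytes and accelerate packages are assumed to be installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization keeps the 9B weights within a typical consumer GPU's VRAM.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model_id = "google/gemma-2-9b-it"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

# The instruction-tuned variant expects chat-formatted input.
messages = [{"role": "user", "content": "Write a short poem about autumn."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```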
Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.
Rankings apply to local LLMs.
Rank
#42
Benchmark | Score | Rank |
---|---|---|
MMLU (General Knowledge) | 0.71 | 8 |
ProLLM Stack Eval (StackEval) | 0.72 | 13 |
ProLLM QA Assistant (QA Assistant) | 0.82 | 14 |
ProLLM Summarization (Summarization) | 0.58 | 16 |