| Specification | Value |
|---|---|
| Attention Structure | Grouped-Query Attention |
| Hidden Dimension Size | 4096 |
| Number of Layers | 46 |
| Attention Heads | 32 |
| Key-Value Heads | 16 |
| Activation Function | GELU |
| Normalization | RMS Normalization |
| Position Embedding | RoPE |
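To make the Grouped-Query Attention figures above concrete: with 32 query heads and 16 key-value heads, each KV head is shared by a group of 2 query heads, and the per-head dimension works out to 4096 / 32 = 128. The following is a minimal, self-contained sketch of that sharing pattern in PyTorch; it is an illustration of the GQA idea using the table's values, not the model's actual implementation.

```python
import torch
import torch.nn.functional as F

# Values taken from the specification table above.
NUM_Q_HEADS = 32
NUM_KV_HEADS = 16
HEAD_DIM = 4096 // NUM_Q_HEADS              # 128
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS    # 2 query heads share each KV head

def grouped_query_attention(q, k, v):
    """q: (batch, NUM_Q_HEADS, seq, HEAD_DIM); k, v: (batch, NUM_KV_HEADS, seq, HEAD_DIM)."""
    # Expand each KV head so its group of query heads attends against the same keys/values.
    k = k.repeat_interleave(GROUP_SIZE, dim=1)   # -> (batch, NUM_Q_HEADS, seq, HEAD_DIM)
    v = v.repeat_interleave(GROUP_SIZE, dim=1)
    scores = q @ k.transpose(-2, -1) / HEAD_DIM ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, NUM_Q_HEADS, 8, HEAD_DIM)
k = torch.randn(1, NUM_KV_HEADS, 8, HEAD_DIM)
v = torch.randn(1, NUM_KV_HEADS, 8, HEAD_DIM)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 32, 8, 128])
```

Because only 16 KV heads are cached instead of 32, the KV cache is roughly halved relative to standard multi-head attention, which is where the inference-efficiency benefit comes from.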
VRAM Requirements by Quantization Method and Context Size
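As a rough guide to what this heading covers, the sketch below estimates memory from the specifications listed above: model weights at a given bits-per-weight plus the KV cache for a given context length. The quantization labels and bytes-per-parameter figures are approximations for illustration, and the estimate ignores activations and framework overhead, so treat the results as lower bounds rather than the exact figures a calculator would report.

```python
# Back-of-envelope VRAM estimate using the spec table above (27B params,
# 46 layers, 16 KV heads, head dim 4096 / 32 = 128). Weights + KV cache only.
PARAMS = 27e9
LAYERS = 46
KV_HEADS = 16
HEAD_DIM = 4096 // 32

BYTES_PER_PARAM = {"FP16": 2.0, "Q8_0": 1.0, "Q4_K_M": 0.5}  # approximate values

def estimate_vram_gib(quant: str, context_len: int, kv_bytes: float = 2.0) -> float:
    weights = PARAMS * BYTES_PER_PARAM[quant]
    # K and V caches: 2 tensors per layer, one entry per KV head per token.
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_len * kv_bytes
    return (weights + kv_cache) / 1024**3

for quant in BYTES_PER_PARAM:
    for ctx in (2048, 8192):
        print(f"{quant:7s} ctx={ctx:5d}: ~{estimate_vram_gib(quant, ctx):.1f} GiB")
```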
Gemma 2 is a family of advanced, open models developed by Google DeepMind, stemming from the same research that informed the Gemini models. This model family aims to provide robust capabilities for a range of text generation tasks, including but not limited to question answering, summarization, and reasoning. The 27B variant is engineered for efficient inference, facilitating deployment across various hardware environments, from high-performance workstations to more constrained consumer devices.
The architecture of Gemma 2 represents a progression in Transformer design, integrating several key innovations: Grouped-Query Attention (GQA) and a strategic interleaving of local and global attention layers. These refinements improve performance and inference efficiency, particularly when processing extended contexts. The model also applies logit soft-capping for training stability and uses Rotary Position Embeddings (RoPE) for positional encoding. Notably, the smaller 2B and 9B models in the Gemma 2 family were trained with knowledge distillation from a larger teacher model.
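Logit soft-capping, mentioned above, smoothly squashes logits into a bounded range with a tanh rather than hard clipping them, which keeps gradients well behaved during training. A minimal sketch of the operation follows; the cap value of 50.0 is used here purely for illustration.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound logits to (-cap, cap); unlike hard clipping,
    # the gradient never becomes exactly zero.
    return cap * torch.tanh(logits / cap)

# Illustrative cap value; Gemma 2 applies soft-capping to both
# attention logits and final output logits.
x = torch.tensor([-120.0, -5.0, 0.0, 5.0, 120.0])
print(soft_cap(x, cap=50.0))  # values are bounded to roughly (-50, 50)
```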
The Gemma 2 27B model is designed to achieve a high level of performance within its parameter class, while prioritizing computational efficiency. This efficiency enables cost-effective deployment, as the model supports full precision inference on a single high-performance GPU or TPU. The model's capabilities are applicable to tasks requiring sophisticated natural language understanding and generation, making it suitable for applications in content creation, conversational AI systems, and fundamental natural language processing research.
Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.
Rankings apply to local LLMs.

Rank: #45
| Benchmark | Score | Rank |
|---|---|---|
| General Knowledge (MMLU) | 0.75 | 6 |
| StackEval (ProLLM Stack Eval) | 0.72 | 13 |
| Summarization (ProLLM Summarization) | 0.59 | 14 |
| QA Assistant (ProLLM QA Assistant) | 0.8 | 15 |
| Refactoring (Aider Refactoring) | 0.36 | 16 |
| Coding (Aider Coding) | 0.36 | 19 |