Gemma 2 27B: Specifications and GPU VRAM Requirements

Gemma 2 27B

Open Source

Open Weights

Parameters

27B

Context Length

8.192K

Modality

Text

Architecture

Dense

License

Gemma License

Release Date

27 Jun 2024

Knowledge Cutoff

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

4096

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

GELU

Normalization

RMS Normalization

Position Embedding

ROPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Gemma 2 27B

Gemma 2 is a family of advanced, open models developed by Google DeepMind, stemming from the same research that informed the Gemini models. This model family aims to provide robust capabilities for a range of text generation tasks, including but not limited to question answering, summarization, and reasoning. The 27B variant is engineered for efficient inference, facilitating deployment across various hardware environments, from high-performance workstations to more constrained consumer devices.

The architecture of Gemma 2 represents a progression in Transformer design, integrating several key innovations. These include the adoption of Grouped-Query Attention (GQA) and a strategic interleaving of local and global attention layers. This architectural refinement contributes to enhanced performance and improved inference efficiency, particularly when processing extended contexts. Furthermore, the model employs Logit soft-capping for training stability and incorporates Rotary Position Embeddings (RoPE) for effective positional encoding. Notably, the smaller 2B and 9B models within the Gemma 2 family were developed using knowledge distillation from a larger teacher model.

The Gemma 2 27B model is designed to achieve a high level of performance within its parameter class, while prioritizing computational efficiency. This efficiency enables cost-effective deployment, as the model supports full precision inference on a single high-performance GPU or TPU. The model's capabilities are applicable to tasks requiring sophisticated natural language understanding and generation, making it suitable for applications in content creation, conversational AI systems, and fundamental natural language processing research.

About Gemma 2

Gemma 2 is Google's family of open large language models, offering 2B, 9B, and 27B parameter sizes. Built upon the Gemma architecture, it incorporates innovations such as interleaved local and global attention, logit soft-capping for training stability, and Grouped Query Attention for inference efficiency. The smaller models leverage knowledge distillation.

Other Gemma 2 Models

Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#52

Benchmark	Score	Rank
General Knowledge MMLU	0.75	⭐ 5
StackEval ProLLM Stack Eval	0.72	13
Refactoring Aider Refactoring	0.36	16
QA Assistant ProLLM QA Assistant	0.8	16
Summarization ProLLM Summarization	0.59	17
Coding Aider Coding	0.36	21

Rankings

Overall Rank

#52

Coding Rank

#45

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Read the Paper Download Weights Source Code