ApX logoApX logo

Gemma 3 27B

Parameters

27B

Context Length

128K

Modality

Multimodal

Architecture

Dense

License

Gemma Terms of Use

Release Date

12 Mar 2025

Knowledge Cutoff

Aug 2024

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

64

Key-Value Heads

16

Attention Head Dimension

-

Position Embedding

ROPE

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

-

Dimensions

Hidden Dimension Size

4,096

Number of Layers

46

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

-

Architecture Diagram

Input TokensToken EmbeddingPosition: RoPEHidden: 4.1k · Context: 128kx 46 layersRMSNormPre-AttentionGrouped-Query Attention64Q / 16KV headsHead dim: 64+RMSNormPre-FFNFeed-Forward NetworkActivation+Final RMSNormOutput Logits

Gemma 3 27B

Gemma 3 is a family of lightweight, state-of-the-art models developed by Google DeepMind, designed with research and technology derived from the Gemini models. The Gemma 3 27B variant is a multimodal model engineered to process both textual and image inputs, generating text-based outputs. This model variant is intended for broad application across various generation tasks, including question answering, summarization, and complex reasoning, and supports over 140 languages. Its design focuses on enabling deployment on a range of hardware, from consumer-grade devices like laptops and workstations to specialized cloud infrastructure.

About Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.


Other Gemma 3 Models

Evaluation Benchmarks

Rank

#109

BenchmarkScoreRank

0.913

13

0.802

13

0.372

28

Web Development

WebDev Arena

1365

34

0.05

36

Rankings

Overall Rank

#109

Coding Rank

#118

Model Integrity

Total Score

B

67 / 100

Gemma 3 27B Model Integrity Report

Total Score

67

/ 100

B

Audit Note

Gemma 3 27B exhibits strong transparency in its architectural design and hardware requirements, providing detailed documentation on its hybrid attention mechanism and VRAM needs. However, it remains opaque regarding training compute resources and the specific composition of its 14-trillion-token dataset. While the model's identity and tokenizer are well-defined, the use of a custom license and limited benchmark reproducibility data are notable weaknesses.

Upstream

20.5 / 30

Architectural Provenance

7.5 / 10

Gemma 3 27B is explicitly documented as a decoder-only Transformer model utilizing a hybrid attention mechanism. The technical report (arXiv:2503.19786) details a specific 5:1 ratio of local sliding-window attention (1024 token span) to global attention layers to manage KV-cache growth. It incorporates a frozen 400M parameter SigLIP vision encoder for multimodality, with vision embeddings condensed into 256 soft tokens. While the high-level architecture and the use of knowledge distillation from larger Gemini models are well-documented, the specific 'teacher' model identities and the exact 'novel post-training recipe' remain proprietary.

Dataset Composition

4.0 / 10

Google discloses the total token count (14 trillion for the 27B variant) and general categories including web documents, code, mathematics, and image-text pairs. It supports over 140 languages. However, specific percentage breakdowns of the dataset (e.g., % code vs % web) are not provided. The documentation explicitly states that the exact composition is proprietary. While filtering methodologies for CSAM and PII are mentioned, the lack of granular source disclosure or sample data availability limits transparency.

Tokenizer Integrity

9.0 / 10

The model uses the Gemini 2.0 tokenizer, which is a SentencePiece-based tokenizer with a vocabulary size of 262,144 tokens. It is publicly available via Hugging Face and documented to include split digits, preserved whitespace, and byte-level encodings. The tokenizer is specifically designed to improve balance for non-English languages across the 140+ supported languages. The integration of <image_soft_token> for multimodal processing is also clearly defined in the technical documentation.

Model

26.0 / 40

Parameter Density

8.0 / 10

The model is clearly identified as a dense architecture with 27 billion parameters. Unlike MoE models, there is no ambiguity regarding active vs. total parameters. The technical report provides a breakdown of the vision encoder (400M) versus the language backbone. The impact of quantization (int4, fp8) on parameter representation is also documented in the technical report, though a full layer-by-layer parameter distribution (e.g., FFN vs. Attention) is less explicitly tabulated for the 27B variant specifically.

Training Compute

3.5 / 10

The technical report identifies the hardware used (TPUv4p, TPUv5p, and TPUv5e) and mentions the use of the Pathways system for multi-pod training. However, it fails to disclose the total number of TPU hours, the duration of the training run, the estimated energy consumption, or the carbon footprint. This lack of environmental and resource transparency is a significant gap common in large-scale corporate model releases.

Benchmark Reproducibility

5.0 / 10

Google provides results for standard benchmarks (MMLU-Pro, MATH, GPQA) and internal evaluations. While the technical report describes the evaluation approach and mentions decontamination steps, it does not provide the full evaluation code, exact prompts, or few-shot examples required for perfect reproduction. Third-party verification is available via the LMSYS Chatbot Arena, but the discrepancy between official claims and independent 'benchmark illusion' studies suggests a need for more transparent methodology.

Identity Consistency

9.5 / 10

Gemma 3 27B consistently identifies itself as an open-weights model from Google DeepMind. It demonstrates high awareness of its versioning and multimodal capabilities. There are no documented instances of the model claiming to be a competitor's product (e.g., GPT-4) or denying its nature as an AI. It provides clear information about its 128K context window and language support when prompted.

Downstream

20.0 / 30

License Clarity

6.5 / 10

The model is released under the 'Gemma Terms of Use,' which is a custom license rather than a standard OSI-approved open-source license like Apache 2.0. While it permits commercial use and redistribution, it includes restrictive clauses regarding 'Prohibited Use Policies' and allows Google to potentially restrict usage. The distinction between 'open weights' and 'open source' is maintained, but the custom terms create some legal ambiguity for derivative works compared to standard licenses.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. Google provides VRAM estimates for different precisions: ~54-64GB for BF16/FP16 and ~14-21GB for int4/QAT versions. Documentation explicitly notes the memory scaling for the 128K context window and the overhead of the vision encoder. The availability of Quantization-Aware Training (QAT) checkpoints with documented accuracy-memory tradeoffs provides high transparency for downstream deployment.

Versioning Drift

5.0 / 10

Google uses a versioned naming convention (Gemma 3) and provides a technical report for the release. However, there is no public, granular changelog for minor weight updates or 'silent' alignment tuning. While the release date (March 2025) is clear, the long-term commitment to maintaining a version history or providing access to specific 'checkpoints' over time is not as robust as community-led projects.

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
63k
125k

VRAM Required:

Recommended GPUs

Gemma 3 27B: Specifications and GPU VRAM Requirements