Parameters
27B
Context Length
128K
Modality
Multimodal
Architecture
Dense
License
Gemma Terms of Use
Release Date
12 Mar 2025
Knowledge Cutoff
Aug 2024
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
32
Key-Value Heads
16
Attention Head Dimension
-
Position Embedding
RoPE
RoPE Theta
-
Sliding Window Attention
Yes (5:1 ratio of local to global layers)
Sliding Window Size
1,024
Normalization
RMS Normalization
Activation Function
-
Dimensions
Hidden Dimension Size
5,376
Number of Layers
62
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
262,144
Gemma 3 is a family of lightweight, state-of-the-art models developed by Google DeepMind, designed with research and technology derived from the Gemini models. The Gemma 3 27B variant is a multimodal model engineered to process both textual and image inputs, generating text-based outputs. This model variant is intended for broad application across various generation tasks, including question answering, summarization, and complex reasoning, and supports over 140 languages. Its design focuses on enabling deployment on a range of hardware, from consumer-grade devices like laptops and workstations to specialized cloud infrastructure.
Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.
Rank
#109
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| QA Assistant | ProLLM QA Assistant | 0.913 | 13 |
| Summarization | ProLLM Summarization | 0.802 | 13 |
| StackUnseen | ProLLM Stack Unseen | 0.372 | 28 |
| Web Development | WebDev Arena | 1365 | 34 |
| Coding | Aider Coding | 0.05 | 36 |
Overall Rank
#109
Coding Rank
#118
Total Score
67 / 100
Gemma 3 27B exhibits strong transparency in its architectural design and hardware requirements, providing detailed documentation on its hybrid attention mechanism and VRAM needs. However, it remains opaque regarding training compute resources and the specific composition of its 14-trillion-token dataset. While the model's identity and tokenizer are well-defined, the use of a custom license and limited benchmark reproducibility data are notable weaknesses.
Architectural Provenance
Gemma 3 27B is documented as a decoder-only Transformer with a hybrid attention mechanism. The technical report (arXiv:2503.19786) describes a 5:1 ratio of local sliding-window attention layers (1,024-token span) to global attention layers, which limits KV-cache growth at long context lengths. Multimodality comes from a frozen 400M-parameter SigLIP vision encoder whose embeddings are condensed into 256 soft tokens. While the high-level architecture and the use of knowledge distillation from larger Gemini models are well documented, the specific 'teacher' model identities and the exact 'novel post-training recipe' remain proprietary.
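To make the effect of the 5:1 local-to-global ratio on KV-cache growth concrete, here is a minimal sketch. It is not taken from the technical report: the layer ordering (five local layers followed by one global layer), the function names, and the example values for layer count and head dimension are all assumptions chosen for illustration.

```python
def layer_kinds(num_layers, local_per_global=5):
    """Label each layer 'local' or 'global'.

    Assumed ordering: blocks of `local_per_global` local layers
    followed by one global layer. With local_per_global=0 every
    layer is global (plain full attention).
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


def kv_cache_bytes(context_len, num_layers, kv_heads, head_dim,
                   window=1024, local_per_global=5, bytes_per_elem=2):
    """Rough KV-cache size for one sequence.

    Global layers cache every token; local layers cache at most
    `window` tokens. The factor 2 accounts for keys and values.
    """
    per_token = 2 * kv_heads * head_dim * bytes_per_elem
    total = 0
    for kind in layer_kinds(num_layers, local_per_global):
        cached = context_len if kind == "global" else min(context_len, window)
        total += per_token * cached
    return total


if __name__ == "__main__":
    # Hypothetical toy configuration: 12 layers, 16 KV heads, head_dim 128.
    hybrid = kv_cache_bytes(128_000, 12, 16, 128)
    full = kv_cache_bytes(128_000, 12, 16, 128, local_per_global=0)
    print(f"hybrid: {hybrid / 1e9:.2f} GB, all-global: {full / 1e9:.2f} GB")
```

Under these assumptions, only one layer in six grows with the context length, so the cache at long contexts is dominated by the global layers.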
Dataset Composition
Google discloses the total token count (14 trillion for the 27B variant) and general categories including web documents, code, mathematics, and image-text pairs. It supports over 140 languages. However, specific percentage breakdowns of the dataset (e.g., % code vs % web) are not provided. The documentation explicitly states that the exact composition is proprietary. While filtering methodologies for CSAM and PII are mentioned, the lack of granular source disclosure or sample data availability limits transparency.
Tokenizer Integrity
The model uses the Gemini 2.0 tokenizer, which is a SentencePiece-based tokenizer with a vocabulary size of 262,144 tokens. It is publicly available via Hugging Face and documented to include split digits, preserved whitespace, and byte-level encodings. The tokenizer is specifically designed to improve balance for non-English languages across the 140+ supported languages. The integration of <image_soft_token> for multimodal processing is also clearly defined in the technical documentation.
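The "split digits" and "byte-level encodings" properties can be illustrated with a toy sketch. This is not the SentencePiece implementation or the actual Gemma tokenizer: the function names are hypothetical, and the `<0xNN>` piece format is assumed from SentencePiece's byte-fallback convention.

```python
import re


def split_digits(text):
    """Toy pre-tokenization: each digit becomes its own piece,
    while non-digit runs (including whitespace) are kept intact."""
    return re.findall(r"\d|\D+", text)


def byte_fallback_pieces(char):
    """Toy byte-level fallback: represent a character that is not in
    the vocabulary as one piece per UTF-8 byte."""
    return [f"<0x{b:02X}>" for b in char.encode("utf-8")]


if __name__ == "__main__":
    pieces = split_digits("released in 2025")
    print(pieces)
    # Whitespace is preserved, so joining the pieces restores the input.
    print("".join(pieces) == "released in 2025")
    print(byte_fallback_pieces("é"))
```

Splitting digits keeps numbers from fragmenting into arbitrary multi-digit pieces, and byte-level fallback means any input string can be encoded without an unknown-token placeholder.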
Parameter Density
The model is clearly identified as a dense architecture with 27 billion parameters. Unlike MoE models, there is no ambiguity regarding active vs. total parameters. The technical report provides a breakdown of the vision encoder (400M) versus the language backbone. The impact of quantization (int4, fp8) on parameter representation is also documented in the technical report, though a full layer-by-layer parameter distribution (e.g., FFN vs. Attention) is less explicitly tabulated for the 27B variant specifically.
Training Compute
The technical report identifies the hardware used (TPUv4p, TPUv5p, and TPUv5e) and mentions the use of the Pathways system for multi-pod training. However, it fails to disclose the total number of TPU hours, the duration of the training run, the estimated energy consumption, or the carbon footprint. This lack of environmental and resource transparency is a significant gap common in large-scale corporate model releases.
Benchmark Reproducibility
Google provides results for standard benchmarks (MMLU-Pro, MATH, GPQA) and internal evaluations. While the technical report describes the evaluation approach and mentions decontamination steps, it does not provide the full evaluation code, exact prompts, or few-shot examples required for exact reproduction. Third-party verification is available via the LMSYS Chatbot Arena, but the discrepancy between official claims and independent 'benchmark illusion' studies suggests a need for more transparent methodology.
Identity Consistency
Gemma 3 27B consistently identifies itself as an open-weights model from Google DeepMind. It demonstrates high awareness of its versioning and multimodal capabilities. There are no documented instances of the model claiming to be a competitor's product (e.g., GPT-4) or denying its nature as an AI. It provides clear information about its 128K context window and language support when prompted.
License Clarity
The model is released under the 'Gemma Terms of Use,' which is a custom license rather than a standard OSI-approved open-source license like Apache 2.0. While it permits commercial use and redistribution, it includes restrictive clauses regarding 'Prohibited Use Policies' and allows Google to potentially restrict usage. The distinction between 'open weights' and 'open source' is maintained, but the custom terms create some legal ambiguity for derivative works compared to standard licenses.
Hardware Footprint
Hardware requirements are exceptionally well-documented. Google provides VRAM estimates for different precisions: ~54-64GB for BF16/FP16 and ~14-21GB for int4/QAT versions. Documentation explicitly notes the memory scaling for the 128K context window and the overhead of the vision encoder. The availability of Quantization-Aware Training (QAT) checkpoints with documented accuracy-memory tradeoffs provides high transparency for downstream deployment.
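The lower ends of these VRAM ranges are consistent with a weights-only estimate of parameter count times bytes per parameter. The sketch below shows that arithmetic; the function name is hypothetical, and the estimate deliberately ignores the KV cache, activations, the vision encoder, and framework overhead, which account for the rest of each range.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Weights-only memory estimate in gigabytes (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 27e9  # 27B parameters, as stated for this model
    for name, bits in [("BF16/FP16", 16), ("FP8", 8), ("int4", 4)]:
        print(f"{name}: {weight_memory_gb(params, bits):.1f} GB")
```

For 27B parameters this gives 54 GB at 16 bits and 13.5 GB at 4 bits, matching the lower bounds quoted above.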
Versioning Drift
Google uses a versioned naming convention (Gemma 3) and provides a technical report for the release. However, there is no public, granular changelog for minor weight updates or 'silent' alignment tuning. While the release date (March 2025) is clear, the long-term commitment to maintaining a version history or providing access to specific 'checkpoints' over time is weaker than in community-led projects.