Parameters
9.2B
Context Length
8,192
Modality
Text
Architecture
Dense
License
Gemma-Community
Release Date
14 Nov 2024
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
3584
Number of Layers
42
Attention Heads
16
Key-Value Heads
8
Activation Function
GeGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
Sahabat-AI-Gemma2-9B is a specialized large language model designed to handle the linguistic complexities of the Indonesian archipelago, including regional languages such as Javanese and Sundanese. Developed through a collaboration between GoTo and Indosat Ooredoo Hutchison, with technical support from AI Singapore and NVIDIA, the model is built upon the Gemma 2 9B architecture. It underwent a continued pre-training (CPT) phase on approximately 50 billion tokens of Indonesian-centric data. This localized training enables the model to capture cultural context and grammatical nuances that are often lost in general-purpose multilingual models.
The technical architecture follows the dense decoder-only transformer design of Gemma 2, incorporating significant optimizations for inference efficiency and training stability. It utilizes Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads, effectively reducing memory bandwidth requirements during generation. A hallmark of this architecture is the interleaving of global and local sliding window attention layers, which balances long-range dependency modeling with computational performance. The model employs the GeGLU activation function and implements a hybrid normalization scheme using RMSNorm in both pre-norm and post-norm configurations to maintain signal integrity across its 42 layers.
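A minimal sketch of the memory effect of GQA described above, using the layer and head counts from the spec table. The per-head dimension is not listed in the table; 256 is assumed here, as that is the value Gemma 2 uses, and fp16 (2 bytes per value) is assumed for the cache.

```python
# Sketch: per-token KV-cache size under full multi-head attention vs GQA.
# NUM_LAYERS and head counts come from the spec table above; HEAD_DIM=256
# and 2-byte fp16 values are assumptions, not taken from this card.
NUM_LAYERS = 42
HEAD_DIM = 256   # assumed (Gemma 2 value); not listed in the table
BYTES_PER_VALUE = 2  # fp16

def kv_cache_bytes_per_token(num_kv_heads: int) -> int:
    # Each layer caches one K and one V tensor of num_kv_heads * HEAD_DIM values.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE

mha = kv_cache_bytes_per_token(16)  # one KV pair per query head
gqa = kv_cache_bytes_per_token(8)   # 8 KV heads shared by 16 query heads
print(mha, gqa, mha // gqa)
```

With 16 query heads sharing 8 key-value heads, the cache is half the size it would be under full multi-head attention, which is the memory-bandwidth reduction the paragraph refers to.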
Positioned for deployment in diverse Indonesian applications, Sahabat-AI-Gemma2-9B is engineered for tasks such as multilingual question answering, sentiment analysis, and translation. It utilizes Rotary Position Embeddings (RoPE) and features logit soft-capping, which bounds logit magnitudes to stabilize training and improve generation quality. As an open-weights release under the Gemma Community License, it provides a foundational resource for developers to build localized AI services, ranging from enterprise-grade virtual assistants to educational tools optimized for Indonesia's unique digital landscape.
Sahabat-AI is an Indonesian language model family co-initiated by GoTo and Indosat Ooredoo Hutchison. Developed with AI Singapore and NVIDIA, it is a collection of models (based on Gemma 2 and Llama 3) specifically optimized for Bahasa Indonesia and regional languages like Javanese and Sundanese.
No evaluation benchmarks for Sahabat-AI-Gemma2-9B available.
Overall Rank
-
Coding Rank
-