
Sahabat-AI-Gemma2-9B

Parameters

9.2B

Context Length

8K (8,192 tokens)

Modality

Text

Architecture

Dense

License

Gemma-Community

Release Date

14 Nov 2024

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

3584

Number of Layers

42

Attention Heads

16

Key-Value Heads

8

Activation Function

GeGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Sahabat-AI-Gemma2-9B

Sahabat-AI-Gemma2-9B is a specialized large language model designed to handle the linguistic complexities of the Indonesian archipelago, including regional dialects such as Javanese and Sundanese. Developed through a collaboration between GoTo and Indosat Ooredoo Hutchison, with technical support from AI Singapore and NVIDIA, the model is built upon the Gemma 2 9B architecture. It undergoes a rigorous continued pre-training (CPT) phase using approximately 50 billion tokens of Indonesian-centric data. This localized training enables the model to capture deep cultural context and grammatical nuances that are often lost in general-purpose multilingual models.

The technical architecture follows the dense decoder-only transformer design of Gemma 2, incorporating significant optimizations for inference efficiency and training stability. It utilizes Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads, effectively reducing memory bandwidth requirements during generation. A hallmark of this architecture is the interleaving of global and local sliding window attention layers, which balances long-range dependency modeling with computational performance. The model employs the GeGLU activation function and implements a hybrid normalization scheme using RMSNorm in both pre-norm and post-norm configurations to maintain signal integrity across its 42 layers.
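The key idea of GQA described above is that groups of query heads share a single key-value head, shrinking the KV cache. A minimal numpy sketch using the head counts from this card (16 query heads, 8 KV heads); the per-head dimension of hidden // n_heads = 224 is an illustrative assumption, not a confirmed Gemma 2 value, and RoPE, masking, and batching are omitted:

```python
import numpy as np

HIDDEN = 3584                     # hidden dimension size (from the card)
N_Q_HEADS = 16                    # query heads
N_KV_HEADS = 8                    # key-value heads
HEAD_DIM = HIDDEN // N_Q_HEADS    # 224; assumed split for illustration

def grouped_query_attention(q, k, v):
    """q: (N_Q_HEADS, seq, HEAD_DIM); k, v: (N_KV_HEADS, seq, HEAD_DIM)."""
    group = N_Q_HEADS // N_KV_HEADS      # 2 query heads per KV head
    k = np.repeat(k, group, axis=0)      # share each KV head across its group
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)
    scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                   # (N_Q_HEADS, seq, HEAD_DIM)

seq = 4
rng = np.random.default_rng(0)
q = rng.standard_normal((N_Q_HEADS, seq, HEAD_DIM))
k = rng.standard_normal((N_KV_HEADS, seq, HEAD_DIM))
v = rng.standard_normal((N_KV_HEADS, seq, HEAD_DIM))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (16, 4, 224)
```

Because only 8 KV heads are cached instead of 16, the KV cache (and the memory bandwidth spent reading it at each decode step) is halved relative to full multi-head attention.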

Positioned for deployment in diverse Indonesian applications, Sahabat-AI-Gemma2-9B is engineered for tasks such as multilingual question answering, sentiment analysis, and translation. It utilizes Rotary Position Embeddings (RoPE) and applies logit soft-capping, which bounds attention and output logits with a scaled tanh to stabilize training and improve generation quality. As an open-weights release under the Gemma Community License, it provides a foundational resource for developers to build localized AI services, ranging from enterprise-grade virtual assistants to educational tools optimized for Indonesia's unique digital landscape.
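Logit soft-capping is a one-line transform: logits are squashed through tanh so their magnitude can never exceed the cap. A minimal sketch; the cap value of 30.0 is the one Gemma 2 reports for final logits, used here as an illustrative default:

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    """Bound logits to (-cap, cap): cap * tanh(logits / cap).

    Small logits pass through nearly unchanged; large ones saturate
    smoothly instead of growing without bound.
    """
    return cap * np.tanh(np.asarray(logits) / cap)

print(soft_cap(np.array([5.0, 100.0, -500.0])))
```

Small values are almost untouched (soft_cap(5.0) is about 4.95), while arbitrarily large inputs saturate just below the cap, keeping gradients well-behaved.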

About Sahabat-AI

Sahabat-AI is an Indonesian language model family co-initiated by GoTo and Indosat Ooredoo Hutchison. Developed with AI Singapore and NVIDIA, it is a collection of models (based on Gemma 2 and Llama 3) specifically optimized for Bahasa Indonesia and regional languages like Javanese and Sundanese.


Other Sahabat-AI Models

Evaluation Benchmarks

No evaluation benchmarks for Sahabat-AI-Gemma2-9B available.

Rankings

Overall Rank

-

Coding Rank

-

GPU Requirements

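A back-of-envelope estimate of the VRAM needed just for the weights of a 9.2B-parameter model at common quantizations (activations and KV cache add more on top); the precision-to-bytes mapping is a standard assumption, not a figure from this card:

```python
PARAMS = 9.2e9  # parameter count from the card

def weight_vram_gib(bytes_per_param):
    """VRAM in GiB for the weights alone at a given precision."""
    return PARAMS * bytes_per_param / 2**30

for name, bpp in [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: {weight_vram_gib(bpp):.1f} GiB")
```

At fp16/bf16 this works out to roughly 17 GiB for weights alone, which is why 8-bit or 4-bit quantization is the usual route to fitting the model on a single 24 GB (or smaller) consumer GPU with headroom for the KV cache.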
