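Parameters: 9.2B
Context length: 8,192 tokens
Modality: Text
Architecture: Dense
License: Gemma Community License
Release date: 14 Nov 2024
Training data cutoff: -
Attention structure: Grouped-Query Attention
Hidden size: 3584
Layers: 42
Attention heads: 16
Key-value heads: 8
Activation function: Gated GELU (GeGLU)
Normalization: RMS Normalization
Position embedding: Rotary Position Embedding (RoPE)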
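As a rough illustration of what the 8 key-value heads save, the sketch below computes the per-token key-value cache size from the layer and head counts listed above. It assumes a head dimension of 256 (taken from the public Gemma 2 9B configuration; it is not listed on this page) and a 2-byte bf16/fp16 cache. It also ignores the smaller footprint of sliding-window layers, so the result is an upper bound. `AttentionConfig` and `kv_cache_bytes_per_token` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionConfig:
    """Spec-sheet values needed for KV-cache arithmetic."""
    num_layers: int = 42
    num_query_heads: int = 16
    num_kv_heads: int = 8
    # Assumption: head_dim = 256 is taken from the public Gemma 2 9B config;
    # it is not listed on this page and is not simply 3584 / 16.
    head_dim: int = 256
    # Assumption: the cache is stored in bf16/fp16 (2 bytes per value).
    bytes_per_value: int = 2


def kv_cache_bytes_per_token(cfg: AttentionConfig, kv_heads: Optional[int] = None) -> int:
    """Bytes of keys plus values cached per token, summed over all layers.

    Upper bound: ignores that sliding-window layers only need to keep
    the most recent window of tokens.
    """
    heads = cfg.num_kv_heads if kv_heads is None else kv_heads
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * cfg.num_layers * heads * cfg.head_dim * cfg.bytes_per_value


cfg = AttentionConfig()
gqa = kv_cache_bytes_per_token(cfg)
# Hypothetical baseline: plain multi-head attention, one KV head per query head.
mha = kv_cache_bytes_per_token(cfg, kv_heads=cfg.num_query_heads)
print(f"query heads per KV head: {cfg.num_query_heads // cfg.num_kv_heads}")
print(f"GQA: {gqa:,} bytes/token; MHA baseline: {mha:,} bytes/token")
print(f"full 8,192-token GQA cache: {gqa * 8192 / 2**30:.3f} GiB")
```

Under these assumptions, grouped-query attention needs 344,064 bytes per token versus 688,128 for the hypothetical multi-head baseline, or about 2.625 GiB for a full 8,192-token context.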
Sahabat-AI-Gemma2-9B is a large language model specialized for the linguistic complexity of the Indonesian archipelago, including regional languages such as Javanese and Sundanese. Developed through a collaboration between GoTo and Indosat Ooredoo Hutchison, with technical support from AI Singapore and NVIDIA, it is built on Gemma 2 9B. The base model was further trained through continued pre-training (CPT) on approximately 50 billion tokens of Indonesian-centric data. This localized training is intended to capture cultural context and grammatical nuances that general-purpose multilingual models often miss.
The architecture follows Gemma 2's dense, decoder-only transformer design, which includes several changes aimed at inference efficiency and training stability. It uses Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads: each pair of query heads shares one key-value head, which halves the key-value cache relative to standard multi-head attention and reduces memory bandwidth during generation. Layers alternate between local sliding-window attention (a 4,096-token window) and global attention over the full 8,192-token context, balancing long-range dependency modeling against compute cost. The feed-forward blocks use the GeGLU activation, and RMSNorm is applied both before and after each sublayer (pre-norm and post-norm) to keep training stable across all 42 layers.
Sahabat-AI-Gemma2-9B targets Indonesian applications such as multilingual question answering, sentiment analysis, and translation. Like Gemma 2, it uses Rotary Position Embeddings (RoPE) and applies logit soft-capping, which smoothly bounds attention and final-layer logits to help keep training stable. Released with open weights under the Gemma Community License, it gives developers a foundation for localized AI services, from enterprise-grade virtual assistants to educational tools built for Indonesia's digital landscape.
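The sketch below illustrates both mechanisms in plain Python. `soft_cap` implements the tanh-based cap, cap · tanh(x / cap); the caps of 50 for attention logits and 30 for final logits are the Gemma 2 defaults and are assumed to carry over unchanged to this model. `rope` uses the interleaved-pair layout with base 10000, also an assumption: some implementations instead rotate the first half of the vector against the second half, which is equivalent up to a fixed permutation of dimensions. All helper names are hypothetical.

```python
import math
from typing import List

# Gemma 2 default caps; assumed unchanged by Sahabat-AI's continued pre-training.
ATTN_LOGIT_CAP = 50.0
FINAL_LOGIT_CAP = 30.0


def soft_cap(logit: float, cap: float) -> float:
    """Smoothly squash a logit into (-cap, cap); near-identity when |logit| << cap."""
    return cap * math.tanh(logit / cap)


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2j], x[2j+1]) by the angle pos * base**(-2j / d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = [0.0] * d
    for i in range(0, d, 2):  # i = 2j, so base**(-i / d) is the pair's frequency
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * r for p, r in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# Both pairs of positions are 3 apart, so the rotated scores should match.
print(dot(rope(q, 10), rope(k, 7)), dot(rope(q, 103), rope(k, 100)))
print(soft_cap(5.0, FINAL_LOGIT_CAP), soft_cap(500.0, FINAL_LOGIT_CAP))
```

The two printed dot products agree up to floating-point error, because the rotation makes the query-key score depend only on the relative offset between positions. Soft-capping leaves small logits almost unchanged while large ones saturate just below the cap.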
Sahabat-AI is an Indonesian language model family co-initiated by GoTo and Indosat Ooredoo Hutchison. Developed with AI Singapore and NVIDIA, it is a collection of models (based on Gemma 2 and Llama 3) specifically optimized for Bahasa Indonesia and regional languages like Javanese and Sundanese.
No evaluation benchmarks are available for Sahabat-AI-Gemma2-9B.