Parameters
8B
Context Length
8,192
Modality
Text
Architecture
Dense
License
Llama-3.1-Community
Release Date
14 Nov 2024
Knowledge Cutoff
Mar 2023
Attention Structure
Grouped Query Attention (GQA)
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
Sahabat-AI-Llama3-8B-Instruct is a specialized large language model developed through a collaboration between GoTo Group and Indosat Ooredoo Hutchison. The model is constructed using a continued pre-training (CPT) approach on the Meta Llama 3 architecture, specifically optimized to reflect the linguistic patterns and cultural context of Indonesia. By incorporating a significant corpus of text in Indonesian and in regional languages such as Javanese and Sundanese, the model provides localized language-processing capabilities that account for regional idioms and social contexts.
The technical framework is a dense, decoder-only Transformer architecture comprising 32 layers and a hidden dimension of 4096. It employs Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads to improve inference efficiency. The model utilizes Rotary Positional Embeddings (RoPE) for sequence modeling and SwiGLU activation functions within its feed-forward layers. Training was facilitated by the NVIDIA NeMo framework, allowing the weights to be refined on a dataset of approximately 50 billion tokens, followed by supervised fine-tuning on hundreds of thousands of instruction-completion pairs.
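For concreteness, the documented hyperparameters map onto a standard Hugging Face `LlamaConfig` as sketched below. Note that the feed-forward width, RoPE base, norm epsilon, and exact vocabulary size are not stated in the documentation and are assumed here from the stock Llama 3 8B configuration; the released weights may differ.

```python
from transformers import LlamaConfig

# Sketch of the documented architecture. Values not given in the model card
# (intermediate_size, rope_theta, rms_norm_eps, exact vocab_size) are assumed
# from the stock Llama 3 8B configuration.
config = LlamaConfig(
    vocab_size=128_256,           # documentation rounds this to "128,000"
    hidden_size=4096,             # hidden dimension
    num_hidden_layers=32,         # decoder layers
    num_attention_heads=32,       # query heads
    num_key_value_heads=8,        # GQA key-value heads
    intermediate_size=14_336,     # SwiGLU feed-forward width (assumed)
    hidden_act="silu",            # SwiGLU gate activation
    max_position_embeddings=8192, # documented context length
    rope_theta=500_000.0,         # RoPE base (assumed from Llama 3)
    rms_norm_eps=1e-5,            # RMSNorm epsilon (assumed)
)
```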
This instruction-tuned variant is designed for high-quality interactions in both formal and informal Indonesian. It addresses specific cultural sensitivities and linguistic variations that are often missing in general-purpose global models. Primary applications include automated customer support for the Indonesian market, localized content synthesis, and technical assistance within the regional digital ecosystem. The model is compatible with the Transformers library and optimized for deployment on standardized accelerated computing infrastructure.
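A minimal inference sketch with the Transformers library follows; the repository id is a hypothetical placeholder for the official Sahabat-AI release, and FP16 loading assumes roughly 16 GB of accelerator memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id -- substitute the official Sahabat-AI repo.
model_id = "GoToCompany/llama3-8b-cpt-sahabatai-v1-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~16 GB of VRAM at FP16
    device_map="auto",
)

# Llama 3 instruct models expect the chat template applied by the tokenizer.
messages = [{"role": "user", "content": "Apa ibu kota Indonesia?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```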
Sahabat-AI is an Indonesian language model family co-initiated by GoTo and Indosat Ooredoo Hutchison. Developed with AI Singapore and NVIDIA, it is a collection of models (based on Gemma 2 and Llama 3) specifically optimized for Bahasa Indonesia and regional languages like Javanese and Sundanese.
No evaluation benchmark results are available for Sahabat-AI-Llama3-8B-Instruct.
Overall Rank
-
Coding Rank
-
Total Score
67 / 100
Sahabat-AI demonstrates good transparency regarding its architectural foundations and the specific volume of instruction-tuning data used for regional language optimization. While it provides clear hardware specifications for its fine-tuning phase and maintains a consistent identity, it lacks detailed disclosure on the specific sources of its continued pre-training data and comprehensive benchmark reproduction assets. The model is honest about its lack of safety alignment, placing the responsibility for downstream filtering on the user.
Architectural Provenance
The model is explicitly identified as a continued pre-training (CPT) variant of the Meta Llama 3 architecture. Documentation specifies a dense, decoder-only Transformer with 32 layers, a hidden dimension of 4096, and Grouped Query Attention (GQA). The training methodology is described as a combination of full-parameter fine-tuning, on-policy alignment, and model merging using the NVIDIA NeMo framework. While the base architecture is well documented, the CPT phase is described only at a high level, with no technical paper detailing what layer-wise changes, if any, were made.
Dataset Composition
The model card provides specific token counts for the CPT phase (approximately 50 billion tokens) and instruction tuning (448,000 Indonesian, 96,000 Javanese, 98,000 Sundanese, and 129,000 English pairs). However, the specific sources of the 50B CPT tokens are not disclosed beyond 'publicly available online data' and 'synthetic instructions'. There is no detailed breakdown of the web-to-code ratio or specific filtering/cleaning methodologies provided, which are critical for high-tier transparency.
Tokenizer Integrity
The model utilizes the default Llama 3 tokenizer with a vocabulary of roughly 128,000 tokens. The tokenizer is publicly accessible, so its behavior on the target languages (Indonesian, Javanese, and Sundanese) can be verified directly against the released weights. The documentation confirms a context length of 8,192 tokens, though some inference engines such as vLLM may cap it at 4,096. The alignment between the tokenizer and the claimed language support is well documented.
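One lightweight way to perform that verification is to measure token fertility (tokens per whitespace-separated word) on short samples of each target language. A sketch, again using a hypothetical repository id and illustrative sample sentences:

```python
from transformers import AutoTokenizer

# Hypothetical repo id; the tokenizer is the stock Llama 3 tokenizer.
tok = AutoTokenizer.from_pretrained("GoToCompany/llama3-8b-cpt-sahabatai-v1-instruct")

# Illustrative greetings in each supported language.
samples = {
    "Indonesian": "Selamat pagi, apa kabar hari ini?",
    "Javanese":   "Sugeng enjing, kabare piye dina iki?",
    "Sundanese":  "Wilujeng enjing, kumaha damang dinten ieu?",
}

for lang, text in samples.items():
    n_tokens = len(tok(text)["input_ids"])
    n_words = len(text.split())
    print(f"{lang}: {n_tokens} tokens / {n_words} words = {n_tokens / n_words:.2f} fertility")
```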
Parameter Density
The model is clearly stated to have 8.03 billion total parameters. As a dense architecture, all parameters are active during inference. The architectural breakdown (layers, hidden dims, attention heads) is provided in the technical specifications. There is no ambiguity regarding MoE active vs. total parameters, and the parameter count is consistent across official sources.
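The 8.03B figure can be reproduced from the listed hyperparameters with back-of-the-envelope arithmetic; the feed-forward width of 14,336 and untied input/output embeddings below are assumptions carried over from the stock Llama 3 8B design.

```python
# Back-of-the-envelope parameter count from the documented hyperparameters.
vocab, d, layers, kv_ratio, ffn = 128_256, 4096, 32, 4, 14_336  # ffn width assumed

attn = d * d * 2 + 2 * d * (d // kv_ratio)  # Q/O projections + GQA K/V projections
mlp = 3 * d * ffn                           # SwiGLU gate, up, and down projections
norms = 2 * d                               # two RMSNorm weight vectors per layer
embeddings = 2 * vocab * d                  # untied input embedding + LM head (assumed)

total = layers * (attn + mlp + norms) + embeddings + d  # + final RMSNorm
print(f"{total / 1e9:.2f}B parameters")     # ~8.03B, matching the stated count
```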
Training Compute
The documentation provides specific hardware details (8x H100-80GB GPUs) and durations for the fine-tuning (4 hours) and alignment (2 hours) phases. However, the compute resources used for the 50B token continued pre-training phase are not explicitly detailed in terms of total GPU hours or hardware specifications. Carbon footprint and total estimated cost for the entire development cycle are missing.
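For the documented post-training phases, the disclosed compute reduces to simple GPU-hour arithmetic, as in the sketch below; the CPT phase cannot be estimated this way because its hardware and duration are not broken out.

```python
# GPU-hours for the documented post-training phases only.
gpus = 8  # H100-80GB, per the model card
phases = {"fine-tuning": 4, "alignment": 2}  # wall-clock hours

for name, hours in phases.items():
    print(f"{name}: {gpus * hours} GPU-hours")
# Total: 48 GPU-hours -- the 50B-token CPT phase is undisclosed and dominates.
```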
Benchmark Reproducibility
Evaluation results are provided for the SEA HELM (BHASA) benchmark and standard English benchmarks (IFEval, MMLU-Pro). While the benchmarks are named and some methodology is described (zero-shot with native prompts), the exact evaluation code and full prompt sets are not publicly linked in the repository. The documentation notes discrepancies with official leaderboards due to inference engine differences (vLLM vs. Transformers), which adds a layer of complexity to independent verification.
Identity Consistency
The model consistently identifies itself as Sahabat-AI and is transparent about its origins as a fine-tuned Llama 3 variant. It does not claim to be a different model (like GPT-4) and provides clear versioning (v1). The model card explicitly lists its intended use cases and limitations, including a lack of safety alignment, which demonstrates high honesty regarding its identity and capabilities.
License Clarity
The model is released under the Llama 3.1 Community License. This license is publicly available and outlines terms for commercial use (up to 700M monthly active users) and redistribution. However, there is a slight discrepancy in documentation where some files refer to the 'Llama 3 Community License' while others mention 'Llama 3.1', which could lead to minor legal ambiguity regarding specific derivative work terms.
Hardware Footprint
Basic VRAM requirements are provided through third-party and community documentation (e.g., ~16GB for FP16, ~5GB for 4-bit quantization). The official documentation mentions the use of vLLM and Hugging Face Transformers but lacks a comprehensive official table of VRAM vs. context length scaling or specific quantization accuracy trade-off data provided directly by the developers.
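Those community figures are consistent with simple bytes-per-parameter arithmetic. The sketch below estimates weight-only memory and deliberately excludes KV-cache and activation overhead, which is why real usage runs somewhat higher.

```python
# Rough weight-only VRAM estimate: parameters x bytes per parameter.
params = 8.03e9
for scheme, bytes_per_param in [("FP16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{scheme}: ~{gib:.1f} GiB for weights alone")
# FP16 -> ~15 GiB, 4-bit -> ~3.7 GiB; the quoted "~16 GB" and "~5 GB" figures
# additionally include runtime overhead such as the KV cache.
```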
Versioning Drift
The model uses a versioning string (v1), but there is no public changelog or detailed version history tracking changes between internal iterations or checkpoints. There is no formal mechanism described for tracking or notifying users of behavioral drift, and previous versions of the weights are not easily accessible in a structured historical archive.