Parameters
3B
Context Length
4,096
Modality
Text
Architecture
Dense
License
Apache-2.0
Release Date
15 Jan 2024
Knowledge Cutoff
Jan 2024
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
-
Number of Layers
-
Attention Heads
-
Key-Value Heads
-
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
MaLLaM-3B (Malaysia Large Language Model) is a foundational 3 billion parameter dense model engineered specifically for the Malaysian linguistic context. Developed from scratch by Malaysia AI and Mesolitica, the model addresses the scarcity of high-quality local language representations by leveraging a curated dataset of 90 billion tokens. This training corpus comprises 349GB of diverse Malaysian digital artifacts, including government documents, local news, literature from the Dewan Bahasa dan Pustaka, and colloquial social media exchanges. By utilizing a custom-trained Byte Pair Encoding (BPE) tokenizer, the model captures unique Malaysian idioms, slang, and cultural references that are often diluted in English-centric foundational models.
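At its core, BPE builds a vocabulary by repeatedly merging the most frequent adjacent symbol pair in the corpus, which is why frequent local words end up as single tokens. A toy, self-contained sketch of the merge-learning loop (the actual MaLLaM tokenizer is trained with a full tokenizer library on the 349GB corpus; the word frequencies below are illustrative):

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a word -> frequency dict (toy illustration)."""
    # Start from character-level symbols.
    corpus = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_corpus = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] = freq
        corpus = new_corpus
    return merges

# Frequent Malay character pairs are merged first, so common words
# like "makan" compress into few tokens.
merges = bpe_merges({"makan": 10, "minum": 8, "makanan": 5}, 4)
```

Training on Malaysian text rather than English text is what shifts which merges win, so Malay morphology (e.g. the "-an" suffix) is learned early instead of being fragmented.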
Technically, MaLLaM-3B adopts the Mistral transformer-based decoder-only architecture, which facilitates efficient inference and high performance relative to its parameter count. The model utilizes Grouped-Query Attention (GQA) to optimize the KV cache, thereby reducing memory overhead during sequence generation. It implements the SwiGLU activation function and RMSNorm for stable and accelerated convergence during pre-training. For position encoding, the model employs Rotary Position Embeddings (RoPE), enabling it to maintain precise token relationships within its standard 4096-token context window.
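Both RMSNorm and SwiGLU are simple to state. A minimal NumPy sketch of the two components (the weights and dimensions below are illustrative; MaLLaM-3B's actual hidden dimension is not published):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square of activations (no mean subtraction)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: SiLU(x @ W_gate) elementwise-gates (x @ W_up)."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d, d_ff = 8, 16  # toy sizes, not the model's real dimensions
x = rng.normal(size=(2, d))
y = rms_norm(x, np.ones(d))
z = swiglu_ffn(y, rng.normal(size=(d, d_ff)),
                  rng.normal(size=(d, d_ff)),
                  rng.normal(size=(d_ff, d)))
```

RMSNorm drops the mean-centering and bias of LayerNorm, which removes one reduction per normalization and is one reason it is favored for pre-training stability at lower cost.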
Designed primarily for edge deployment and localized applications, MaLLaM-3B is optimized for environments where low-latency text generation and bilingual proficiency in Bahasa Malaysia and English are required. Its compact architecture makes it suitable for integration into mobile applications, localized chatbots, and on-premise document processing systems. Released under the Apache 2.0 license, the model provides an open-weights foundation for researchers and developers to build downstream applications such as sentiment analysis, summarization, and instruction-following assistants tailored for the Malaysian demographic.
Malaysian Large Language Model (MaLLaM) is an open-source language model family developed to support Bahasa Malaysia and English. The model is trained on Malaysian text data including local news, literature, and digital content. It is designed to process Malaysian linguistic nuances and cultural context, available in multiple parameter sizes for different hardware deployments.
No evaluation benchmarks are available for MaLLaM-3B.
Overall Rank
-
Coding Rank
-
Total Score
73
/ 100
MaLLaM-3B demonstrates strong transparency regarding its architectural origins and the localized nature of its training data. It provides a clear open-source path through its Apache 2.0 license and custom tokenizer documentation. The primary areas for improvement include more granular reporting of training compute metrics and more rigorous, reproducible benchmark disclosures to validate its performance claims.
Architectural Provenance
MaLLaM-3B is explicitly documented as a dense decoder-only transformer model based on the Mistral architecture. Technical details are provided in the official GitHub repository and an arXiv technical report, confirming the use of Grouped-Query Attention (GQA), SwiGLU activation, RMSNorm, and Rotary Position Embeddings (RoPE). The model was trained from scratch rather than being a fine-tuned version of an existing model, which is clearly stated and supported by the training methodology documentation.
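The KV-cache saving from GQA is straightforward arithmetic: cache size scales with the number of key-value heads, not query heads. Since MaLLaM-3B's layer and head counts are not published, the figures below are hypothetical, using a Mistral-7B-style ratio of 32 query heads to 8 KV heads purely to illustrate the effect:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Bytes for the K and V caches across all layers (FP16 = 2 bytes/value).
    The leading 2 accounts for storing both K and V."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

# Hypothetical configuration for illustration only.
layers, head_dim, seq_len = 26, 128, 4096
mha = kv_cache_bytes(layers, 32, head_dim, seq_len)  # full multi-head KV
gqa = kv_cache_bytes(layers, 8, head_dim, seq_len)   # grouped-query KV
print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB "
      f"({mha // gqa}x smaller)")
```

With 8 KV heads serving 32 query heads, the cache shrinks 4x at the full 4,096-token context, which is the memory-overhead reduction the description above refers to.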
Dataset Composition
The training corpus is described as a 349GB (90 billion token) dataset specifically curated for the Malaysian context. Documentation identifies five primary categories: Dedup text, Extra dedup (research papers), Filtered StarCoder, Instruction data, and MADLAD-400 MS. Specific sources like Lowyat, Cari, and government documents are named. While the general composition is clear, a precise percentage breakdown of each category within the final 90B tokens is not explicitly tabulated in a granular format.
Tokenizer Integrity
The model uses a custom-trained Byte Pair Encoding (BPE) tokenizer with a vocabulary size of 32,000. The tokenizer is publicly available on Hugging Face and was specifically trained on a multilingual corpus including Malay, English, Mandarin, Tamil, Jawi, and Arabic to capture local linguistic nuances. Documentation provides clear instructions on its development and intended language support.
Parameter Density
The model is a dense architecture with 3 billion parameters. Unlike Mixture-of-Experts (MoE) models where active parameters might be obscured, the dense nature makes the parameter count straightforward. The architectural configuration (Mistral-based) is standard and verifiable through the provided configuration files on Hugging Face.
Training Compute
Training was conducted using a distributed cluster of 40 GPUs (NVIDIA A100s) managed via Kubernetes and DeepSpeed Zero3. The use of spot instances and AWS infrastructure is disclosed. However, while the hardware type and cluster size are provided, the exact total GPU-hours and the specific carbon footprint calculations are not detailed in the available technical reports.
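The exact DeepSpeed configuration is not published; a representative ZeRO Stage 3 config fragment of the kind used for multi-node training on such a cluster (all values below are illustrative, not MaLLaM's actual settings) might look like:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

ZeRO Stage 3 partitions optimizer states, gradients, and the parameters themselves across GPUs, which is what makes from-scratch pre-training feasible on a modest 40-GPU spot-instance cluster.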
Benchmark Reproducibility
The technical report mentions competitive performance against ChatGPT-3.5 and Malaysian Mistral on instruction-following tasks. However, the specific evaluation code, exact prompt templates, and comprehensive results across standard benchmarks like MMLU or GSM8K are less detailed compared to major foundational releases. There is a lack of third-party verification for the reported internal benchmarks.
Identity Consistency
The model is consistently branded as MaLLaM (Malaysia Large Language Model) across all official documentation, GitHub, and Hugging Face. It distinguishes itself clearly from English-centric models and maintains a coherent identity as a localized foundational model. There are no reports of the model misidentifying itself as a competitor's product.
License Clarity
The model and its weights are released under the Apache 2.0 license, which is a standard, permissive open-source license. This is explicitly stated on the Hugging Face model card and the official GitHub repository, providing clear terms for commercial and non-commercial use.
Hardware Footprint
The model is designed for edge deployment, and its 3B parameter size implies a baseline VRAM requirement of approximately 6GB for FP16. While basic VRAM estimates are available through community calculators and the model's compact nature, official documentation could be more explicit regarding the specific accuracy-performance tradeoffs for various quantization levels (Q4, Q8).
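The VRAM figures follow from a simple rule of thumb: parameter count times bytes per parameter for the weights, plus some margin for runtime buffers. A sketch (the 10% overhead factor is a common heuristic, not an official figure):

```python
def weight_vram_gb(params_billions, bits_per_param, overhead=1.10):
    """Rough VRAM for model weights: params * bits/8, plus ~10% for
    runtime buffers. Excludes KV cache, which grows with context length."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9 * overhead

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{name}: ~{weight_vram_gb(3, bits):.1f} GB")
```

For the 3B model this gives roughly 6 GB at FP16 before overhead, ~3 GB at Q8, and ~1.5 GB at Q4, which is what brings mobile and edge deployment within reach; the accuracy cost of each quantization level is the tradeoff the official documentation leaves unquantified.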
Versioning Drift
While the model is part of a family (1.1B, 3B, 5B), there is limited evidence of a formal semantic versioning system or a detailed public changelog for weight updates. The repository tracks development, but clear markers for 'v1.0' vs 'v1.1' with associated drift analysis are not prominent.