Parameters
7B
Context Length
4,096 tokens
Modality
Text
Architecture
Dense
License
Apache-2.0
Release Date
15 Jan 2024
Knowledge Cutoff
Jan 2024
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
MaLLaM-7B (Malaysian Large Language Model) is a dense decoder-only transformer designed to process and generate text with high fidelity to the linguistic patterns of the Malaysian region. Developed by Mesolitica, the model is pre-trained from scratch on a specialized dataset comprising approximately 90 billion tokens, derived from a diverse range of Malaysian sources including government documents, local news, and social media forums. This extensive exposure to localized content allows the model to handle regional dialects, slang, and cultural nuances that are frequently underrepresented in more generalized global models.
The architecture of MaLLaM-7B follows the Mistral-7B design pattern, using a standard decoder-only transformer structure optimized for efficient training and inference. It employs a Byte Pair Encoding (BPE) tokenizer with a 32,000-token vocabulary, trained specifically on Malaysian multilingual data spanning Malay, English, Mandarin, Tamil, and Jawi script. The model incorporates modern architectural refinements such as Rotary Positional Embeddings (RoPE) and Grouped Query Attention (GQA), which improve the handling of sequence dependencies and reduce the computational cost of generation.
Technically, MaLLaM-7B is configured with a hidden dimension of 4096 and consists of 32 transformer layers. It is trained with a context window of 4096 tokens, making it suitable for tasks such as multi-turn dialogue, document summarization, and localized text completion. The model is released under the Apache 2.0 license, promoting transparency and accessibility for researchers and developers working within the Southeast Asian NLP ecosystem. It serves as a foundational component for building applications that require deep alignment with Malaysian linguistic identity and idiomatic expressions.
Malaysian Large Language Model (MaLLaM) is an open-source language model family developed to support Bahasa Malaysia and English. The models are trained on Malaysian text data including local news, literature, and digital content. The family is designed to capture Malaysian linguistic nuances and cultural context, and is available in multiple parameter sizes to suit different hardware deployments.
No evaluation benchmarks are currently available for MaLLaM-7B.
Overall Rank
-
Coding Rank
-
Total Score
73 / 100
MaLLaM-7B demonstrates strong transparency regarding its architectural origins and localized dataset composition, providing verifiable evidence of its training sources and tokenizer design. While it excels in licensing and identity consistency, it lacks detailed compute metrics and a formal versioning changelog. The model's commitment to open weights and public documentation of its Malaysian-centric training data sets a high standard for regional LLM development.
Architectural Provenance
The model is explicitly identified as a dense decoder-only transformer following the Mistral-7B design pattern. Documentation confirms the use of standard refinements including Rotary Positional Embeddings (RoPE) and Grouped Query Attention (GQA). Technical specifications are detailed, including 32 layers, a hidden dimension of 4096, and a 4096-token context window. While the model is described as 'trained from scratch,' the reliance on the Mistral architecture is well-documented in the official repository and technical reports.
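Because the card documents a Mistral-style layout, the published figures can be expressed as a Hugging Face `MistralConfig`. The sketch below is illustrative only: the `intermediate_size`, `rms_norm_eps`, and `rope_theta` values are assumptions carried over from Mistral-7B defaults, not numbers confirmed by the MaLLaM documentation.

```python
from transformers import MistralConfig

# Mistral-style configuration mirroring the documented MaLLaM-7B specs.
config = MistralConfig(
    vocab_size=32000,              # custom Malaysian BPE tokenizer
    hidden_size=4096,              # documented hidden dimension
    num_hidden_layers=32,          # documented layer count
    num_attention_heads=32,        # query heads
    num_key_value_heads=8,         # GQA: 8 KV heads shared across 32 query heads
    intermediate_size=14336,       # ASSUMED (Mistral-7B default)
    hidden_act="silu",             # SwiGLU gating uses SiLU activations
    max_position_embeddings=4096,  # documented context window
    rms_norm_eps=1e-5,             # RMSNorm epsilon (ASSUMED)
    rope_theta=10000.0,            # RoPE base frequency (ASSUMED)
)
```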
Dataset Composition
The training data is disclosed as a 349GB JSONL dataset (approx. 90 billion tokens) derived from 197 specialized Malaysian sources. Specific categories are named, including government documents (e.g., parliament transcripts), local news (Bernama, Star), and social media (Lowyat, Cari). The developers provide a list of scraped websites and reproduction notebooks for data collection. However, a precise percentage breakdown of the final training mixture (e.g., web vs. code vs. instructions) is not explicitly quantified in a single comprehensive table.
Tokenizer Integrity
The model uses a custom Byte Pair Encoding (BPE) tokenizer with a 32,000-token vocabulary, trained specifically on a multilingual Malaysian corpus. It supports Malay, English, Mandarin, Tamil, and Jawi script. The tokenizer is publicly available on Hugging Face, and its alignment with the claimed language support is verifiable through the provided configuration files and training methodology documentation.
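A quick way to verify the claimed script coverage is a round-trip encode/decode across each language. The repository id below is a placeholder for illustration; substitute the actual tokenizer path from the Mesolitica Hugging Face page.

```python
from transformers import AutoTokenizer

# Placeholder repo id; use the actual MaLLaM tokenizer path.
tokenizer = AutoTokenizer.from_pretrained("mesolitica/mallam-7b")

# Round-trip check across the claimed scripts.
samples = [
    "Apa khabar semua?",      # Malay
    "The weather is fine.",   # English
    "你好，世界",              # Mandarin
    "வணக்கம் உலகம்",           # Tamil
    "سلام دنيا",              # Jawi script
]
for text in samples:
    ids = tokenizer.encode(text)
    print(len(ids), tokenizer.decode(ids, skip_special_tokens=True))
```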
Parameter Density
The model is clearly defined as a 7B parameter dense architecture. Unlike Mixture-of-Experts (MoE) models, all parameters are active during inference. The architectural configuration (32 layers, 4096 hidden size) is standard for this class and consistently reported across the official Hugging Face model card and GitHub repository.
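A back-of-envelope count shows the documented configuration is consistent with the 7B label. The SwiGLU intermediate size is an assumption borrowed from Mistral-7B, so treat the result as approximate.

```python
# Rough parameter count for a Mistral-style dense model.
V, H, L = 32_000, 4_096, 32   # vocab size, hidden size, layers
KV = 8 * (H // 32)            # KV projection width: 8 heads x 128 dims = 1024
I = 14_336                    # ASSUMED SwiGLU intermediate size (Mistral-7B default)

attn = H * H + 2 * (H * KV) + H * H   # q, k, v, o projections
mlp = 3 * H * I                       # gate, up, down projections
total = L * (attn + mlp) + 2 * V * H  # layers + input/output embeddings

print(f"~{total / 1e9:.2f}B parameters")  # ~7.24B, consistent with "7B"
```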
Training Compute
Hardware specifications are partially disclosed; the developers mention using a Ray cluster with 5 nodes of 4x A100 80GB GPUs. While the hardware type is clear, the total GPU hours, training duration, and carbon footprint are not explicitly stated in the primary documentation. Some cost-efficiency claims (87% savings using AWS Trainium) are mentioned in press releases but lack the raw compute metrics required for a higher score.
Benchmark Reproducibility
The model has been evaluated on localized benchmarks like MalayMMLU and compared against ChatGPT 3.5. While some evaluation scripts are available in the repository, the exact prompts and few-shot examples used for the official reported scores are not fully documented in a centralized, reproducible format. Third-party verification is available through the MalayMMLU leaderboard, but the internal evaluation methodology remains partially opaque.
Identity Consistency
MaLLaM-7B consistently identifies itself as a Malaysian Large Language Model developed by Mesolitica. It does not exhibit identity confusion with other major models like GPT-4 or Llama. The versioning (e.g., v1.1, v2.5) is clearly tracked in the Hugging Face collections, and the model's limitations regarding its specialized regional focus are transparently discussed.
License Clarity
The model is released under the Apache 2.0 license, which is a standard, permissive open-source license. The terms for commercial use, modification, and redistribution are clear and consistent across the GitHub repository and Hugging Face model card. There are no conflicting proprietary restrictions mentioned in the official documentation.
Hardware Footprint
VRAM requirements are documented for standard 16-bit (approx. 14-16GB) and 4-bit (approx. 4-5GB) inference. The model card provides sample code for loading with BitsAndBytes 4-bit quantization. While it lacks a detailed context-scaling memory table, the baseline requirements for consumer and enterprise hardware are well-defined.
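For reference, a minimal 4-bit loading sketch in the spirit of the model card's BitsAndBytes sample is shown below, roughly matching the documented ~4-5GB footprint. The repo id is a placeholder, and the NF4/bfloat16 settings are common defaults rather than values taken from the MaLLaM documentation.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization; settings are common defaults, not card-confirmed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mesolitica/mallam-7b",   # placeholder repo id; use the actual path
    quantization_config=bnb_config,
    device_map="auto",
)
```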
Versioning Drift
The project uses version numbers (e.g., MaLLaM-7B, MaLLaM-v2), but it lacks a formal, detailed changelog or semantic versioning history that tracks specific weight updates or performance drift over time. Users must rely on separate Hugging Face model entries to track progress, which makes monitoring silent updates difficult.