Total Parameters
671B
Context Length
128K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Proprietary
Release Date
12 Aug 2025
Knowledge Cutoff
Dec 2024
Attention
Attention Structure
Multi-Head Attention
Attention Heads
128
Key-Value Heads
1
Attention Head Dimension
-
Position Embedding
RoPE
RoPE Theta
-
Sliding Window Attention
-
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
2,048
Number of Layers
61
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
-
Mixture of Experts
Total Expert Parameters
37.0B
Number of Experts
257
Active Experts
9
Shared Experts
-
FFN Intermediate Size (per Expert)
-
Dense Layers Before MoE
-
ILMU is a Malaysian sovereign language model. Built on a fine-tuned DeepSeek-V3 reasoning architecture, it is optimized for localized reasoning and the linguistic profile of Bahasa Malaysia. The model uses reinforcement learning to align its reasoning with local professional standards, taking an inclusive approach to Malaysia's diverse religions and traditions and maintaining strict respect for cultures, institutions, and royalty (3R). It is calibrated to recognize Malaysian sensitivities and profanity so that it maintains proper social etiquette.
Intelek Luhur Malaysia Untukmu (ILMU) is a language model developed by YTL AI Labs. Trained on YTL AI Cloud infrastructure, the model is designed for Malaysian social norms and linguistic nuances including Bahasa Melayu and Chinese.
No evaluation benchmarks are available for ILMU 1.0.
Overall Rank
-
Coding Rank
-
Total Score
28 / 100
ILMU 1.0 presents a highly opaque transparency profile characterized by significant contradictions between its marketing narrative and its technical foundations. While it provides some basic architectural metrics, the lack of public documentation regarding training data, compute resources, and evaluation methodology makes its claims of sovereignty and performance difficult to verify. The proprietary nature of the model and its closed-access delivery further limit independent auditability and technical accountability.
Architectural Provenance
Although the model is presented as a sovereign foundation model developed within Malaysia, specific details regarding its pre-training methodology and core framework have not been disclosed in any official whitepaper. Public technical documentation remains sparse, with no peer-reviewed papers currently available to provide transparency into the model’s weight initialization or structural modifications. Consequently, the architectural provenance is primarily identified through internal audits, which point to the model being built upon a DeepSeek-based architecture.
Dataset Composition
The provider mentions general categories of data, including 'publicly available web content,' 'licensed third-party corpora,' and 'Malaysia-centric sources' (educational, cultural, and government materials). However, there is no public breakdown of the dataset composition (e.g., percentages of code, web, or specific languages), no disclosure of the specific licensed sources, and no detailed documentation on the filtering or cleaning methodologies used. The claim of being '100% made in Malaysia' is not supported by a verifiable list of data sources or a transparent data provenance report.
Tokenizer Integrity
The tokenizer is not publicly available for independent inspection or testing. Although the model is claimed to offer superior support for Bahasa Melayu, there is no documentation of the vocabulary size, the tokenization approach (e.g., BPE, SentencePiece), or how the tokenizer was aligned with the claimed multilingual training data. Without an open-source tokenizer or technical specifications, its efficiency for the Malaysian linguistic profile cannot be verified. Observed tokenization patterns resemble those of DeepSeek, with highly efficient tokenization for English and Chinese.
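If the tokenizer were exposed, its efficiency for Malay text could be compared against other tokenizers with a simple "fertility" measure (average tokens per whitespace-delimited word). The following is a minimal sketch under stated assumptions: `fertility` and `char_chunk_tokenizer` are hypothetical names introduced here, and the chunking tokenizer is only a stand-in, not the ILMU or DeepSeek tokenizer.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], sentences: Iterable[str]) -> float:
    """Average number of tokens per whitespace-delimited word.

    Lower values mean the tokenizer represents the text more compactly.
    Note: whitespace splitting is not meaningful for Chinese text, which
    would need a different unit (e.g. tokens per character).
    """
    total_tokens = 0
    total_words = 0
    for sentence in sentences:
        total_words += len(sentence.split())
        total_tokens += len(tokenize(sentence))
    if total_words == 0:
        raise ValueError("no words found in the input sentences")
    return total_tokens / total_words


def char_chunk_tokenizer(text: str, chunk: int = 4) -> List[str]:
    """Stand-in tokenizer (assumption): splits each word into 4-character pieces."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


sample = ["Saya suka makan nasi lemak"]
score = fertility(char_chunk_tokenizer, sample)
print(f"tokens per word: {score:.2f}")
```

Swapping `char_chunk_tokenizer` for any real tokenizer's encode function would allow a like-for-like comparison on the same Malay, English, and Chinese samples.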
Parameter Density
The model's total parameter count is estimated at 671 billion, following the DeepSeek-V3 architecture. This figure is primarily inferred from the model's problem-solving capabilities rather than from a comprehensive official technical report, and there is no detailed breakdown of how parameters are distributed across layers.
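For a sense of scale, the sparsity implied by the figures on this page can be worked out directly. This is a rough sketch, assuming the 37.0B figure listed under Mixture of Experts corresponds to the parameters activated per token (as in DeepSeek-V3) and that the 671B estimate is the total; neither has been confirmed by the provider.

```python
def share(part: float, whole: float) -> float:
    """Return part / whole, guarding against a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Figures taken from this page; their interpretation is an assumption.
TOTAL_PARAMS_B = 671.0    # estimated total parameters, in billions
ACTIVE_PARAMS_B = 37.0    # assumed per-token activated parameters, in billions
NUM_EXPERTS = 257         # experts listed on this page
ACTIVE_EXPERTS = 9        # experts active per token, as listed

param_share = share(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
expert_share = share(ACTIVE_EXPERTS, NUM_EXPERTS)

print(f"parameters used per token: {param_share:.1%}")
print(f"experts used per token:    {expert_share:.1%}")
```

Under these assumptions, only a small fraction of the model is exercised for any single token, which is why total and active parameter counts need to be reported separately for MoE models.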
Training Compute
Information regarding training compute is extremely vague. The provider mentions that the model was trained on 'YTL AI Cloud infrastructure' in partnership with NVIDIA and notes a total investment of RM20 billion in the ecosystem. However, specific metrics such as total GPU/TPU hours, the exact hardware configuration used for the training run, the duration of the training, and the associated carbon footprint are entirely absent from public documentation.
Benchmark Reproducibility
The model reports high scores on benchmarks like MMLU, CMMLU, and a specialized 'MalayMMLU.' While the MalayMMLU benchmark itself is hosted on GitHub, the specific evaluation code, exact prompts, and few-shot configurations used by YTL AI Labs to achieve their reported scores are not public. There is no provided path for third-party researchers to reproduce the claimed results, and the reliance on 'internal benchmarks' for cultural alignment further limits transparency.
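To illustrate what a reproducible multiple-choice evaluation would need to pin down, here is a minimal sketch of a few-shot prompt builder and an accuracy scorer. The `MCQ` type, the prompt template, and the function names are hypothetical and are not the format used by YTL AI Labs or by the MalayMMLU repository.

```python
import string
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MCQ:
    question: str
    choices: Sequence[str]
    answer: str  # gold answer as an uppercase letter, e.g. "B"


def format_item(item: MCQ, include_answer: bool) -> str:
    """Render one question; few-shot examples include the gold answer."""
    lines = [item.question]
    for letter, choice in zip(string.ascii_uppercase, item.choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {item.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(shots: Sequence[MCQ], target: MCQ) -> str:
    """Concatenate the few-shot examples followed by the unanswered target."""
    blocks = [format_item(shot, include_answer=True) for shot in shots]
    blocks.append(format_item(target, include_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: List[str], items: Sequence[MCQ]) -> float:
    """Exact-match accuracy on the predicted answer letter."""
    if not items or len(predictions) != len(items):
        raise ValueError("predictions and items must be non-empty and equal in length")
    correct = sum(p.strip().upper() == it.answer for p, it in zip(predictions, items))
    return correct / len(items)


shot = MCQ("2 + 2 = ?", ["3", "4", "5", "6"], "B")
target = MCQ("3 + 3 = ?", ["5", "6", "7", "8"], "B")
print(build_prompt([shot], target))
```

Publishing the template, the chosen few-shot examples, and the answer-extraction rule alongside reported scores is what would let third parties rerun the same evaluation.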
Identity Consistency
The model maintains a clear brand identity as 'ILMU' (Intelek Luhur Malaysia Untukmu) and is marketed as a sovereign Malaysian AI. However, there is a lack of transparency regarding version tracking and identity awareness within the model's outputs. As a closed-source model hosted on a proprietary platform, it utilizes significant guardrails specifically designed to reinforce and protect the model’s brand identity and technical boundaries.
License Clarity
The model is explicitly labeled as 'Proprietary' with 'Closed Weights' and 'Closed Source.' There is no public license agreement available for review, and access is restricted to a closed API and a consumer chatbot. The terms of use for derivative works or commercial applications are not clearly defined beyond the 'ILMU AI Accelerator Programme,' which requires a specific application process rather than a transparent licensing framework.
Hardware Footprint
There is no official guidance on the hardware requirements for running the model, as it is not available for local deployment. A 671B MoE model would require massive VRAM: at 2 bytes per parameter, FP16 weights alone occupy roughly 1.3 TB, and about 670 GB at 8-bit precision. The provider offers no documentation on quantization tradeoffs, memory scaling for different context lengths, or the specific infrastructure needed for enterprise API integration.
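The weight-only memory footprint of a 671B-parameter model can be checked with simple arithmetic. The sketch below assumes the estimated 671B total and counts weights only; KV cache, activations, and runtime overhead are excluded, and the function name is illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


TOTAL_PARAMS = 671e9  # estimated total parameter count (assumption from this page)

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):,.1f} GB")
```

Because all experts must be resident in memory even though only a few are active per token, the total parameter count, not the active count, drives this figure.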
Versioning Drift
The model uses a versioning scheme (e.g., 0.1, 1.0), but there is no public changelog or documentation of updates. Because the model is served via a closed API, users have no way to track behavioral drift or performance changes. There is no stated policy for deprecating older versions or providing notice for silent updates to the underlying weights or safety filters.
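In the absence of a changelog, API users could approximate drift tracking themselves by replaying a fixed prompt set and comparing response fingerprints over time. This is a minimal sketch: `query_model` is a hypothetical callable standing in for whatever client a user has, and the approach assumes deterministic decoding (for example, temperature 0), since sampled outputs would differ between runs even without a model change.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(text: str) -> str:
    """Stable SHA-256 hash of a response, ignoring surrounding whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def snapshot(query_model: Callable[[str], str], prompts: List[str]) -> Dict[str, str]:
    """Record one fingerprint per prompt for the current model behavior."""
    return {prompt: fingerprint(query_model(prompt)) for prompt in prompts}


def drifted_prompts(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Prompts whose response fingerprint changed (or vanished) since the baseline."""
    return sorted(p for p in baseline if current.get(p) != baseline[p])


# Stub models for illustration only; a real check would call the hosted API.
def model_v1(prompt: str) -> str:
    return prompt.upper()


def model_v2(prompt: str) -> str:
    return prompt.upper() if prompt != "apa khabar" else "Khabar baik"


prompts = ["apa khabar", "selamat pagi"]
changed = drifted_prompts(snapshot(model_v1, prompts), snapshot(model_v2, prompts))
print(changed)
```

Exact hashing only flags that something changed; comparing task scores on the same prompt set would be needed to judge whether the change is a regression.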