MaLLaM-3B: Specifications and GPU VRAM Requirements

MaLLaM-3B

Open source

Open weights

Parameters

3B

Context length

4,096 tokens

Modality

Text

Architecture

Dense

License

Apache-2.0

Release date

15 Jan 2024

Training data cutoff

Jan 2024

Technical specifications

Attention structure

Multi-Head Attention

Hidden dimension size

Number of layers

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

Rotary Position Embedding (RoPE)

MaLLaM-3B

MaLLaM-3B (Malaysia Large Language Model) is a 3-billion-parameter dense foundation model built for the Malaysian linguistic context. Trained from scratch by Malaysia AI and Mesolitica, it addresses the scarcity of high-quality local-language models with a curated dataset of 90 billion tokens. The training corpus comprises 349GB of Malaysian text, including government documents, local news, literature from Dewan Bahasa dan Pustaka, and colloquial social media exchanges. A custom-trained Byte Pair Encoding (BPE) tokenizer helps the model capture Malaysian idioms, slang, and cultural references that English-centric foundation models often dilute.

Architecturally, MaLLaM-3B adopts the decoder-only transformer design of Mistral, which targets efficient inference at a modest parameter count. The model uses Grouped-Query Attention (GQA), in which several query heads share each key-value head, shrinking the KV cache and reducing memory overhead during sequence generation. It uses the SwiGLU activation function and RMSNorm, which help stabilize pre-training. For position encoding, the model employs Rotary Position Embeddings (RoPE), which encode relative token positions within its 4096-token context window.
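
To make the KV-cache saving from GQA concrete, the sketch below compares the cache size of a multi-head layout with a grouped-query layout at the 4096-token context length. The layer count, head counts, and hidden size are hypothetical placeholders chosen only for illustration; they are not MaLLaM-3B's published configuration, and the 2-byte value size assumes an FP16 cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Attention dimensions. All values used below are hypothetical."""
    num_layers: int
    num_attention_heads: int
    num_kv_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Each attention head covers an equal slice of the hidden dimension.
        return self.hidden_size // self.num_attention_heads


def kv_cache_bytes(shape: AttentionShape, seq_len: int,
                   batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor 2: one tensor for keys and one for values.
    return (2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical shapes, NOT the real MaLLaM-3B configuration.
mha = AttentionShape(num_layers=32, num_attention_heads=32,
                     num_kv_heads=32, hidden_size=2560)
gqa = AttentionShape(num_layers=32, num_attention_heads=32,
                     num_kv_heads=8, hidden_size=2560)

for name, shape in (("MHA", mha), ("GQA", gqa)):
    gib = kv_cache_bytes(shape, seq_len=4096) / 2**30
    print(f"{name}: {gib:.4f} GiB of KV cache at 4096 tokens")
```

With these placeholder numbers, cutting the key-value heads from 32 to 8 reduces the cache by a factor of four, because the cache size scales linearly with the number of key-value heads.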

Designed primarily for edge deployment and localized applications, MaLLaM-3B targets environments that need low-latency text generation and bilingual proficiency in Bahasa Malaysia and English. Its compact size makes it suitable for integration into mobile applications, localized chatbots, and on-premise document processing systems. Released under the Apache 2.0 license, the model gives researchers and developers an open-weights foundation for downstream tasks such as sentiment analysis, summarization, and instruction-following assistants tailored to Malaysian users.

About MaLLaM

Malaysia Large Language Model (MaLLaM) is an open-source family of language models that supports Bahasa Malaysia and English. The models are trained on Malaysian text, including local news, literature, and digital content, so that they handle Malaysian linguistic nuances and cultural context. The family is available in multiple parameter sizes to suit different hardware.

Other MaLLaM models

MaLLaM-7B

Evaluation benchmarks

No evaluation benchmarks are available for MaLLaM-3B.

Rankings

Coding rank

Model transparency

Overall score

B+

73 / 100

Upstream

23.5 / 30

Model

29.5 / 40

Downstream

20.0 / 30

GPU requirements
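
As a rough guide, the memory needed just to hold the weights is the parameter count multiplied by the bytes per weight. The sketch below applies that arithmetic to a nominal 3 billion parameters at a few common precisions. The precision names and the `weight_memory_gib` helper are illustrative assumptions; real quantization formats store extra scales and metadata, and inference also needs room for activations and the KV cache, so treat the results as lower bounds.

```python
# Bits per weight for a few common precisions (illustrative labels only).
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}

# Nominal parameter count taken from the "3B" in the model name;
# the exact count is not stated on this page.
NUM_PARAMS = 3_000_000_000


def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Lower-bound memory for the weights alone, in GiB (2**30 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_memory_gib(NUM_PARAMS, bits):.2f} GiB for weights")
```

Halving the bits per weight halves the weight footprint, which is why 8-bit and 4-bit quantization are the usual route to fitting a model of this size on consumer or edge GPUs.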


Resources

Official documentation, paper, weights download, source code
