Sarvam-30B

Open Source

Open Weights

Total Parameters

32B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

6 Mar 2026

Knowledge Cutoff

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

4

Position Embedding

RoPE

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

19

FFN Intermediate Size

1,024

Tokenizer

Vocabulary Size

262,144

Mixture of Experts

Active Parameters (per token)

2.4B

Number of Experts

128

Active Experts

6

Sarvam-30B

Sarvam-30B is a Mixture-of-Experts (MoE) model with 32B total parameters and 2.4B active parameters per token, designed for practical deployment in resource-constrained environments. It was released on 6 March 2026 under the Apache 2.0 license. The model uses 19 layers, 128 experts with top-6 routing, Grouped-Query Attention with 4 key-value heads, and a very high rope_theta (8e6) for long-context stability. Sarvam reports state-of-the-art performance across 22 Indian languages, along with strong reasoning, reliable coding ability, and high conversational quality. The model is optimized for multilingual voice calls with tool calling, and for throughput and memory efficiency.
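To make "128 experts, top-6 routing" concrete, here is a minimal sketch of top-k expert selection for a single token. It is not Sarvam's implementation: the router logits are random stand-ins, and renormalizing the softmax over only the selected experts is an assumption of this sketch.

```python
import math
import random

NUM_EXPERTS = 128  # from the spec above
TOP_K = 6          # top-6 routing, from the description above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    Renormalizing over the selected experts is an assumption of this sketch.
    """
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected logits.
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Stand-in for the router's output for one token (hypothetical values).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(selected)
```

Only the 6 selected experts would run their feed-forward computation for this token, which is why the active parameter count is far smaller than the total.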

About Sarvam

Sarvam is Sarvam AI's family of sovereign foundation models, built for India's languages, culture, and context. Released in March 2026, these Mixture-of-Experts (MoE) models are reported to deliver state-of-the-art performance across 22 Indian languages while remaining competitive on global benchmarks. They are designed with a focus on reasoning, coding, multilingual capabilities, and agentic tasks. The models are open-sourced under the Apache 2.0 license and optimized for deployment ranging from resource-constrained environments to high-performance applications.

Other Sarvam Models

Sarvam-105B

Evaluation Benchmarks

No evaluation benchmarks are available for Sarvam-30B.

Model Integrity

Total Score

67 / 100

Upstream

21.0 / 30

Model

25.0 / 40

Downstream

21.0 / 30

Sarvam-30B Model Integrity Report

Total Score

67 / 100

Audit Note

Sarvam-30B exhibits strong transparency in its architectural design and licensing, providing clear distinctions between total and active parameters for its MoE structure. The custom tokenizer is exceptionally well-documented, offering verifiable metrics for multilingual efficiency. However, significant transparency gaps remain regarding the specific composition of its 16-trillion-token training set and the precise compute resources consumed during its development.

Upstream

21.0 / 30

Architectural Provenance

8.0 / 10

Sarvam-30B is explicitly documented as a Mixture-of-Experts (MoE) model trained from scratch, rather than a fine-tune of an existing base. Technical specifications are detailed, including a 19-layer depth (1 dense, 18 MoE), 128 sparse experts with top-6 routing, and the use of Grouped Query Attention (GQA). The model also utilizes a specific rope_theta value (8e6) for long-context stability. Documentation is available via official blog posts and Hugging Face model cards, though a formal peer-reviewed paper is currently absent.
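As a rough illustration of why a large rope_theta relates to long-context behavior, the sketch below computes the standard RoPE inverse frequencies and the rotation wavelength of each dimension pair. The head dimension of 128 is a hypothetical value chosen for illustration; this document does not state it.

```python
import math

ROPE_THETA = 8e6   # from the report above
HEAD_DIM = 128     # hypothetical: the per-head dimension is not stated in this document


def rope_inv_freq(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) per dimension pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def wavelengths(inv_freq):
    """Number of token positions for one full rotation of each dimension pair."""
    return [2 * math.pi / f for f in inv_freq]


inv_freq = rope_inv_freq()
wl = wavelengths(inv_freq)
print(f"shortest wavelength: {wl[0]:.2f} positions")
print(f"longest wavelength:  {wl[-1]:,.0f} positions")
```

Under these assumptions, the slowest-rotating pair has a wavelength well beyond the 128K context length, so positions across the full window remain distinguishable in that pair.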

Dataset Composition

4.0 / 10

While the total token count (16 trillion) and general categories (web, code, math, and 22 Indian languages) are disclosed, specific dataset proportions and source names are largely missing. The company mentions 'internally curated datasets' and 'high-quality data' without providing a detailed breakdown or public access to sample data. Filtering and cleaning methodologies are described in general terms (e.g., 'curated in-house') but lack the granularity required for a high score.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available on Hugging Face and is a standout feature of the model's transparency. It supports 22 Indian languages across 12 scripts with a documented vocabulary size of 68,096 tokens. Technical metrics like 'fertility rates' (1.4 to 2.1 for Indic scripts) are provided and compared against standard multilingual tokenizers, allowing for verifiable efficiency claims. The tokenizer's alignment with the claimed language support is high and verifiable through the provided configuration files.
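The fertility metric mentioned above is commonly defined as the average number of tokens produced per word. A minimal sketch of that calculation follows; counting words by whitespace is an assumption, and `toy_tokenize` is a hypothetical stand-in, not the Sarvam tokenizer.

```python
def fertility(texts, tokenize):
    """Average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def toy_tokenize(text, chunk=4):
    """Hypothetical stand-in tokenizer: splits each word into chunks of up to `chunk` characters."""
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


sample = ["नमस्ते दुनिया", "open weights model"]
print(fertility(sample, toy_tokenize))
```

To check a published figure, one would pass the real tokenizer's encode function as `tokenize` and run it over a representative corpus for each script. Lower fertility means fewer tokens per word, and so more text per context window.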

Model

25.0 / 40

Parameter Density

7.0 / 10

The model clearly distinguishes between its total parameter count (32B) and active parameters (2.4B per token). The architectural breakdown is provided, specifying the number of experts (128) and the routing strategy (top-6). However, the exact distribution of parameters between attention and feed-forward networks (FFN) is not fully detailed in the public documentation, and there is some minor naming inconsistency in marketing materials (30B vs 32B total).
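The sparsity implied by these figures can be worked out directly. The short calculation below uses only the numbers quoted in this report.

```python
TOTAL_PARAMS = 32e9     # total parameters, from the report
ACTIVE_PARAMS = 2.4e9   # active parameters per token, from the report
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 6      # top-6 routing

# Share of all parameters used for any one token.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
# Share of experts selected for any one token.
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

print(f"active parameters: {active_param_fraction:.1%}")
print(f"active experts:    {active_expert_fraction:.1%}")
```

The active-parameter share (7.5%) is higher than the active-expert share (about 4.7%), presumably because attention, embeddings, and the dense layer run for every token regardless of routing. The exact split is not published, as the note above says.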

Training Compute

4.0 / 10

Compute information is partially disclosed through the IndiaAI Mission context. Training is known to have used NVIDIA H100 GPUs (4,096 GPUs for the broader mission, with Sarvam receiving a significant allocation) on infrastructure from Yotta. However, specific GPU-hours for the 30B variant, exact training duration, and carbon footprint calculations are not publicly documented.

Benchmark Reproducibility

5.0 / 10

Sarvam provides results for several standard benchmarks (Math500, HumanEval, MMLU Pro) and internal Indic benchmarks (IndiVibe). While they disclose evaluation settings (temperature, top_p, max tokens) in model card footnotes, the evaluation code itself is not fully public, and many results are self-reported without third-party verification. The use of an internally designed benchmark (IndiVibe) judged by another AI (Gemini) introduces significant reproducibility gaps.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as a Sarvam AI model in conversational tests and maintaining version awareness. There are no documented instances of the model claiming a competitor's identity (e.g., claiming to be GPT-4). It is transparent about its role as a sovereign Indian AI model and its specific multilingual capabilities.

Downstream

21.0 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. This license applies to both the code and the model weights, explicitly allowing for commercial use, modification, and distribution. There are no conflicting proprietary terms or restrictive 'non-commercial' clauses found in the official documentation or Hugging Face repository.

Hardware Footprint

7.0 / 10

VRAM requirements are documented for standard inference: the model needs approximately 60-64 GB of VRAM at full BF16 precision. Guidance is provided for optimized inference on H100, L40S, and Apple Silicon (using MXFP4). Quantization support is mentioned (e.g., 4-bit/NVFP4), though detailed accuracy-tradeoff curves for various quantization levels (Q4, Q8) are not as comprehensive as those found in some other open-weight projects.
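The 60-64 GB figure is consistent with a simple weights-only estimate: parameter count times bytes per parameter. The sketch below reproduces that arithmetic. It ignores the KV cache, activations, and any quantization metadata, so real usage will be higher, and the precision table is a generic assumption rather than a list of officially supported formats.

```python
# Bytes per parameter for common weight precisions (generic values, not Sarvam-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params, precision):
    """Weights-only memory in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for params in (30e9, 32e9):  # nominal "30B" name vs. 32B total parameters
    for precision in ("bf16", "int8", "4bit"):
        gb = weight_memory_gb(params, precision)
        print(f"{params / 1e9:.0f}B @ {precision}: {gb:.0f} GB")
```

At BF16, the 30B and 32B counts give 60 GB and 64 GB respectively, matching the documented range. At 4 bits the weights alone drop to roughly a quarter of that.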

Versioning Drift

4.0 / 10

The model uses basic versioning on Hugging Face, but a formal, detailed changelog or semantic versioning system is not prominently maintained. As a relatively new release (March 2026), there is limited history to evaluate drift management or deprecation notices. Documentation regarding how future updates will be handled or how behavior changes will be communicated is currently minimal.

Resources

Official Documentation

Download Weights