Total Parameters
32B
Context Length
128K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
6 Mar 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
64
Key-Value Heads
4
Position Embedding
RoPE
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
19
FFN Intermediate Size
1,024
Tokenizer
Vocabulary Size
262,144
Mixture of Experts
Total Expert Parameters
2.4B
Number of Experts
128
Active Experts
6
Sarvam-30B is a Mixture-of-Experts (MoE) model with 32B total parameters and 2.4B active parameters per token, designed for practical deployment in resource-constrained environments. It was released on March 6, 2026 under the Apache 2.0 license. The model uses 19 layers with 128 experts and top-6 routing, grouped-query attention with 4 key-value heads, and a very high rope_theta (8e6) for long-context stability. It is described as delivering state-of-the-art performance across 22 Indian languages, with strong reasoning, reliable coding ability, and best-in-class conversational quality. It is optimized for throughput, memory efficiency, and multilingual voice calls with tool calling.
Sarvam AI's sovereign foundation models are built for India's languages, culture, and context. Released in March 2026, these Mixture-of-Experts (MoE) models are reported to offer state-of-the-art performance across 22 Indian languages while maintaining competitive results on global benchmarks. They are designed with a focus on reasoning, coding, multilingual capabilities, and agentic tasks. The models are open-sourced under the Apache 2.0 license and optimized for practical deployment, from resource-constrained environments to high-performance applications.
No evaluation benchmarks for Sarvam-30B available.
Overall Rank
-
Coding Rank
-
Total Score
67 / 100
Sarvam-30B exhibits strong transparency in its architectural design and licensing, providing clear distinctions between total and active parameters for its MoE structure. The custom tokenizer is exceptionally well-documented, offering verifiable metrics for multilingual efficiency. However, significant transparency gaps remain regarding the specific composition of its 16-trillion-token training set and the precise compute resources consumed during its development.
Architectural Provenance
Sarvam-30B is explicitly documented as a Mixture-of-Experts (MoE) model trained from scratch, rather than a fine-tune of an existing base. Technical specifications are detailed, including a 19-layer depth (1 dense, 18 MoE), 128 sparse experts with top-6 routing, and the use of Grouped-Query Attention (GQA). The model also uses a rope_theta value of 8e6 for long-context stability. Documentation is available via official blog posts and Hugging Face model cards, though a formal peer-reviewed paper is currently absent.
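To illustrate why a large rope_theta is associated with long-context stability, the sketch below computes the wavelength of each rotary frequency pair under the standard RoPE formulation and compares the longest one against the 128K context window. The head dimension of 64 is an assumption (4,096 hidden size divided by 64 heads); the model's actual head dimension is not stated here, and the function name is illustrative.

```python
import math

def rope_wavelengths(head_dim: int, theta: float) -> list[float]:
    """Wavelength, in token positions, of each rotary frequency pair."""
    # Standard RoPE: pair i rotates at frequency theta ** (-2i / head_dim),
    # so its wavelength is 2 * pi * theta ** (2i / head_dim).
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

HEAD_DIM = 64          # assumption: 4,096 hidden size / 64 attention heads
CONTEXT = 128 * 1024   # 128K-token context window

for theta in (1e4, 8e6):
    longest = max(rope_wavelengths(HEAD_DIM, theta))
    print(f"theta={theta:.0e}: longest wavelength ~{longest:,.0f} tokens, "
          f"exceeds context: {longest > CONTEXT}")
```

Under these assumptions, the commonly used default of 1e4 yields a longest wavelength shorter than the 128K window, while 8e6 yields one well beyond it, so the slowest-rotating dimensions do not wrap around within the context.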
Dataset Composition
While the total token count (16 trillion) and general categories (web, code, math, and 22 Indian languages) are disclosed, specific dataset proportions and source names are largely missing. The company mentions 'internally curated datasets' and 'high-quality data' without providing a detailed breakdown or public access to sample data. Filtering and cleaning methodologies are described in general terms (e.g., 'curated in-house') but lack the granularity required for a high score.
Tokenizer Integrity
The tokenizer is publicly available on Hugging Face and is a standout feature of the model's transparency. It supports 22 Indian languages across 12 scripts with a documented vocabulary size of 68,096 tokens. Technical metrics like 'fertility rates' (1.4 to 2.1 for Indic scripts) are provided and compared against standard multilingual tokenizers, allowing for verifiable efficiency claims. The tokenizer's alignment with the claimed language support is high and verifiable through the provided configuration files.
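Fertility is usually taken to mean the average number of tokens produced per word, so lower values indicate a more efficient tokenizer for that language. The sketch below shows one way such a figure could be checked; it assumes whitespace-delimited words, and `toy_tokenize` is a hypothetical stand-in rather than the Sarvam tokenizer.

```python
from typing import Callable, Iterable

def fertility(texts: Iterable[str], tokenize: Callable[[str], list]) -> float:
    """Average number of tokens produced per whitespace-delimited word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words

# Hypothetical stand-in tokenizer: cuts every word into 3-character chunks.
def toy_tokenize(text: str) -> list[str]:
    return [w[i:i + 3] for w in text.split() for i in range(0, len(w), 3)]

sample = ["नमस्ते दुनिया", "hello world"]
print(f"fertility on sample: {fertility(sample, toy_tokenize):.2f}")
```

To reproduce a published figure, the `tokenize` argument would be replaced with the real tokenizer's tokenization function and the sample with a representative corpus for each script.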
Parameter Density
The model clearly distinguishes between its total parameter count (32B) and active parameters (2.4B per token). The architectural breakdown is provided, specifying the number of experts (128) and the routing strategy (top-6). However, the exact distribution of parameters between attention and feed-forward networks (FFN) is not fully detailed in the public documentation, and there is some minor naming inconsistency in marketing materials (30B vs 32B total).
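The relationship between total and active parameters follows from top-k routing: each token is sent to only a few experts. The sketch below is a minimal illustration using the documented figures (128 experts, top-6, 32B total, 2.4B active). The router scores are made up, and renormalizing the softmax over the selected experts is an assumption, not a confirmed detail of Sarvam's router.

```python
import math
import random

def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)[:k]
    peak = max(router_logits[e] for e in chosen)  # subtract max for stability
    exps = {e: math.exp(router_logits[e] - peak) for e in chosen}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}

NUM_EXPERTS, TOP_K = 128, 6                # figures from the documentation
TOTAL_PARAMS, ACTIVE_PARAMS = 32e9, 2.4e9  # figures from the documentation

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # made-up scores
weights = top_k_route(logits, TOP_K)

print(f"experts used for this token: {sorted(weights)}")
print(f"expert fraction active:    {TOP_K / NUM_EXPERTS:.1%}")
print(f"parameter fraction active: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

The active parameter fraction (7.5%) is higher than the active expert fraction (about 4.7%), which is plausible because attention, embeddings, and the dense layer run for every token. The exact split is not documented.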
Training Compute
Information regarding compute is partially disclosed through the IndiaAI Mission context. The training is known to have used NVIDIA H100 GPUs (4,096 GPUs for the broader mission, with Sarvam receiving a significant allocation) and infrastructure from Yotta. However, specific GPU-hours for the 30B variant, exact training duration, and carbon footprint calculations are not publicly documented.
Benchmark Reproducibility
Sarvam provides results for several standard benchmarks (Math500, HumanEval, MMLU Pro) and internal Indic benchmarks (IndiVibe). While they disclose evaluation settings (temperature, top_p, max tokens) in model card footnotes, the evaluation code itself is not fully public, and many results are self-reported without third-party verification. The use of an internally designed benchmark (IndiVibe) judged by another AI (Gemini) introduces significant reproducibility gaps.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as a Sarvam AI model in conversational tests and maintaining version awareness. There are no documented instances of the model claiming a competitor's identity (e.g., claiming to be GPT-4). It is transparent about its role as a sovereign Indian AI model and its specific multilingual capabilities.
License Clarity
The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. This license applies to both the code and the model weights, explicitly allowing for commercial use, modification, and distribution. There are no conflicting proprietary terms or restrictive 'non-commercial' clauses found in the official documentation or Hugging Face repository.
Hardware Footprint
VRAM requirements are documented for standard inference, with the model requiring approximately 60-64 GB of VRAM at full BF16 precision. Guidance is provided for optimized inference on H100, L40S, and Apple Silicon (using MXFP4). Quantization impact is mentioned (e.g., 4-bit/NVFP4 support), though detailed accuracy-tradeoff curves for various quantization levels (Q4, Q8) are not as comprehensive as those found in some other open-weight projects.
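The BF16 figure is consistent with simple arithmetic on the 32B total parameter count, as the rough sketch below shows. It counts weights only and ignores activations, runtime overhead, and the scale metadata that 4-bit formats carry. The KV-cache estimate uses the documented 19 layers and 4 key-value heads, but the head dimension of 64 is an assumption (4,096 hidden size divided by 64 heads).

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_value: int = 2) -> float:
    """KV-cache size for one sequence: keys plus values across all layers."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value / 1e9

TOTAL_PARAMS = 32e9
for name, bits in (("BF16", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{name:>5}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB for weights")

CONTEXT = 128 * 1024
# head_dim=64 is an assumption, not a documented value.
gqa = kv_cache_gb(layers=19, kv_heads=4, head_dim=64, tokens=CONTEXT)
mha = kv_cache_gb(layers=19, kv_heads=64, head_dim=64, tokens=CONTEXT)
print(f"KV cache at 128K tokens: ~{gqa:.2f} GB with 4 KV heads, "
      f"~{mha:.1f} GB if all 64 heads kept their own keys and values")
```

Under these assumptions, sharing keys and values across 4 heads instead of 64 shrinks the cache by a factor of 16, which is the main memory benefit of grouped-query attention at long context lengths.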
Versioning Drift
The model uses basic versioning on Hugging Face, but a formal, detailed changelog or semantic versioning system is not prominently maintained. As a relatively new release (March 2026), there is limited history to evaluate drift management or deprecation notices. Documentation regarding how future updates will be handled or how behavior changes will be communicated is currently minimal.