
Sarvam-105B

Total Parameters

106B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

6 Mar 2026

Knowledge Cutoff

-

Technical Specifications

Active Parameters

10.3B

Number of Experts

128

Active Experts

8

Attention Structure

Multi-head Latent Attention (MLA)

Hidden Dimension Size

4096

Number of Layers

32

Attention Heads

-

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

Sarvam-105B

Sarvam-105B is an advanced Mixture-of-Experts (MoE) model with 106B total parameters and 10.3B active parameters per token, designed for strong performance on complex tasks. Released on March 6, 2026 under the Apache 2.0 license, it uses an MLA-style attention stack with decoupled QK head dimensions (q_head_dim=192, v_head_dim=128), a large head_dim of 576, and 128 experts with top-8 routing. It offers a 128K native context window (extensible via YaRN scaling with a factor of 40) and delivers strong results on agentic tasks, mathematics, and coding, matching or surpassing major closed-source models with state-of-the-art results across 22 Indian languages while maintaining competitive global benchmark performance.
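
The top-8-of-128 expert routing described above can be sketched as follows. This is a minimal illustration of the general top-k MoE gating pattern, not Sarvam's published implementation; details such as softmax-then-top-k ordering and shared-expert handling are assumptions.

```python
import math

def route_top_k(router_logits, k=8):
    """Pick the top-k experts for one token and renormalize their gate weights.

    router_logits: per-expert scores from the routing network (length = num experts).
    Returns (expert_indices, gate_weights) for the k selected experts.
    """
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i])[-k:]
    # Softmax over only the selected logits (a common MoE convention),
    # so the k gate weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return top, [e / z for e in exps]

# 128 experts, top-8 routing, as quoted for Sarvam-105B.
experts, gates = route_top_k([0.01 * i for i in range(128)], k=8)
assert len(experts) == 8 and abs(sum(gates) - 1.0) < 1e-9
```

Each token's output is then the gate-weighted sum of the eight selected experts' outputs, which is why only ~10.3B of the 106B parameters are exercised per token.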

About Sarvam

Sarvam AI builds sovereign foundation models for India's languages, culture, and context. Released in March 2026, these advanced Mixture-of-Experts (MoE) models offer state-of-the-art performance across 22 Indian languages while maintaining competitive results on global benchmarks. They are designed with a focus on reasoning, coding, multilingual capabilities, and agentic tasks, open-sourced under the Apache 2.0 license, and optimized for practical deployment, from resource-constrained environments to high-performance applications.



Evaluation Benchmarks

No evaluation benchmarks for Sarvam-105B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Transparency

Sarvam-105B Transparency Report

Total Score: 68 / 100 (Grade: B)

Audit Note

Sarvam-105B demonstrates strong transparency in its architectural design and licensing, providing deep technical details on its MoE structure and permissive Apache 2.0 terms. However, it suffers from significant gaps in data provenance and independent benchmark verification, relying heavily on self-reported metrics without disclosing specific data sources or evaluation code. The model's transparency profile is currently characterized by high-quality architectural disclosure paired with opaque upstream data and downstream reproducibility practices.

Upstream

20.5 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in official blog posts and Hugging Face model cards. It is a Mixture-of-Experts (MoE) transformer built from scratch using the NVIDIA NeMo framework and Megatron-LM. Specific technical details are provided, including the use of Multi-head Latent Attention (MLA) to compress the KV cache, a 32-layer depth (1 dense + 31 MoE), and a decoupled QK head dimension (q_head_dim=192, v_head_dim=128). The use of YaRN scaling for its 128K context window is also explicitly stated.
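The figures quoted above can be collected into a single configuration sketch. The key names below mimic Hugging Face `config.json` conventions and are illustrative assumptions; they are not guaranteed to match Sarvam's actual repository files.

```python
# Illustrative config assembled from the publicly quoted figures.
# Key names are hypothetical (HF-style), not Sarvam's verified schema.
sarvam_105b_config = {
    "num_hidden_layers": 32,            # 1 dense + 31 MoE layers
    "hidden_size": 4096,
    "n_routed_experts": 128,
    "num_experts_per_tok": 8,           # top-8 routing
    "q_head_dim": 192,                  # decoupled QK head dimension (MLA)
    "v_head_dim": 128,
    "max_position_embeddings": 131072,  # 128K native context
    "rope_scaling": {"type": "yarn", "factor": 40},
}

assert sarvam_105b_config["num_hidden_layers"] == 32
assert sarvam_105b_config["rope_scaling"]["factor"] == 40
```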

Dataset Composition

4.0 / 10

While the provider discloses the total token count (12 trillion) and the general categories of data (web, code, math, and multilingual content across 22 Indian languages), it lacks a detailed percentage breakdown of these sources. There is no public documentation on specific data filtering or cleaning methodologies, and no sample data or specific source names (e.g., specific web crawls or datasets) are provided beyond vague 'curated in-house' claims.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly available via the Hugging Face repository. Technical documentation specifies its optimization for Indic languages, achieving fertility rates of 1.4 to 2.1, compared with the 4-8 typical of standard multilingual tokenizers. The vocabulary size and support for 22 Indian languages are clearly stated and verifiable through the provided configuration files.
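
Fertility here is the average number of tokens a tokenizer produces per word; lower is better. A minimal sketch of the metric, with hypothetical token counts for illustration:

```python
def fertility(num_tokens, num_words):
    """Tokenizer fertility: average tokens emitted per word (lower is better)."""
    if num_words <= 0:
        raise ValueError("word count must be positive")
    return num_tokens / num_words

# Hypothetical counts for the same 10-word Indic sentence:
generic = fertility(60, 10)   # a generic multilingual tokenizer -> 6.0
indic = fertility(18, 10)     # an Indic-optimized tokenizer -> 1.8
assert generic == 6.0
assert 1.4 <= indic <= 2.1    # within the range quoted above
```

High fertility inflates sequence length, which raises inference cost and eats into the effective context window for Indic text.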

Model

26.0 / 40

Parameter Density

9.0 / 10

The model provides exemplary transparency regarding its parameter density. It explicitly distinguishes between its 106B total parameters and its 10.3B active parameters per token. The MoE configuration is detailed, specifying 128 experts with a top-8 routing strategy and one shared expert. The architectural breakdown of the backbone (equivalent to a 10-13B dense transformer) is also publicly disclosed.
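The total-versus-active split above implies that only a small fraction of the network runs per token. A quick back-of-the-envelope check using the published headline numbers:

```python
# Headline figures from the model card.
total_params = 106e9    # all experts plus the shared backbone
active_params = 10.3e9  # parameters exercised per token (top-8 routing + shared expert)

# Fraction of the network active for any single token.
active_fraction = active_params / total_params
assert 0.09 < active_fraction < 0.11  # roughly 10% of weights per token
```

This is the core MoE trade-off: per-token compute resembles a ~10B dense model, while weight storage scales with the full 106B.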

Training Compute

4.0 / 10

The provider identifies the hardware used (over 1,000 NVIDIA H100 GPUs at Yotta's Shakti cluster) and the framework (NVIDIA NeMo). However, it fails to disclose the total GPU hours, the duration of the training run, the estimated cost, or the carbon footprint associated with the training process.

Benchmark Reproducibility

4.0 / 10

While Sarvam provides scores for several standard benchmarks (MMLU, Math500, LiveCodeBench v6) and a custom Indic benchmark (IndiVibe), it does not provide the evaluation code or the specific prompts used for these results. Furthermore, the model has not yet appeared on major independent leaderboards like the Hugging Face Open LLM Leaderboard or LMSYS Chatbot Arena, making the self-reported results difficult to verify independently.

Identity Consistency

9.0 / 10

The model consistently identifies itself as Sarvam-105B and is transparent about its versioning and capabilities. It does not exhibit identity confusion or claim to be a model from a different provider. Documentation clearly outlines its intended use cases in agentic tasks and multilingual reasoning.

Downstream

21.0 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a standard, permissive open-source license. The terms are clear, allowing for both commercial and non-commercial use, modification, and distribution without conflicting proprietary restrictions.

Hardware Footprint

6.0 / 10

Basic guidance on hardware requirements is available, such as the need for approximately 64GB of VRAM for non-quantized inference. The documentation mentions optimizations for NVIDIA Blackwell and the use of NVFP4 quantization, but detailed VRAM scaling for different context lengths and specific accuracy tradeoffs for various quantization levels (Q4, Q8) are not comprehensively documented for the end-user.
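
As a rough rule of thumb (a sketch, not Sarvam's official sizing guidance), weight memory scales with parameter count times bits per parameter; KV cache and activations add on top of this. Note that MoE inference keeps all experts resident, so the total parameter count, not the active count, drives weight memory:

```python
def weight_vram_gb(num_params, bits_per_param):
    """Rough VRAM needed for the weights alone (excludes KV cache and activations)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 106e9  # all 128 experts stay resident even with top-8 routing

print(f"FP16/BF16: {weight_vram_gb(TOTAL_PARAMS, 16):.0f} GB")
print(f"FP8:       {weight_vram_gb(TOTAL_PARAMS, 8):.0f} GB")
print(f"NVFP4:     {weight_vram_gb(TOTAL_PARAMS, 4):.0f} GB")
```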

Versioning Drift

5.0 / 10

The model uses versioning (Sarvam-105B) and has a clear release date. However, there is no public changelog or established mechanism for tracking silent updates or performance drift. As a newly released model, a history of versioning and deprecation notices has not yet been established.
