Total parameters: 106B
Context length: 128K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release date: 6 Mar 2026
Training data cutoff: -
Active parameters: 10.3B
Number of experts: 128
Active experts: 8
Attention structure: Multi-head Latent Attention (MLA)
Hidden dimension: 4096
Number of layers: 32
Attention heads: -
Key-value heads: -
Activation function: SwiGLU
Normalization: RMSNorm
Position embedding: RoPE
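The spec sheet above can be summarized in a small configuration object. The following Python dataclass is an illustrative sketch only; the field names (e.g., total_params, num_active_experts) are assumptions rather than an official Sarvam config, and values listed as "-" above are left as None.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sarvam105BSpec:
    """Summary of the spec sheet above; field names are illustrative, not official."""
    total_params: str = "106B"
    active_params: str = "10.3B"
    context_length: str = "128K"
    modality: str = "Text"
    architecture: str = "Mixture of Experts (MoE)"
    license: str = "Apache 2.0"
    num_experts: int = 128
    num_active_experts: int = 8                 # top-8 routing
    attention: str = "Multi-head Latent Attention (MLA)"
    hidden_size: int = 4096
    num_layers: int = 32
    num_attention_heads: Optional[int] = None   # not listed above
    num_kv_heads: Optional[int] = None          # not listed above
    activation: str = "SwiGLU"
    normalization: str = "RMSNorm"
    position_embedding: str = "RoPE"

spec = Sarvam105BSpec()
print(spec.num_experts, spec.num_active_experts)  # 128 8
```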
Sarvam-105B is an advanced Mixture-of-Experts (MoE) model with 106B total parameters and 10.3B active parameters per token, designed for strong performance on complex tasks. It was released on March 6, 2026 under the Apache 2.0 license. The model uses an MLA-style attention stack with decoupled QK head dimensions (q_head_dim=192, v_head_dim=128), a large head_dim of 576, and 128 experts with top-8 routing. It offers a native 128K context window (extensible via YaRN scaling with a factor of 40) and performs strongly on agentic tasks, mathematics, and coding. It matches or surpasses major closed-source models, achieving state-of-the-art results across 22 Indian languages while remaining competitive on global benchmarks.
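Concretely, "128 experts with top-8 routing" means that for each token a small gating network scores all 128 expert FFNs and only the 8 highest-scoring ones are executed, which is why only about 10.3B of the 106B parameters are active per token. The sketch below shows this gating step in plain PyTorch; the layer sizes and the softmax-then-top-k ordering are assumptions for illustration, not Sarvam's published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal top-k-of-N expert router (illustrative sketch, not Sarvam's code)."""

    def __init__(self, hidden_size: int = 4096, num_experts: int = 128, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_size)
        logits = self.gate(x)                                        # (tokens, 128)
        probs = F.softmax(logits, dim=-1)
        weights, expert_ids = torch.topk(probs, self.top_k, dim=-1)  # keep 8 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)        # renormalize over the selected 8
        return expert_ids, weights

router = TopKRouter()
tokens = torch.randn(4, 4096)
ids, w = router(tokens)
print(ids.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

In a full MoE layer, each token's hidden state would then be dispatched to its selected experts and the expert outputs combined using these normalized weights.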
Sarvam AI's sovereign foundation models are built for India's languages, culture, and context. Released in March 2026, these Mixture-of-Experts (MoE) models deliver state-of-the-art performance across 22 Indian languages while remaining competitive on global benchmarks. They are designed with a focus on reasoning, coding, multilingual capability, and agentic tasks, are open-sourced under the Apache 2.0 license, and are optimized for practical deployment, from resource-constrained environments to high-performance applications.
No evaluation benchmarks are available for Sarvam-105B.