Total Parameters
48B
Context Length
1,048,576 tokens (1M)
Modality
Text
Architecture
Mixture of Experts (MoE)
License
MIT
Release Date
1 Nov 2025
Knowledge Cutoff
-
Activated Parameters
3.0B
Number of Experts
-
Active Experts
-
Attention Structure
Hybrid: Kimi Delta Attention (KDA) + Multi-Head Latent Attention (MLA)
Hidden Dimension Size
-
Number of Layers
-
Attention Heads
-
Key-Value Heads
-
Activation Function
-
Normalization
-
Position Embedding
Absolute Position Embedding
VRAM Requirements by Quantization Method and Context Size
Kimi Linear is a sophisticated large language model engineered by Moonshot AI, distinguished by its hybrid linear attention architecture. This model variant, Kimi Linear 48B A3B Instruct, integrates Kimi Delta Attention (KDA) with Multi-Head Latent Attention (MLA) layers. KDA represents an advanced linear attention mechanism, extending the Gated DeltaNet by incorporating a finer-grained, channel-wise gating mechanism. This design allows for independent control over memory decay rates across individual feature dimensions, thereby enhancing the regulation of the finite-state recurrent neural network (RNN) memory.
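The channel-wise gating described above can be sketched as a single recurrent update step. This is a minimal illustration only: the function name, state layout, and the exact placement of the decay are assumptions for clarity, not Moonshot AI's implementation.

```python
import numpy as np

def kda_step(S, k, v, a, beta):
    """One illustrative step of a channel-wise gated delta rule.

    S    : (d_k, d_v) fast-weight memory state of the finite-state RNN
    k    : (d_k,) key vector (assumed normalized)
    v    : (d_v,) value vector
    a    : (d_k,) per-channel decay gates in (0, 1) -- the fine-grained
           gating that distinguishes KDA from Gated DeltaNet's coarser gate
    beta : scalar write-strength gate in (0, 1)
    """
    S = a[:, None] * S                       # independent decay per feature channel
    pred = S.T @ k                           # memory's current prediction for key k
    S = S + beta * np.outer(k, v - pred)     # delta-rule correction toward v
    return S

# Toy usage with random inputs
rng = np.random.default_rng(0)
d_k, d_v = 8, 8
S = np.zeros((d_k, d_v))
for _ in range(4):
    k = rng.normal(size=d_k)
    k /= np.linalg.norm(k)
    v = rng.normal(size=d_v)
    a = 1.0 / (1.0 + np.exp(-rng.normal(size=d_k)))  # sigmoid gates per channel
    S = kda_step(S, k, v, a, beta=0.5)
```

With a scalar gate, every memory channel forgets at the same rate; making `a` a vector lets each feature dimension retain or discard information independently, which is the regulation benefit the description refers to.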
The Kimi Linear architecture interleaves KDA layers with periodic MLA layers at a 3:1 ratio. This combination balances computational efficiency against the ability to process global information effectively. The chunkwise algorithm within KDA achieves hardware efficiency through a specialized variant of Diagonal-Plus-Low-Rank (DPLR) transition matrices, which significantly reduces computational overhead compared to general DPLR formulations while remaining consistent with the classical delta rule.
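The 3:1 interleaving can be sketched as a simple layer-pattern generator. Only the ratio comes from the model description; the exact position of the MLA layer within each group of four is an assumption for illustration.

```python
def build_layer_pattern(num_layers, kda_per_mla=3):
    """Return the attention type for each layer under a KDA:MLA interleave.

    Every (kda_per_mla + 1)-th layer is a global MLA layer; the rest are
    linear-attention KDA layers.
    """
    pattern = []
    for i in range(num_layers):
        if (i + 1) % (kda_per_mla + 1) == 0:
            pattern.append("MLA")
        else:
            pattern.append("KDA")
    return pattern

print(build_layer_pattern(8))
# -> ['KDA', 'KDA', 'KDA', 'MLA', 'KDA', 'KDA', 'KDA', 'MLA']
```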
The design of Kimi Linear is particularly suited for applications requiring extended context processing and high decoding throughput. By reducing the key-value (KV) cache requirements by up to 75%, it mitigates a common bottleneck in transformer architectures. This efficiency gain enables the model to handle contexts up to 1 million tokens, achieving up to 6x faster decoding throughput in such scenarios. Kimi Linear functions as a drop-in replacement for traditional full attention architectures, offering performance and efficiency for tasks involving longer input and output sequences, including those found in reinforcement learning.
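The "up to 75%" figure follows directly from the 3:1 ratio, assuming the KDA layers keep a fixed-size recurrent state instead of a token-length KV cache, so only one layer in four contributes to the cache:

```python
def kv_cache_reduction(kda_per_mla=3):
    """Fraction of KV cache saved when only MLA layers keep a KV cache.

    Assumes KDA layers maintain constant-size recurrent memory, so only
    1 in (kda_per_mla + 1) layers stores per-token key-value entries.
    """
    mla_fraction = 1 / (kda_per_mla + 1)
    return 1 - mla_fraction

print(f"{kv_cache_reduction():.0%}")  # -> 75%
```

At million-token contexts this matters because KV-cache size grows linearly with sequence length, so a 4x smaller cache directly translates into higher decoding throughput per GPU.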
Moonshot AI's hybrid linear attention architecture with Kimi Delta Attention for efficient long-context processing.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Kimi Linear 48B A3B Instruct.