
Kimi Linear 48B A3B Instruct

Total Parameters: 48B

Active Parameters: 3.0B

Context Length: 1,048,576 tokens (1M)

Modality: Text

Architecture: Mixture of Experts (MoE)

License: MIT

Release Date: 1 Nov 2025

Training Data Cutoff: Oct 2024

Technical Specifications

Number of Experts: 128

Active Experts: 8

Attention Structure: Hybrid (Kimi Delta Attention + Multi-Head Latent Attention)

Hidden Dimension: 4096

Layers: 36

Attention Heads: 32

Key-Value Heads: 1

Activation Function: SwiGLU

Normalization: RMS Normalization

Position Embedding: Absolute Position Embedding

Kimi Linear 48B A3B Instruct

Kimi Linear 48B A3B Instruct is a large-scale language model that implements a hybrid linear attention architecture, designed to overcome the memory and computational constraints of traditional Transformer models. The core innovation lies in the integration of Kimi Delta Attention (KDA) with Multi-Head Latent Attention (MLA) in a specific 3:1 interleaving ratio. KDA builds upon the Gated DeltaNet framework by introducing a channel-wise gating mechanism that allows for independent control over memory decay across individual feature dimensions. This configuration transforms the attention mechanism into a finite-state recurrent neural network (RNN), providing a constant-state memory footprint regardless of sequence length.
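To make the constant-state property concrete, the sketch below implements a single step of a heavily simplified channel-wise gated delta rule in PyTorch. The function name, shapes, and gating parameterization are illustrative assumptions rather than the model's actual kernel; it only shows how a fixed-size state matrix is decayed per channel, overwritten via the delta rule, and read with the query.

```python
import torch

def kda_recurrent_step(S, k, v, q, alpha, beta):
    """
    One recurrent step of a simplified channel-wise gated delta rule.

    S     : (d_k, d_v)  fixed-size state matrix (the "finite-state" memory)
    k, q  : (d_k,)      key / query for this token
    v     : (d_v,)      value for this token
    alpha : (d_k,)      per-channel decay gate in (0, 1) -- channel-wise, not scalar
    beta  : ()          write strength in (0, 1)
    """
    # Channel-wise forgetting: each key dimension decays independently.
    S = alpha.unsqueeze(-1) * S
    # Delta-rule update: replace the old association for k with the new value.
    pred = k @ S                               # (d_v,) current prediction for key k
    S = S + beta * torch.outer(k, v - pred)
    # Read out with the query; the state never grows with sequence length.
    out = q @ S                                # (d_v,)
    return out, S

# Toy usage: process a sequence token by token with O(1) state memory.
d_k, d_v, T = 8, 8, 16
S = torch.zeros(d_k, d_v)
for _ in range(T):
    k, q, v = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    alpha = torch.sigmoid(torch.randn(d_k))    # channel-wise gate
    beta = torch.sigmoid(torch.randn(()))
    out, S = kda_recurrent_step(S, k, v, q, alpha, beta)
```

Because the state `S` has a fixed shape, memory use stays constant as the sequence grows; the production implementation computes the same recurrence with a chunkwise algorithm rather than this token-by-token loop.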

The model utilizes a Mixture-of-Experts (MoE) architecture to manage its 48 billion total parameters, with approximately 3 billion parameters active during any single forward pass. This sparsity, combined with the hybrid attention structure, facilitates high-throughput inference and efficient long-context processing. The KDA layers employ a specialized chunkwise algorithm based on Diagonal-Plus-Low-Rank (DPLR) transition matrices, which optimizes hardware utilization on modern accelerators. By offloading global dependency modeling to periodic MLA layers while maintaining local and recurrent state through KDA, the model achieves a balance between expressive power and linear scaling.
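The parameter sparsity can be illustrated with a toy top-k router: only the experts selected for a given token contribute their feed-forward parameters to that token's forward pass. The shapes and routing details below are assumptions for illustration, not the released configuration code.

```python
import torch

# Toy sketch of top-k expert routing (8 of 128 experts per token), which is why
# only a small fraction of the 48B total parameters is active per forward pass.
num_experts, top_k, hidden = 128, 8, 4096

router = torch.nn.Linear(hidden, num_experts, bias=False)
x = torch.randn(1, hidden)                       # one token's hidden state

logits = router(x)                               # (1, 128) routing scores
weights, expert_ids = torch.topk(logits.softmax(dim=-1), k=top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize the top-k

# Only the 8 selected experts' FFN weights are touched for this token;
# the remaining 120 experts stay idle, keeping roughly 3B of 48B parameters active.
```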

From an implementation perspective, Kimi Linear 48B A3B Instruct serves as a high-efficiency alternative for tasks requiring extensive context windows, supporting up to 1 million tokens. The architecture significantly reduces Key-Value (KV) cache requirements by approximately 75% compared to standard multi-head attention models. This reduction in memory overhead allows for substantially higher decoding speeds in long-sequence applications, such as document analysis and complex reasoning, while maintaining compatibility with standard training and fine-tuning workflows via its open-source MIT-licensed implementation.
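As a rough illustration of the KV-cache claim, suppose only the periodic MLA layers (1 in 4) keep a per-token cache while the KDA layers hold a constant-size state. The numbers below are back-of-the-envelope assumptions (bf16 cache, 1 KV head, head dimension 128), not official memory figures.

```python
# Back-of-the-envelope KV-cache comparison: a standard model caches K/V in every
# layer, whereas here only the 1-in-4 global-attention layers keep a growing cache.
layers, ctx, kv_heads, head_dim, bytes_per = 36, 1_048_576, 1, 128, 2  # bf16

def kv_cache_gib(num_caching_layers: int) -> float:
    # Factor of 2 accounts for storing both keys and values.
    return 2 * num_caching_layers * ctx * kv_heads * head_dim * bytes_per / 2**30

full   = kv_cache_gib(layers)        # all 36 layers cache K/V
hybrid = kv_cache_gib(layers // 4)   # only the periodic global layers cache K/V
print(f"full: {full:.1f} GiB, hybrid: {hybrid:.1f} GiB, "
      f"saving: {1 - hybrid / full:.0%}")        # -> saving: 75%
```

The 75% figure follows directly from the 3:1 interleaving: three of every four attention layers contribute no per-token cache at all.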

About Kimi Linear

Moonshot AI's hybrid linear attention architecture with Kimi Delta Attention for efficient long-context processing.


Other Kimi Linear Models
  • No related models

Benchmarks

No benchmarks are available for Kimi Linear 48B A3B Instruct.

Rankings

Overall Ranking: -

Coding Ranking: -
