
Kimi Linear 48B A3B Instruct

Total Parameters

48B

Context Length

1,048,576 tokens (1M)

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

1 Nov 2025

Knowledge Cutoff

-

Technical Specifications

Activated Parameters

3.0B

Number of Experts

-

Active Experts

-

Attention Structure

Hybrid Linear Attention (KDA + MLA)

Hidden Dimension Size

-

Number of Layers

-

Attention Heads

-

Key-Value Heads

-

Activation Function

-

Normalization

-

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

Kimi Linear 48B A3B Instruct

Kimi Linear is a sophisticated large language model engineered by Moonshot AI, distinguished by its hybrid linear attention architecture. This model variant, Kimi Linear 48B A3B Instruct, integrates Kimi Delta Attention (KDA) with Multi-Head Latent Attention (MLA) layers. KDA represents an advanced linear attention mechanism, extending the Gated DeltaNet by incorporating a finer-grained, channel-wise gating mechanism. This design allows for independent control over memory decay rates across individual feature dimensions, thereby enhancing the regulation of the finite-state recurrent neural network (RNN) memory.
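The channel-wise gated delta rule described above can be sketched as a simple recurrence over a fixed-size memory matrix. The following is a minimal NumPy illustration, not Moonshot AI's actual kernel: the function name `kda_step`, the exact placement of the decay, and all dimensions are assumptions made for exposition.

```python
import numpy as np

def kda_step(S, k, v, alpha, beta):
    """One recurrent step of a simplified KDA-style update (illustrative).

    S     : (d_k, d_v) finite-state memory matrix
    k, v  : key / value vectors for the current token
    alpha : (d_k,) per-channel decay gates in (0, 1) -- the channel-wise
            gating that distinguishes KDA from Gated DeltaNet's
            single scalar gate per head
    beta  : scalar write-strength gate for the delta-rule update
    """
    # Channel-wise decay: each key dimension forgets at its own rate.
    S = alpha[:, None] * S
    # Delta rule: correct the value currently bound to key k toward v.
    pred = S.T @ k                       # current readout for k, shape (d_v,)
    S = S + beta * np.outer(k, v - pred)
    return S

d_k, d_v = 8, 8
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for _ in range(16):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)
    v = rng.standard_normal(d_v)
    alpha = rng.uniform(0.9, 1.0, size=d_k)  # decay rates vary per channel
    S = kda_step(S, k, v, alpha, beta=0.5)

print(S.shape)  # the state stays fixed-size regardless of sequence length
```

Because the state `S` never grows with the sequence, memory use is constant in context length, which is the core advantage of the linear-attention layers.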

The Kimi Linear architecture interleaves KDA layers with periodic MLA layers at a 3:1 ratio. This combination balances computational efficiency with the ability to process global information effectively. The chunkwise algorithm within KDA achieves hardware efficiency through a specialized variant of Diagonal-Plus-Low-Rank (DPLR) transition matrices, which significantly reduces computational overhead compared to general DPLR formulations while remaining closely aligned with the classical delta rule.
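The 3:1 interleaving can be illustrated with a short sketch. The helper `build_layer_pattern` and the convention that every fourth layer is MLA are assumptions for illustration, not the official layer layout.

```python
def build_layer_pattern(num_layers, kda_per_mla=3):
    """Sketch a hybrid stack: every (kda_per_mla + 1)-th layer is MLA,
    the rest are KDA (hypothetical layout, assumed for illustration)."""
    return [
        "MLA" if (i + 1) % (kda_per_mla + 1) == 0 else "KDA"
        for i in range(num_layers)
    ]

print(build_layer_pattern(8))
# ['KDA', 'KDA', 'KDA', 'MLA', 'KDA', 'KDA', 'KDA', 'MLA']
```

With this layout, only one layer in four is a full-attention (MLA) layer, which is what drives the KV-cache savings discussed below.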

The design of Kimi Linear is particularly suited to applications requiring extended context processing and high decoding throughput. By reducing key-value (KV) cache requirements by up to 75%, it mitigates a common bottleneck of transformer architectures. This efficiency gain enables the model to handle contexts of up to 1 million tokens, with up to 6x faster decoding throughput in such scenarios. Kimi Linear functions as a drop-in replacement for full-attention architectures, delivering strong performance and efficiency on tasks with long input and output sequences, including reinforcement learning workloads.
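A back-of-the-envelope calculation shows how caching KV states only in the MLA layers (one in four under a 3:1 ratio) yields the quoted 75% reduction. The layer count, per-token cache width, and byte size below are illustrative assumptions, not official figures.

```python
def kv_cache_bytes(num_layers, mla_every, ctx_len, kv_dim, bytes_per_elem=2):
    """Rough KV-cache estimate assuming only the MLA layers (1 in
    `mla_every`) cache a vector per token; KDA layers keep a fixed-size
    recurrent state instead. All sizes are illustrative assumptions."""
    mla_layers = num_layers // mla_every
    return mla_layers * ctx_len * kv_dim * bytes_per_elem

CTX = 1_048_576                              # 1M-token context
full = kv_cache_bytes(48, 1, CTX, 512)       # every layer caches KV
hybrid = kv_cache_bytes(48, 4, CTX, 512)     # 3:1 KDA:MLA hybrid

print(1 - hybrid / full)  # 0.75 -> the "up to 75%" KV-cache reduction
```

The ratio alone determines the saving: with 1 of every 4 layers caching, the hybrid needs 25% of the full-attention cache regardless of the other parameters.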

About Kimi Linear

Moonshot AI's hybrid linear attention architecture with Kimi Delta Attention for efficient long-context processing.


Other Kimi Linear Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Kimi Linear 48B A3B Instruct.

Rankings

Overall Ranking

-

Coding Ranking

-

GPU Requirements

Required VRAM depends on the selected weight quantization method and context size (1K to 1,048,576 tokens); see the full calculator for recommended GPUs.