Kimi Linear 48B A3B Instruct: Specifications and GPU VRAM Requirements

Kimi Linear 48B A3B Instruct

Open source

Open weights

Total parameters

48B

Context length

1,048,576 tokens

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release date

1 Nov 2025

Training data cutoff

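Technical specifications

Active parameters per token

3.0B

Number of experts

Active experts

Attention structure

Hybrid of Kimi Delta Attention (KDA) and Multi-Head Latent Attention (MLA)

Hidden dimension size

Number of layers

Attention heads

Key-value heads

Activation function

Normalization

Position embedding

Absolute Position Embedding

System requirements

VRAM requirements for different quantization methods and context sizes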

Kimi Linear 48B A3B Instruct

Kimi Linear is a large language model from Moonshot AI built on a hybrid linear attention architecture. This variant, Kimi Linear 48B A3B Instruct, combines Kimi Delta Attention (KDA) layers with Multi-Head Latent Attention (MLA) layers. KDA is a linear attention mechanism that extends Gated DeltaNet with a finer-grained, channel-wise gating mechanism. Each feature dimension gets its own memory decay rate, which gives tighter control over the finite-state recurrent neural network (RNN) memory.
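As a rough illustration of channel-wise gating, the sketch below runs a gated delta-rule memory update in NumPy. It is a simplified single-head recurrence under assumed conventions: a state matrix `S` of shape `(d_k, d_v)`, a per-channel decay vector `alpha`, and a scalar write strength `beta`. The function name `gated_delta_step` and these shapes are illustrative and are not taken from Moonshot AI's implementation.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a delta-rule memory with per-channel decay.

    S:     (d_k, d_v) memory state
    q, k:  (d_k,) query and key
    v:     (d_v,) value
    alpha: (d_k,) decay per key channel; one shared value for all
           channels would correspond to scalar gating instead
    beta:  scalar write strength in [0, 1]
    """
    # Channel-wise forgetting: row i of the state decays by alpha[i].
    S = alpha[:, None] * S
    # Delta rule: write only the error between v and the current readout for k.
    prediction = S.T @ k
    S = S + beta * np.outer(k, v - prediction)
    # Read the memory with the query.
    return S, S.T @ q

d_k, d_v = 4, 3
S = np.zeros((d_k, d_v))
k = np.array([1.0, 0.0, 0.0, 0.0])  # unit-norm key
no_decay = np.ones(d_k)

S, out = gated_delta_step(S, k, k, np.array([1.0, 2.0, 3.0]), no_decay, beta=1.0)
# Writing a new value under the same key replaces the old one.
S, out = gated_delta_step(S, k, k, np.array([7.0, 8.0, 9.0]), no_decay, beta=1.0)
print(out)
```

Setting every entry of `alpha` to the same value gives a single decay rate shared by all channels, which is the coarser gating that the channel-wise variant refines.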

The Kimi Linear architecture interleaves KDA and MLA layers in a 3:1 ratio, with three KDA layers for every MLA layer. The combination aims to balance computational efficiency with the ability to process global information. The chunkwise algorithm inside KDA achieves hardware efficiency through a specialized variant of Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to general DPLR formulations while staying more consistent with the classical delta rule.
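The 3:1 interleaving can be pictured as a repeating layer schedule. The sketch below is a minimal illustration; the helper `layer_schedule`, the choice to place the MLA layer last in each group of four, and the depth of 8 layers are all assumptions made for the example, not details from the source.

```python
def layer_schedule(num_layers, kda_per_mla=3):
    """Return layer types with one MLA layer after every `kda_per_mla` KDA layers."""
    period = kda_per_mla + 1
    return ["MLA" if (i + 1) % period == 0 else "KDA" for i in range(num_layers)]

# Hypothetical 8-layer stack, used only to show the pattern.
schedule = layer_schedule(8)
print(schedule)
```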

The design targets applications that need long-context processing and high decoding throughput. It reduces key-value (KV) cache requirements by up to 75%, easing a common bottleneck in transformer architectures. This lets the model handle contexts of up to 1 million tokens, with up to 6x higher decoding throughput at that length. Kimi Linear is intended as a drop-in replacement for full attention architectures in tasks with long input and output sequences, including reinforcement learning workloads.
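One way to see where a reduction of about 75% could come from: if only one layer in four keeps a cache that grows with sequence length, the growing cache shrinks to a quarter of its full-attention size. The sketch below works through that arithmetic. The layer count and per-token cache cost are hypothetical placeholders, the fixed-size state of the linear attention layers is ignored, and both models are assumed to pay the same per-token cost in each caching layer.

```python
def kv_cache_bytes(caching_layers, seq_len, bytes_per_token_per_layer):
    """Memory for a cache that grows linearly with sequence length."""
    return caching_layers * seq_len * bytes_per_token_per_layer

NUM_LAYERS = 32          # hypothetical depth
BYTES_PER_TOKEN = 1024   # hypothetical per-layer cache cost per token
SEQ_LEN = 1_048_576      # 1M-token context

full = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, BYTES_PER_TOKEN)
# With a 3:1 ratio, only one layer in four keeps a growing cache.
hybrid = kv_cache_bytes(NUM_LAYERS // 4, SEQ_LEN, BYTES_PER_TOKEN)

reduction = 1 - hybrid / full
print(f"KV cache reduction: {reduction:.0%}")
```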

About Kimi Linear

Moonshot AI's hybrid linear attention architecture with Kimi Delta Attention for efficient long-context processing.

Other Kimi Linear models

No related models.

Evaluation benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Kimi Linear 48B A3B Instruct.

Rankings

Coding ranking

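GPU requirements

Required VRAM depends on the quantization method selected for the model weights and on the context size, which the calculator on the source page varies from 1,024 tokens up to 1,024k tokens.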
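For a rough sense of how quantization changes the memory needed for weights alone, the sketch below multiplies the parameter count by the bits per weight. It is a back-of-the-envelope estimate, not the page's calculator: it excludes KV cache, activations, and runtime overhead, and real quantization formats add some metadata on top. The bit widths are illustrative. The estimate uses all 48B parameters because a Mixture of Experts model typically needs every expert loaded, even though only a subset is active per token.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 48e9  # total parameter count, from the model name (48B)

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB for weights")
```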

Resources

Official documentation, paper, weights download, source code
