
MiMo V2 Flash

Active Parameters: 15B
Context Length: 256K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: MIT
Release Date: 10 Dec 2025
Training Data Cutoff: Dec 2024

Technical Specifications

Total Expert Parameters: 309.0B
Number of Experts: 256
Active Experts: 8

Attention Structure: Grouped-Query Attention (64 query heads, 8 KV heads)

Hidden Dimension: 4096
Layers: 48
Attention Heads: 64
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: Absolute Position Embedding

MiMo V2 Flash

Xiaomi MiMo V2 Flash is a high-efficiency Mixture-of-Experts (MoE) language model engineered for advanced reasoning, software engineering, and autonomous agentic workflows. Built on a sparse architecture, it holds 309 billion parameters in total but activates only 15 billion per forward pass, combining the modeling capacity of a large-scale system with the inference speed and operating cost of a much smaller dense model. Its design centers on high decoding throughput, achieved through structural choices that relieve the compute and memory bottlenecks typical of large transformer models.
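
To make the sparse-activation idea concrete, the sketch below routes one token through a top-8-of-256 expert gate using the dimensions from the spec table. The softmax top-k gate is a generic assumption for illustration, not Xiaomi's published router design.

```python
import numpy as np

# Generic top-k MoE routing sketch (assumed gating scheme, not MiMo's exact
# router). Dimensions follow the spec table: 256 experts, 8 active, d=4096.
NUM_EXPERTS, TOP_K, HIDDEN = 256, 8, 4096

rng = np.random.default_rng(0)
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02  # toy router weights

def route(token_hidden: np.ndarray):
    """Select TOP_K experts for one token and renormalize their gate weights."""
    logits = token_hidden @ router_w                # (NUM_EXPERTS,)
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]  # indices of the 8 best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over selected experts
    return top, gates

experts, gates = route(rng.standard_normal(HIDDEN))
print("active experts:", sorted(experts.tolist()))
print("gate weight sum:", round(float(gates.sum()), 6))  # 1.0
```

Only the eight selected experts' feed-forward weights take part in a token's computation, which is why per-token compute tracks the 15B active figure rather than the 309B total.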

Technically, MiMo V2 Flash introduces a hybrid attention mechanism that interleaves Sliding Window Attention (SWA) and Global Attention (GA) in a 5:1 ratio across its transformer blocks. The aggressive 128-token sliding window cuts KV-cache memory requirements by nearly six-fold compared to standard global attention, while a learnable attention sink bias keeps long-context performance stable. The model also features a native Multi-Token Prediction (MTP) module built from lightweight 0.33-billion-parameter dense feed-forward blocks; by generating and verifying several tokens in parallel, MTP yields a reported 2.0x to 2.6x decoding-throughput gain over conventional autoregressive generation.
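
The six-fold figure can be sanity-checked from the spec table. Assuming the 48 layers split 40:8 between SWA and GA under the 5:1 interleave, with 8 KV heads of dimension 64 (4096 hidden / 64 heads) cached in 16-bit precision, a short calculation reproduces it at a 256K context:

```python
# Back-of-envelope KV-cache comparison: full global attention vs. the 5:1
# SWA/GA interleave. Per-token, per-layer KV size assumes 8 KV heads x 64-dim
# heads x 2 tensors (K and V) x 2 bytes (fp16/bf16) = 2048 bytes.
LAYERS = 48
GA_LAYERS = LAYERS // 6           # 5:1 interleave -> 8 global layers
SWA_LAYERS = LAYERS - GA_LAYERS   # 40 sliding-window layers
WINDOW = 128                      # SWA layers cache only the last 128 tokens
KV_BYTES = 8 * 64 * 2 * 2         # bytes per token per layer

def kv_cache_bytes(context_len: int, hybrid: bool) -> int:
    if not hybrid:                # every layer caches the full context
        return LAYERS * context_len * KV_BYTES
    cached = GA_LAYERS * context_len + SWA_LAYERS * min(context_len, WINDOW)
    return cached * KV_BYTES

ctx = 256 * 1024
full, hybrid = kv_cache_bytes(ctx, False), kv_cache_bytes(ctx, True)
print(f"full GA: {full / 2**30:.2f} GiB")     # 24.00 GiB
print(f"hybrid : {hybrid / 2**30:.2f} GiB "
      f"({full / hybrid:.1f}x smaller)")      # ~4.01 GiB, ~6.0x
```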

Pre-trained on a 27-trillion-token corpus in FP8 mixed precision, MiMo V2 Flash has a native sequence length of 32,000 tokens and extends to context windows of up to 256,000 tokens. Post-training combines a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm with large-scale reinforcement learning, specifically targeting complex reasoning and multi-step tool use. This training regime lets the model perform reliably in demanding technical scenarios such as document analysis and extended agentic interactions, making it a resource-efficient choice for researchers and developers who need state-of-the-art performance in an open-weight model.
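
For orientation, a minimal inference sketch with Hugging Face transformers might look like the following. The repository id "XiaomiMiMo/MiMo-V2-Flash" and the trust_remote_code requirement are assumptions made for illustration; check the official release for the actual id and loading instructions.

```python
# Hypothetical loading sketch; model id and settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"  # assumed repo id, verify before use
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # use the checkpoint's native precision
    device_map="auto",        # shard the 309B weights across available GPUs
    trust_remote_code=True,   # custom MoE / hybrid-attention modules, if any
)

prompt = "Explain the trade-off between total and active parameters in MoE models."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```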

About MiMo V2

MiMo-V2-Flash is a Mixture-of-Experts (MoE) model with hybrid attention architecture designed for high-speed reasoning and agentic workflows. It features Multi-Token Prediction (MTP) to achieve state-of-the-art performance while significantly reducing inference costs. The model is optimized for long-context modeling and efficient inference.
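
The speed-up behind MTP comes from draft-and-verify decoding: cheap prediction heads propose several future tokens, and the main model checks them all in one forward pass. The toy sketch below shows only the greedy accept/reject control flow; draft_fn and verify_fn are hypothetical stand-ins, not MiMo's actual MTP heads.

```python
# Toy draft-and-verify loop in the style of MTP / speculative decoding.
def speculative_step(prefix, draft_fn, verify_fn, k=4):
    """Accept the longest verified run of k drafted tokens, plus one fix."""
    drafted = draft_fn(prefix, k)          # k cheap draft tokens
    targets = verify_fn(prefix, drafted)   # one main-model pass scores all k
    accepted = []
    for d, t in zip(drafted, targets):
        if d != t:
            accepted.append(t)             # keep the model's correction, stop
            break
        accepted.append(d)
    return accepted                        # commits 1..k tokens per pass

def verify_fn(prefix, drafted):
    """Stand-in main model: greedy next token is always last + 1."""
    seq, targets = list(prefix), []
    for d in drafted:
        targets.append(seq[-1] + 1)        # prediction given tokens so far
        seq.append(d)                      # later positions condition on drafts
    return targets

def draft_fn(prefix, k):
    """Stand-in draft head: mostly right, with one injected wrong guess."""
    out, last = [], prefix[-1]
    for i in range(k):
        last += 2 if i == 2 else 1
        out.append(last)
    return out

print(speculative_step([1, 2, 3], draft_fn, verify_fn))  # -> [4, 5, 6]
```

Each main-model pass commits between one and k tokens (three of four here), which is where the reported 2.0x to 2.6x throughput gain comes from when draft acceptance rates are high.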


Other MiMo V2 Models
  • No related models

Evaluation Benchmarks

Overall Rank: #40
Coding Rank: -

Benchmark Scores:
GPQA (Graduate-Level QA): 0.84 (rank #9)

Model Transparency

Overall Score: B (68 / 100)

GPU Requirements

Required VRAM and recommended GPUs depend on the chosen weight-quantization method and context size (1K to 250K tokens); see the full calculator for exact figures.
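
As a first-order stand-in for the calculator, weight memory alone can be estimated from the 309B total parameter count, since every expert must stay resident even though only about 15B parameters are active per token. The bytes-per-weight figures below are common quantization assumptions, not vendor numbers, and KV cache, activations, and runtime overhead come on top.

```python
# First-order VRAM estimate for model weights alone (assumed byte sizes).
TOTAL_PARAMS = 309e9
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}

for quant, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30
    print(f"{quant:>9}: ~{gib:,.0f} GiB of weights")
# fp16/bf16: ~576 GiB, fp8/int8: ~288 GiB, int4: ~144 GiB
```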