
MiniMax M2

Total Parameters: 229B
Context Length: 128K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: MIT
Release Date: 7 Nov 2025
Training Data Cutoff: Jun 2024

Technical Specifications

Active Parameters: 10.0B
Number of Experts: 8
Active Experts per Token: 2
Attention Structure: Multi-Head Attention
Hidden Dimension: 4096
Layers: 32
Attention Heads: 32
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)

MiniMax M2

MiniMax M2 is a sparse Mixture of Experts (MoE) transformer model engineered by MiniMax for high-efficiency performance in complex coding and agentic workflows. By utilizing a total parameter count of 229 billion while only activating approximately 10 billion parameters per token during inference, the architecture achieves a high ratio of stored knowledge to computational throughput. This design permits the model to handle long-horizon tasks such as multi-file repository editing and iterative code-run-fix loops with the latency profiles typically associated with much smaller dense models.
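As a sketch of that efficiency argument, the illustrative PyTorch layer below shows how top-2 routing runs only 2 of 8 expert MLPs per token, so per-token compute stays near 2/8 of a dense equivalent even though all expert weights remain resident. The dimensions and the plain SiLU expert MLPs are assumptions chosen for brevity, not MiniMax M2's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative top-2 Mixture-of-Experts layer (toy dimensions)."""

    def __init__(self, hidden_dim=4096, ffn_dim=11008, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        logits = self.router(x)                            # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep best 2 experts
        weights = F.softmax(weights, dim=-1)               # renormalize over top-2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    # Only tokens routed to expert e pay for its compute.
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```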

The model's technical foundation is built on a full-attention mechanism that incorporates Rotary Position Embeddings (RoPE) for stable long-context handling. It uses Root Mean Square Layer Normalization (RMSNorm) and the SwiGLU activation function (a SiLU-gated linear unit) to ensure training stability and representational efficiency. Architecturally, it features 32 hidden layers with a hidden dimension of 4096, employing a top-2 routing strategy to distribute workloads across its eight expert modules. The 128,000-token context window supports the ingestion of large technical documents and extensive codebases, facilitating consistent reasoning over deep information hierarchies.
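For concreteness, here are minimal sketches of the RMSNorm and SwiGLU components named above. These are the standard textbook formulations, not MiniMax's exact code, and the FFN width of 11008 is an assumed placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of the activations.
    Unlike LayerNorm there is no mean subtraction and no bias term."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * inv_rms

class SwiGLUFeedForward(nn.Module):
    """SwiGLU FFN: out = W2( silu(W1 x) * W3 x ), i.e. a SiLU-gated linear unit."""

    def __init__(self, dim=4096, hidden=11008):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # up projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```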

Optimized for autonomous agent environments, MiniMax M2 provides native support for external tool integration through a structured reasoning trace system. The model maintains internal decision-making logs between turns, which allows it to recover from execution errors in shell environments or web-browsing tasks. Its efficient inference footprint makes it a candidate for deployment in continuous integration pipelines and integrated development environments where fast feedback cycles and low operational costs are required.
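As a hedged usage sketch, models in this class are commonly served behind an OpenAI-compatible chat API with function/tool definitions. The endpoint URL, model identifier, and run_shell tool below are hypothetical placeholders, not documented MiniMax values; adapt them to your provider.

```python
# Hypothetical sketch: tool calling through an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool exposed by your agent harness
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMax-M2",  # assumed model identifier
    messages=[{"role": "user", "content": "Run the test suite and fix any failures."}],
    tools=tools,
)
# The model either answers directly or emits a tool call for the harness to execute.
print(response.choices[0].message.tool_calls)
```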

About MiniMax M2

MiniMax's efficient MoE models built for coding and agentic workflows.



Benchmarks

Overall Rank: #59
Coding Rank: #72

Benchmark scores and per-benchmark ranks:

Benchmark      Category                 Score   Rank
-              -                        0.96    #6
MMLU Pro       Professional Knowledge   0.82    #10
GPQA           Graduate-Level QA        0.78    #21
WebDev Arena   Web Development          1347    #29

Model Transparency

Overall Score: B- (63/100)

GPU Requirements

The source page provides an interactive calculator here: it estimates required VRAM and recommends GPUs based on the chosen weight quantization and a context size between 1K and 125K tokens.
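As a rough substitute for that calculator, the sketch below does the usual back-of-envelope arithmetic: weight memory scales with total parameters and quantization width, and KV-cache memory with layers, KV heads, head dimension, and context length. The head dimension (4096 / 32 attention heads = 128) follows from the specs above; the 10% runtime overhead factor is an assumption.

```python
def estimate_vram_gib(
    total_params_b=229.0,   # total parameters in billions (spec above)
    bytes_per_weight=0.5,   # 2.0 for FP16, 1.0 for INT8, 0.5 for 4-bit
    layers=32,
    kv_heads=8,
    head_dim=128,           # hidden_dim / attention_heads = 4096 / 32
    context=128_000,
    kv_bytes=2,             # FP16 KV cache
    overhead=1.1,           # assumed ~10% for activations and runtime buffers
):
    """Back-of-envelope VRAM estimate (weights + KV cache), in GiB."""
    weight_mem = total_params_b * 1e9 * bytes_per_weight
    # KV cache stores 2 tensors (K and V) per layer, per token.
    kv_mem = 2 * layers * kv_heads * head_dim * context * kv_bytes
    return (weight_mem + kv_mem) * overhead / 2**30

# Example: 4-bit weights at the full 128K context.
print(f"{estimate_vram_gib():.0f} GiB")
```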
