Total Parameters
229B
Context Length
128K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
MIT
Release Date
7 Nov 2025
Training Data Cutoff
Jun 2024
Total Expert Parameters
10.0B
Number of Experts
8
Active Experts
2
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
MiniMax M2 is a sparse Mixture of Experts (MoE) transformer model engineered by MiniMax for high-efficiency performance in complex coding and agentic workflows. By utilizing a total parameter count of 229 billion while only activating approximately 10 billion parameters per token during inference, the architecture achieves a high ratio of stored knowledge to computational throughput. This design permits the model to handle long-horizon tasks such as multi-file repository editing and iterative code-run-fix loops with the latency profiles typically associated with much smaller dense models.
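The sparse-activation scheme described above can be illustrated with a toy top-2 router: for each token only the two selected experts run, so per-token compute scales with the active parameters rather than the full parameter count. This is a minimal NumPy sketch with toy dimensions, not MiniMax's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, N_EXPERTS, TOP_K = 64, 8, 2  # toy sizes; the card lists 8 experts, top-2

# Toy expert weight matrices and a linear router, standing in for the MoE layer.
experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((HIDDEN, N_EXPERTS)) * 0.02

def moe_layer(x):
    """Route each token to its top-2 experts; only those experts are evaluated."""
    logits = x @ router_w                              # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -TOP_K:]     # indices of the 2 best experts
    sel = np.take_along_axis(logits, top2, axis=-1)    # softmax over selected logits only
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for k in range(TOP_K):
            e = top2[t, k]
            out[t] += gates[t, k] * (x[t] @ experts[e])
    return out

x = rng.standard_normal((4, HIDDEN))
y = moe_layer(x)
print(y.shape)  # (4, 64)
```

With 229B stored but only ~10B active parameters, the routing step is what converts a large knowledge capacity into a small per-token compute budget.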
The model's technical foundation is built on a full-attention mechanism that incorporates Rotary Position Embeddings (RoPE) for stable long-context handling. It uses Root Mean Square Layer Normalization (RMSNorm) and the SwiGLU activation function (a SiLU-gated linear unit) for training stability and representational efficiency. Architecturally, it features 32 hidden layers with a hidden dimension of 4096, and a Top-2 routing strategy distributes each token across its internal expert modules. The 128,000-token context window supports the ingestion of large technical documents and extensive codebases, enabling consistent reasoning over deep information hierarchies.
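The RMSNorm and SwiGLU components named here are compact enough to write down directly. The following NumPy sketch uses toy dimensions (the weight shapes and `d_ff` value are illustrative assumptions, not M2's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_ff = 64, 172  # toy sizes; the card lists a hidden dimension of 4096

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square; no mean subtraction, no bias
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward block: SiLU(x @ W_gate) elementwise-gates (x @ W_up)
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

x = rng.standard_normal((2, d))
h = rms_norm(x, np.ones(d))                     # per-token RMS is now ~1
w_gate, w_up = rng.standard_normal((2, d, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d)) * 0.02
y = swiglu_ffn(h, w_gate, w_up, w_down)
print(y.shape)  # (2, 64)
```

RMSNorm drops the mean-centering and bias of LayerNorm, and the SiLU gate gives the FFN a smooth, input-dependent multiplicative nonlinearity; both are common choices in recent decoder-only transformers.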
Optimized for autonomous agent environments, MiniMax M2 provides native support for external tool integration through a structured reasoning trace system. The model maintains internal decision-making logs between turns, which allows it to recover from execution errors in shell environments or web-browsing tasks. Its efficient inference footprint makes it a candidate for deployment in continuous integration pipelines and integrated development environments where fast feedback cycles and low operational costs are required.
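The code-run-fix loop such an agent performs can be sketched generically. The `model` callable below is a hypothetical stand-in for a call to an M2-backed endpoint; the loop structure, not any specific API, is what this illustrates:

```python
import subprocess
import sys
import tempfile

def run_snippet(code: str, timeout: int = 30):
    """Execute a Python snippet in a fresh interpreter; return (ok, combined output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr

def code_run_fix(model, task: str, max_turns: int = 4):
    """Iterative code-run-fix loop: execution errors are fed back to the model."""
    history = [task]
    for _ in range(max_turns):
        code = model(history)        # hypothetical stand-in for an M2 API call
        ok, out = run_snippet(code)
        if ok:
            return code, out
        history.append(f"Execution failed:\n{out}\nPlease fix the code.")
    raise RuntimeError("no passing solution within the turn budget")
```

The accumulated `history` list plays the role of the structured reasoning trace described above: each failed execution becomes context for the next attempt, which is what lets the model recover from errors across turns.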
MiniMax's efficient MoE models built for coding and agentic workflows.
Rank
#59
| Benchmark | Score | Rank |
|---|---|---|
| StackEval (ProLLM Stack Eval) | 0.96 | 6 |
| Professional Knowledge (MMLU Pro) | 0.82 | 10 |
| Graduate-Level QA (GPQA) | 0.78 | 21 |
| Web Development (WebDev Arena) | 1347 | 29 |