MiniMax M2.5

Parameters

230B total (10B active)

Context Length

128K

Modality

Multimodal

Architecture

Mixture-of-Experts (MoE)

License

Modified-MIT

Release Date

15 Dec 2025

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

-

Key-Value Heads

-

Attention Head Dimension

-

Position Embedding

Absolute Position Embedding

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

-

Activation Function

-

Dimensions

Hidden Dimension Size

-

Number of Layers

-

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

-

MiniMax M2.5

MiniMax M2.5 is a multimodal model for text generation and reasoning. It offers strong multilingual support, with particular emphasis on Chinese and English, and is designed for content generation, analysis, and conversational AI. It posts competitive results across multiple benchmarks.

About MiniMax M2

The MiniMax M2.5 series is a family of multimodal models from MiniMax AI, built for text generation, reasoning, and multilingual understanding. The models pair high-quality language understanding with an efficient architecture and are optimized for both API deployment and enterprise use.


Evaluation Benchmarks

Rank

#74

Category                 Benchmark            Score   Rank
Agentic Coding           LiveBench Agentic    0.52    17
-                        -                    0.66    17
Graduate-Level QA        GPQA                 0.81    22
-                        -                    0.77    27
Professional Knowledge   MMLU Pro             0.80    35
-                        -                    0.71    36
-                        -                    0.50    38
-                        -                    0.59    40
Web Development          WebDev Arena         1382    47
General Text             Text Arena           1390    57

Rankings

Overall Rank

#74

Coding Rank

#62

Model Integrity

Total Score

C+

54 / 100

MiniMax M2.5 Model Integrity Report

Audit Note

MiniMax M2.5 demonstrates a moderate level of transparency, particularly in disclosing its Mixture-of-Experts parameter counts and providing detailed hardware requirements for local deployment. However, the model suffers from significant opacity regarding its training data composition and compute resources. While it provides impressive benchmark results, the lack of reproducible evaluation artifacts and emerging concerns about score validity represent critical transparency gaps.

Upstream

14.0 / 30

Architectural Provenance

6.0 / 10

MiniMax M2.5 is documented as a Mixture-of-Experts (MoE) model utilizing 'Lightning Attention' and a Top-2 routing strategy. While the architecture is named and some high-level details are provided (32 hidden layers, 4096 hidden dimension), the specific pre-training methodology and detailed architectural modifications from the base transformer are not fully disclosed in a technical paper. It is described as an evolution of the M2 series, but the exact delta in training procedure is missing.
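
For readers unfamiliar with the term, the following is a minimal sketch of what a generic Top-2 routing step looks like in a Mixture-of-Experts layer. It is not MiniMax's implementation, which is not publicly detailed here. The function names `top2_route` and `moe_output`, the scalar "token", and the toy experts are all hypothetical and chosen only for illustration.

```python
import math


def top2_route(gate_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for the top two experts."""
    if len(gate_logits) < 2:
        raise ValueError("need at least two experts")
    # Numerically stable softmax over the router logits.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the two most probable experts, then renormalize their weights to sum to 1.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    kept = sum(probs[i] for i in top2)
    return [(i, probs[i] / kept) for i in top2]


def moe_output(token, experts, gate_logits):
    """Weighted sum of the two selected experts' outputs (scalar toy version)."""
    return sum(w * experts[i](token) for i, w in top2_route(gate_logits))


# Hypothetical toy experts: each one simply scales its input by a constant.
experts = [lambda x, k=k: k * x for k in (1.0, 2.0, 3.0, 4.0)]
routes = top2_route([0.1, 2.0, -1.0, 2.0])
output = moe_output(10.0, experts, [0.1, 2.0, -1.0, 2.0])
```

Only the two selected experts run for a given token, which is why an MoE model can have far more total parameters than it activates per forward pass.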

Dataset Composition

3.0 / 10

Disclosure regarding training data is minimal and largely qualitative. The provider states the model was trained on '10+ programming languages' and '200,000+ real-world environments' using a proprietary reinforcement learning framework (Forge). However, there is no public breakdown of the dataset composition (e.g., web vs. code percentages), no information on data filtering/cleaning protocols, and no sample data available for inspection.

Tokenizer Integrity

5.0 / 10

The model uses a unified tokenizer for multimodal processing (text, image, audio), which is a significant architectural claim. While the tokenizer is accessible via the model weights on Hugging Face, official documentation regarding vocabulary size, specific tokenization algorithms, and training data alignment is sparse. Third-party tools like SGLang and vLLM support it, but official technical specifications are lacking.

Model

21.0 / 40

Parameter Density

7.0 / 10

MiniMax provides specific figures for both total and active parameters: 230 billion total parameters with 10 billion active per forward pass. This level of MoE transparency is better than many competitors. However, a detailed architectural breakdown of parameter distribution (e.g., attention vs. FFN) is not publicly documented.
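
As a quick worked example, the two figures quoted above imply that only a small share of the weights is used for any one token. The sketch below uses only those two numbers; the per-token compute line relies on the common rule of thumb of roughly 2 FLOPs per active parameter for a forward pass, which is an assumption and not a figure from MiniMax.

```python
TOTAL_PARAMS = 230e9   # total parameters, as quoted above
ACTIVE_PARAMS = 10e9   # parameters active per forward pass, as quoted above

# Share of the model's weights that participate in a single forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: ~2 FLOPs per active parameter per token (rule-of-thumb estimate).
approx_flops_per_token = 2 * ACTIVE_PARAMS

print(f"Active fraction: {active_fraction:.1%}")
print(f"Approx. forward FLOPs per token: {approx_flops_per_token:.1e}")
```

Under these numbers, roughly 4.3% of the parameters are active per token.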

Training Compute

2.0 / 10

Information on training compute is extremely limited. While some anecdotal evidence suggests a training period of approximately two months, there is no official disclosure of total GPU/TPU hours, hardware cluster specifications, or carbon footprint. The company cites 'efficiency' but provides no verifiable metrics to back the claim.

Benchmark Reproducibility

3.0 / 10

The model reports high scores on standard benchmarks like SWE-Bench Verified (80.2%) and BrowseComp (76.3%). However, the evaluation code and exact prompts used are not fully public. Furthermore, significant discrepancies have been noted by third-party audits regarding the validity of these scores, and the reliance on internal benchmarks like 'VIBE-Pro' further complicates independent verification.

Identity Consistency

9.0 / 10

The model consistently identifies itself as MiniMax-M2.5 and is transparent about its origin and purpose as an agentic AI. It maintains clear versioning (M2 -> M2.1 -> M2.5) and does not attempt to mimic the identity of other models in its system prompts or official communications.

Downstream

19.0 / 30

License Clarity

6.0 / 10

The model is released under a 'Modified-MIT' license. While the license text is available, the 'Modified' prefix creates ambiguity. For M2.5, the license generally allows commercial use, but subsequent versions (M2.7) have introduced more restrictive terms requiring written authorization, leading to community confusion regarding the long-term stability of the licensing model.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented by both the provider and community partners. VRAM requirements for several precision and quantization levels (FP16, FP8, Q3_K_XL) are clearly stated, for example ~457GB for BF16 and ~101GB for 3-bit GGUF. The documentation also includes guidance for running on consumer hardware (e.g., 2x RTX 4090) and on how memory use scales with context length.
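
The quoted sizes can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, assuming the 230 billion parameter figure from the Parameter Density section and decimal gigabytes. It ignores KV cache, activations, and runtime overhead, and the helper names are hypothetical.

```python
N_PARAMS = 230e9  # total parameter count, taken from the Parameter Density section


def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal GB (weights only, no KV cache)."""
    return n_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(size_gb, n_params):
    """Average bits per weight implied by a quoted file size in decimal GB."""
    return size_gb * 1e9 * 8 / n_params


bf16_gb = weight_memory_gb(N_PARAMS, 16)          # 16-bit weights
fp8_gb = weight_memory_gb(N_PARAMS, 8)            # 8-bit weights
q3_bits = implied_bits_per_weight(101, N_PARAMS)  # from the ~101GB 3-bit GGUF figure

print(f"BF16: ~{bf16_gb:.0f} GB, FP8: ~{fp8_gb:.0f} GB")
print(f"Implied bits per weight for the 101 GB GGUF: {q3_bits:.2f}")
```

The 16-bit estimate lands close to the ~457GB quoted above; the small gap may come from the exact parameter count or the unit convention. The 3-bit GGUF works out to roughly 3.5 bits per weight on average, which is plausible for a mixed-precision quantization, though the exact scheme is not stated here.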

Versioning Drift

5.0 / 10

MiniMax maintains a versioned release cycle with a public changelog. However, the documentation for these updates is often high-level marketing summaries rather than detailed technical changelogs. There is limited information on how model behavior or safety guardrails change between sub-versions, and previous versions are not always easily accessible for drift testing.
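
The reported totals can be cross-checked against the individual criterion scores. The sketch below assumes the total is a plain unweighted sum of the criteria, which is consistent with the numbers in this report but is not stated explicitly; the dictionary layout is illustrative only.

```python
# Criterion scores as listed in the report, grouped by category.
scores = {
    "Upstream": {
        "Architectural Provenance": 6.0,
        "Dataset Composition": 3.0,
        "Tokenizer Integrity": 5.0,
    },
    "Model": {
        "Parameter Density": 7.0,
        "Training Compute": 2.0,
        "Benchmark Reproducibility": 3.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 6.0,
        "Hardware Footprint": 8.0,
        "Versioning Drift": 5.0,
    },
}

# Assumption: each category total is the unweighted sum of its criteria.
category_totals = {name: sum(items.values()) for name, items in scores.items()}
total_score = sum(category_totals.values())

for name, subtotal in category_totals.items():
    print(f"{name}: {subtotal:.1f}")
print(f"Total: {total_score:.0f} / 100")
```

The sums reproduce the reported 14.0, 21.0, and 19.0 category scores and the overall 54 / 100.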