Parameters
230B total, 10B active
Context Length
128K
Modality
Multimodal
Architecture
Mixture-of-Experts (MoE)
License
Proprietary
Release Date
15 Dec 2025
Knowledge Cutoff
-
Attention
Attention Structure
Multi-Head Attention
Attention Heads
-
Key-Value Heads
-
Attention Head Dimension
-
Position Embedding
Absolute Position Embedding
RoPE Theta
-
Sliding Window Attention
-
Sliding Window Size
-
Normalization
-
Activation Function
-
Dimensions
Hidden Dimension Size
4096
Number of Layers
32
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
-
MiniMax M2.5 is a multimodal model for text generation and reasoning, with strong multilingual support, particularly for Chinese and English. It targets content generation, analysis, and conversational AI, and performs competitively across multiple benchmarks.
The MiniMax M2.5 series from MiniMax pairs strong language understanding with an efficient architecture and is optimized for both API deployment and enterprise use.
Rank
#74
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Agentic Coding | LiveBench Agentic | 0.52 | 17 |
| StackUnseen | ProLLM Stack Unseen | 0.66 | 17 |
| Graduate-Level QA | GPQA | 0.81 | 22 |
| Mathematics | LiveBench Mathematics | 0.77 | 27 |
| Professional Knowledge | MMLU Pro | 0.80 | 35 |
| Coding | LiveBench Coding | 0.71 | 36 |
| Data Analysis | LiveBench Data Analysis | 0.50 | 38 |
| Reasoning | LiveBench Reasoning | 0.59 | 40 |
| Web Development | WebDev Arena | 1382 | 47 |
| General Text | Text Arena | 1390 | 57 |

Scores are on a 0 to 1 scale, except for the two Arena entries, which are Elo-style ratings.
Coding Rank
#62
Total Score
54 / 100
MiniMax M2.5 demonstrates a moderate level of transparency, particularly in disclosing its Mixture-of-Experts parameter counts and providing detailed hardware requirements for local deployment. However, the model suffers from significant opacity regarding its training data composition and compute resources. While it provides impressive benchmark results, the lack of reproducible evaluation artifacts and emerging concerns about score validity represent critical transparency gaps.
Architectural Provenance
MiniMax M2.5 is documented as a Mixture-of-Experts (MoE) model utilizing 'Lightning Attention' and a Top-2 routing strategy. While the architecture is named and some high-level details are provided (32 hidden layers, 4096 hidden dimension), the specific pre-training methodology and detailed architectural modifications from the base transformer are not fully disclosed in a technical paper. It is described as an evolution of the M2 series, but the exact delta in training procedure is missing.
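To make "Top-2 routing" concrete, here is a minimal, generic sketch of top-2 gating in an MoE layer. It is not MiniMax's implementation, whose routing details are undisclosed: the router here is assumed to be a plain linear projection, the toy experts are hypothetical scaling functions, and the two selected experts are weighted by a softmax over only their own logits (one common convention among several).

```python
import math
import random

def top2_route(logits):
    """Pick the two highest-scoring experts for one token and weight them.

    logits: one router score per expert (e.g. hidden_state . gate_row).
    Returns [(expert_index, weight), (expert_index, weight)], best first,
    with weights from a softmax over just the two selected logits.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    m = max(logits[i] for i in top)                # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(token, gate, experts):
    """Apply a top-2 MoE layer to one token vector.

    gate: list of per-expert router weight vectors (hypothetical linear router).
    experts: list of callables mapping a vector to a vector of the same size.
    Only the two selected experts run, which is why the parameters active per
    token are far fewer than the total parameter count.
    """
    logits = [sum(t * g for t, g in zip(token, gate_row)) for gate_row in gate]
    out = [0.0] * len(token)
    for idx, weight in top2_route(logits):
        y = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out

# Toy setup: 4 hypothetical experts, each scaling its input by a different factor.
random.seed(0)
d_model, n_experts = 4, 4
gate = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
token = [random.gauss(0, 1) for _ in range(d_model)]

print(top2_route([0.1, 2.0, -1.0, 1.5]))   # experts 1 and 3 are selected
print(moe_layer(token, gate, experts))
```

Production MoE layers batch this across tokens and usually add a load-balancing loss so that tokens spread across experts; both are omitted here for clarity.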
Dataset Composition
Disclosure regarding training data is minimal and largely qualitative. The provider states the model was trained on '10+ programming languages' and '200,000+ real-world environments' using a proprietary reinforcement learning framework (Forge). However, there is no public breakdown of the dataset composition (e.g., web vs. code percentages), no information on data filtering/cleaning protocols, and no sample data available for inspection.
Tokenizer Integrity
The model uses a unified tokenizer for multimodal processing (text, image, audio), which is a significant architectural claim. While the tokenizer is accessible via the model weights on Hugging Face, official documentation regarding vocabulary size, specific tokenization algorithms, and training data alignment is sparse. Third-party tools like SGLang and vLLM support it, but official technical specifications are lacking.
Parameter Density
MiniMax provides specific figures for both total and active parameters: 230 billion total parameters with 10 billion active per forward pass. This level of MoE transparency is better than many competitors. However, a detailed architectural breakdown of parameter distribution (e.g., attention vs. FFN) is not publicly documented.
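The stated figures imply that only a small share of the weights participates in each token's forward pass. The sketch below works through that arithmetic using the common rule of thumb of about 2 FLOPs per parameter used per token; that rule is a general approximation (it ignores attention-over-context cost and routing overhead), not a figure published by MiniMax.

```python
TOTAL_PARAMS = 230e9   # total parameters, as stated above
ACTIVE_PARAMS = 10e9   # parameters active per forward pass (per token), as stated above

def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights that participate in one token's forward pass."""
    return active / total

def forward_flops_per_token(n_params: float) -> float:
    """Rule-of-thumb forward cost: ~2 FLOPs (one multiply + one add) per parameter used."""
    return 2 * n_params

frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size

print(f"active fraction:  {frac:.2%}")                         # 4.35%
print(f"MoE forward cost: ~{moe_flops:.1e} FLOPs/token")
print(f"dense equivalent: ~{dense_flops:.1e} FLOPs/token")
print(f"compute ratio:    ~{dense_flops / moe_flops:.0f}x")      # 23x
```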
Training Compute
Information on training compute is extremely limited. While some anecdotal evidence suggests a training period of approximately two months, there is no official disclosure of total GPU/TPU hours, hardware cluster specifications, or carbon footprint. The company cites 'efficiency' but provides no verifiable metrics to back the claim.
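A widely used scaling-law approximation shows which figures would be needed to check any compute claim. It assumes roughly 6 FLOPs per parameter per training token, taking the active parameter count as the relevant N for an MoE model; it ignores attention-over-context cost and any separate reinforcement-learning phase, so treat it as an order-of-magnitude sketch only. Here D (training tokens), n_GPU, F_peak (per-device peak FLOP/s), and MFU (model FLOP utilization) are all undisclosed, which is why a "two months" figure cannot be verified.

```latex
\begin{aligned}
C_{\text{train}} &\approx 6\,N_{\text{active}}\,D = 6 \times (10 \times 10^{9})\,D = 6 \times 10^{10}\,D\ \text{FLOPs},\\
T_{\text{wall}} &\approx \frac{C_{\text{train}}}{n_{\text{GPU}}\, F_{\text{peak}}\, \mathrm{MFU}}.
\end{aligned}
```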
Benchmark Reproducibility
The model reports high scores on standard benchmarks like SWE-Bench Verified (80.2%) and BrowseComp (76.3%). However, the evaluation code and exact prompts used are not fully public. Furthermore, significant discrepancies have been noted by third-party audits regarding the validity of these scores, and the reliance on internal benchmarks like 'VIBE-Pro' further complicates independent verification.
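Even with full evaluation artifacts, a single benchmark score carries sampling noise from the finite task set. The sketch below computes a 95% Wilson score interval for a pass rate; the choice of the Wilson interval is ours, not the provider's. SWE-bench Verified contains 500 tasks, and the interval covers only task-sampling noise, not differences in prompts, scaffolding, or retries.

```python
import math

def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate p_hat measured on n independent tasks."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Reported SWE-bench Verified score (80.2%) on the benchmark's 500 tasks.
low, high = wilson_interval(0.802, 500)
print(f"80.2% on 500 tasks -> 95% CI ~ [{low:.1%}, {high:.1%}]")
```

This yields roughly 76.5% to 83.5%, so differences of a few percentage points between models on this benchmark are within sampling noise alone.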
Identity Consistency
The model consistently identifies itself as MiniMax-M2.5 and is transparent about its origin and purpose as an agentic AI. It maintains clear versioning (M2 -> M2.1 -> M2.5) and does not attempt to mimic the identity of other models in its system prompts or official communications.
License Clarity
The model is released under a 'Modified-MIT' license. While the license text is available, the 'Modified' prefix creates ambiguity. For M2.5, the license generally allows commercial use, but subsequent versions (M2.7) have introduced more restrictive terms requiring written authorization, leading to community confusion regarding the long-term stability of the licensing model.
Hardware Footprint
Hardware requirements are well documented by both the provider and community partners. VRAM requirements for various precision and quantization levels (16-bit, FP8, Q3_K_XL) are clearly stated (e.g., ~457 GB for BF16, ~101 GB for a 3-bit GGUF). The documentation also includes specific guidance for running on consumer hardware (e.g., two RTX 4090 GPUs) and for how memory scales with context length.
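The quoted figures can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bits per weight. For an MoE model, all 230B parameters must be resident even though only 10B are active per token, so memory scales with the total count. The sketch below counts weights only (no KV cache, activations, or runtime overhead), and the 3.5 bits-per-weight average for a mixed 3-bit quantization is an assumption, not a documented Q3_K_XL figure.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 230e9  # all experts must be loaded, not just the 10B active ones

for label, bits in [("BF16", 16), ("FP8", 8), ("3-bit GGUF (assumed ~3.5 bpw avg)", 3.5)]:
    print(f"{label:>34}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

The estimates (about 460 GB, 230 GB, and 101 GB) land close to the figures quoted above. Small differences can come from the exact parameter count and from rounding in the published numbers.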
Versioning Drift
MiniMax maintains a versioned release cycle with a public changelog. However, the documentation for these updates is often high-level marketing summaries rather than detailed technical changelogs. There is limited information on how model behavior or safety guardrails change between sub-versions, and previous versions are not always easily accessible for drift testing.