Parameters
428B
Context Length
1M
Modality
Multimodal
Architecture
Dense
License
Proprietary
Release Date
1 Jun 2026
Knowledge Cutoff
-
VRAM requirements for different quantization methods and context sizes
1,024 tokens
Consumer
51x RTX 4090
24GB VRAM
Datacenter
14x NVIDIA A100
80GB VRAM
Apple Silicon
11x Apple M3 Max
128GB VRAM
1,000,000 tokens
Consumer
60x RTX 4090
24GB VRAM
Datacenter
16x NVIDIA A100
80GB VRAM
Apple Silicon
13x Apple M3 Max
128GB VRAM
Rank
#14
| Benchmark | Score | Rank |
|---|---|---|
| Web Development (WebDev Arena) | 1521 | ⭐ 10 |
| General Text (Text Arena) | 1451 | 25 |
Overall Rank
#14
Coding Rank
#23
MiniMax's flagship multimodal model, released June 1, 2026. It is built on the MiniMax Sparse Attention (MSA) architecture, which replaces traditional full attention with a KV-block selection pattern and reduces compute cost to 1/20th of the previous generation's. The model is optimized for long-horizon agentic workflows, complex software engineering, and video understanding. It has a 1M-token context window, accepts text, image, and video inputs, and is priced at $0.30 per million input tokens and $1.20 per million output tokens.
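The per-token prices quoted above translate into a per-request cost with simple arithmetic. The sketch below is only an illustration: `estimate_cost` is a hypothetical helper, not part of any MiniMax SDK, and it assumes flat pricing with no caching discounts, tiers, or per-modality surcharges, none of which the page mentions.

```python
from decimal import Decimal

# Prices quoted in the description, in USD per one million tokens.
INPUT_PRICE_PER_M = Decimal("0.30")
OUTPUT_PRICE_PER_M = Decimal("1.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat per-token pricing; any discounts or tiers are not modeled.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


# Example: a prompt that fills the 1M-token window, with a 4,000-token reply.
print(estimate_cost(1_000_000, 4_000))
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding error when many requests are summed.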
Attention
Attention Structure
Multi-Head Attention
Attention Heads
64
Key-Value Heads
4
Attention Head Dimension
128
Position Embedding
Absolute Position Embedding
RoPE Theta
5,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
6,144
Number of Layers
60
FFN Intermediate Size (Dense)
12,288
Multi-Token Prediction Heads
1
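The attention and dimension figures above are enough for a rough estimate of how the key-value cache grows with context length. With 64 query heads sharing 4 key-value heads, 16 query heads read from each cached key-value head. The sketch below applies the standard dense KV-cache formula and assumes 2 bytes per cached value (fp16/bf16), which the page does not state. It also ignores whatever the MSA block selection changes about what is stored or read, so treat the result as a back-of-the-envelope figure, not the method behind the listed GPU counts.

```python
# Values taken from the spec sheet.
NUM_LAYERS = 60
NUM_ATTENTION_HEADS = 64
NUM_KV_HEADS = 4
HEAD_DIM = 128

# Assumption: 2 bytes per cached value (fp16/bf16); the page does not say.
BYTES_PER_VALUE = 2


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Dense KV-cache size in bytes for one sequence of num_tokens tokens."""
    # Factor of 2: one key vector and one value vector per KV head per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens


# Number of query heads that share each cached key-value head.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KV_HEADS

for tokens in (1_024, 1_000_000):
    size_gb = kv_cache_bytes(tokens) / 1e9
    print(f"{tokens:>9,} tokens: {size_gb:.2f} GB")
```

Under these assumptions each token costs 122,880 bytes of cache. The cache is therefore negligible next to the weights at 1,024 tokens but becomes a significant share of total memory at the full 1,000,000-token context.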
Tokenizer
Vocabulary Size
200,064
MiniMax's flagship M3 model family, released June 1, 2026, is built on the MiniMax Sparse Attention (MSA) architecture. It offers a 1M-token context window at low compute cost and is optimized for long-horizon agentic workflows.