MiniMax M3

Closed Source

Open Weights

Parameters

428B

Context Length

1M tokens

Modality

Multimodal

Architecture

Dense

License

Proprietary

Release Date

1 Jun 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens

900.43 GB VRAM

Consumer

51x RTX 4090

24GB VRAM

Datacenter

14x NVIDIA A100

80GB VRAM

Apple Silicon

11x Apple M3 Max

128GB VRAM

1,000,000 tokens

1029.32 GB VRAM

Consumer

60x RTX 4090

24GB VRAM

Datacenter

16x NVIDIA A100

80GB VRAM

Apple Silicon

13x Apple M3 Max

128GB VRAM
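The device counts above can be sanity-checked with simple arithmetic. The sketch below computes only a lower bound: the smallest number of devices whose combined memory covers the listed VRAM figure. The counts listed above are higher than this bound, and the page does not say how much headroom they include, so the function names and the rounding rule here are illustrative assumptions rather than the method behind the table. It also estimates the average extra memory per additional context token from the two listed context sizes.

```python
import math

# Figures copied from the listing above (GB of VRAM at each context size).
REQUIRED_GB = {1_024: 900.43, 1_000_000: 1029.32}
# Per-device memory as listed above.
DEVICE_GB = {"RTX 4090": 24, "NVIDIA A100": 80, "Apple M3 Max": 128}


def min_devices(required_gb: float, device_gb: float) -> int:
    """Smallest device count whose combined memory covers required_gb.

    Pure capacity bound: ignores activation headroom, fragmentation,
    and any parallelism constraints (assumption for illustration).
    """
    return math.ceil(required_gb / device_gb)


def marginal_kb_per_token(table: dict[int, float]) -> float:
    """Average extra memory per additional context token, in KB (1 GB = 1e6 KB)."""
    (t0, g0), (t1, g1) = sorted(table.items())
    return (g1 - g0) * 1e6 / (t1 - t0)


for tokens, gb in REQUIRED_GB.items():
    counts = {name: min_devices(gb, cap) for name, cap in DEVICE_GB.items()}
    print(f"{tokens:>9,} tokens: {counts}")

print(f"~{marginal_kb_per_token(REQUIRED_GB):.0f} KB per extra context token")
```

Under these assumptions the capacity bound at 1,024 tokens is 38, 12, and 8 devices respectively, against the 51, 14, and 11 listed above.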

Evaluation Benchmarks

Rank

#14

Category	Benchmark	Score	Rank
Web Development	WebDev Arena	1521	10
General Text	Text Arena	1451	25

Rankings

Overall Rank

#14

Coding Rank

#23

About MiniMax M3

MiniMax M3 is MiniMax's flagship multimodal model, released June 1, 2026. It is built on the MiniMax Sparse Attention (MSA) architecture, which replaces traditional full attention with a KV-block selection pattern and cuts compute cost to 1/20th of the previous generation's. The model is optimized for long-horizon agentic workflows, complex software engineering, and video understanding. It has a 1M-token context window and accepts text, image, and video inputs. Pricing is $0.30 per million input tokens and $1.20 per million output tokens.
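The per-token prices quoted above translate into a per-request cost with one line of arithmetic. The sketch below is a minimal estimator under the assumption that billing is strictly linear in token counts. The function name and the example request are hypothetical, and any discounts, caching tiers, or per-modality rates are not covered by the source.

```python
from decimal import Decimal

# Prices quoted above, in USD per one million tokens.
INPUT_USD_PER_M = Decimal("0.30")
OUTPUT_USD_PER_M = Decimal("1.20")
MILLION = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """List-price cost of one request, assuming purely linear billing."""
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / MILLION


# Hypothetical request: a full 1M-token prompt with a 4,000-token reply.
print(request_cost_usd(1_000_000, 4_000))
```

For that hypothetical request the estimate is $0.3048, almost all of it from the input side.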

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

128

Position Embedding

Absolute Position Embedding

RoPE Theta

5,000,000
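Assuming the RoPE Theta field refers to the base of rotary position embeddings, the listed value and the head dimension of 128 determine a set of rotation frequencies. The sketch below uses the standard RoPE formula. The function names are illustrative, and the page does not confirm how M3 applies the rotation in practice.

```python
import math

ROPE_THETA = 5_000_000  # "RoPE Theta" from the spec above
HEAD_DIM = 128          # "Attention Head Dimension" from the spec above


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) per pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x0: float, x1: float, position: int, inv_freq: float):
    """Rotate one (x0, x1) feature pair by the angle position * inv_freq."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


inv_freq = rope_inv_freq(HEAD_DIM, ROPE_THETA)
# Positions needed for the slowest-rotating pair to complete one full turn.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(len(inv_freq), f"{longest_wavelength:.3e}")
```

A larger base stretches the slowest wavelength, which is why long-context models tend to use large theta values. With this base the slowest pair completes a turn only after tens of millions of positions, well beyond a 1M-token window.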

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

6,144

Number of Layers

FFN Intermediate Size (Dense)

12,288
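The hidden size and FFN intermediate size above fix the shape of each feed-forward block. The sketch below assumes the common bias-free, three-matrix SwiGLU layout (gate, up, and down projections), which the page does not confirm for M3. It runs the computation on toy weights and counts the parameters per layer at the listed dimensions.

```python
import math

HIDDEN = 6_144         # "Hidden Dimension Size" from the spec above
INTERMEDIATE = 12_288  # "FFN Intermediate Size (Dense)" from the spec above


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix w (rows x cols) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Compute down(silu(gate(x)) * up(x)) with no bias terms."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def swiglu_params(hidden: int, intermediate: int) -> int:
    """Parameters in one block: gate and up (hidden -> intermediate), down (back)."""
    return 3 * hidden * intermediate


# Toy example: hidden size 2, intermediate size 3.
x = [1.0, -2.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
w_down = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
print(f"{swiglu_params(HIDDEN, INTERMEDIATE):,} parameters per FFN block")
```

Under that layout each block holds 226,492,416 parameters. The number of layers is not listed above, so the total FFN parameter count cannot be derived from this page.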

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

200,064
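For a sense of scale, the vocabulary size and hidden dimension give the size of the token embedding table. The sketch below assumes a single embedding matrix of shape vocabulary by hidden size, stored at 2 bytes per parameter. Whether M3 ties input and output embeddings, and what precision it ships in, are not stated on this page.

```python
VOCAB_SIZE = 200_064            # "Vocabulary Size" from the spec above
HIDDEN = 6_144                  # "Hidden Dimension Size" from the spec above
TOTAL_PARAMS = 428_000_000_000  # "428B" from the spec above
BYTES_PER_PARAM = 2             # assumption: 16-bit weights


def embedding_params(vocab_size: int, hidden: int) -> int:
    """Parameters in one vocab x hidden embedding matrix."""
    return vocab_size * hidden


params = embedding_params(VOCAB_SIZE, HIDDEN)
share_percent = 100 * params / TOTAL_PARAMS
size_gb = params * BYTES_PER_PARAM / 1e9

print(f"{params:,} parameters, {share_percent:.2f}% of the model, {size_gb:.2f} GB")
```

Under these assumptions the embedding table is about 1.23 billion parameters, well under one percent of the listed 428B.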

About MiniMax M3

The MiniMax M3 model family, MiniMax's flagship line, was released June 1, 2026. It is built on the MiniMax Sparse Attention (MSA) architecture, which provides a 1M-token context at low compute cost, and it is optimized for long-horizon agentic workflows.
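To illustrate why selecting KV blocks lowers attention cost, here is a minimal single-query sketch of block-selected attention. It is not MSA itself: the block scoring rule (query dotted with each block's mean key), the function name, and all parameters are assumptions chosen for illustration, since the page gives no implementation details. The point is only that the softmax runs over `top_k * block_size` positions instead of the whole context.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_select_attention(query, keys, values, block_size, top_k):
    """Single-query attention restricted to the top_k highest-scoring KV blocks.

    Blocks are scored by the dot product of the query with the block's mean
    key. This scoring rule is an illustrative assumption, not MSA's rule.
    Returns the attention output and the indices that were attended to.
    """
    n = len(keys)
    scored = []
    for start in range(0, n, block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start))

    # Keep the best top_k blocks, then restore positional order.
    chosen = sorted(start for _, start in sorted(scored, reverse=True)[:top_k])
    idx = [i for s in chosen for i in range(s, min(s + block_size, n))]

    # Ordinary scaled softmax attention over the selected positions only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(dim)]
    return out, idx


# Toy data: 8 positions in 4 blocks of 2; the query favors blocks 1 and 3.
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
values = [[float(i)] for i in range(8)]
out, attended = block_select_attention([1.0, 0.0], keys, values,
                                       block_size=2, top_k=2)
print(attended, out)
```

In this toy case half of the positions are attended. Setting `top_k` to the total number of blocks recovers ordinary full attention, which makes the sketch easy to check against a dense baseline.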
