---
pipeline_tag: text-generation
license: apache-2.0
---
# MiniMax-M1
## 1. Model Overview
We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
which contains a total of 456 billion parameters, with 45.9 billion parameters activated
per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
in MiniMax-M1 enables efficient scaling of test-time compute: for example, at a generation length of
100K tokens, M1 consumes only 25% of the FLOPs of DeepSeek R1. These properties make M1
particularly suitable for complex tasks that require processing long inputs and thinking extensively.

MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems ranging from
traditional mathematical reasoning to sandbox-based, real-world software engineering environments.
We develop an efficient RL scaling framework for M1, highlighting two aspects: (1) we propose
CISPO, a novel algorithm that clips importance sampling weights instead of token updates and
outperforms other competitive RL variants; (2) our hybrid-attention design naturally enhances the
efficiency of RL, and we address the unique challenges of scaling RL with this hybrid architecture. We
train two versions of MiniMax-M1 with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets, respectively. Experiments
on standard benchmarks show that our models outperform other strong open-weight models such as
the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool use,
and long-context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
foundation for next-generation language model agents to reason and tackle real-world challenges.
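This card only summarizes CISPO at a high level. As a rough illustration of the core idea, clipping and detaching the importance-sampling weight so that every token keeps contributing a policy-gradient signal, rather than clipping the per-token update as PPO-style methods do, here is a minimal PyTorch-style sketch. The function name, clipping bounds, and tensor shapes are illustrative assumptions, not the exact objective from our technical report.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Illustrative sketch of a CISPO-style objective (assumed form).

    Instead of clipping the PPO surrogate (which zeroes gradients for tokens
    whose importance ratio falls outside the trust region), the importance
    sampling weight itself is clipped and detached, so every sampled token
    still contributes a REINFORCE-style gradient.

    logp_new:   [B, T] log-probs of sampled tokens under the current policy
    logp_old:   [B, T] log-probs under the behavior (sampling) policy
    advantages: [B, T] per-token (or broadcast per-sequence) advantages
    """
    ratio = torch.exp(logp_new - logp_old)                      # r_t = pi_new / pi_old
    clipped_w = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    per_token = clipped_w * advantages * logp_new               # weighted log-likelihood term
    return -per_token.mean()
```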
*Benchmark performance comparison of leading commercial and open-weight models across competition-level mathematics, coding, software engineering, agentic tool use, and long-context understanding tasks. MiniMax-M1 results shown here use the MiniMax-M1-80k model.*
## 2. Evaluation
**Performance of MiniMax-M1 on core benchmarks.**
| **Category** | **Task** | **OpenAI-o3** | **Gemini 2.5 Pro (06-05)** | **Claude 4 Opus** | **Seed-Thinking-v1.5** | **DeepSeek-R1** | **DeepSeek-R1-0528** | **Qwen3-235B-A22B** | **MiniMax-M1-40k** | **MiniMax-M1-80k** |
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | *Extended Thinking* | *100k* | *64k* | *64k* | *32k* | *32k* | *64k* | *32k* | *40k* | *80k* |
| ***Mathematics*** | AIME 2024 | 91.6 | 92.0 | 76.0 | 86.7 | 79.8 | 91.4 | 85.7 | 83.3 | 86.0 |
| | AIME 2025 | 88.9 | 88.0 | 75.5 | 74.0 | 70.0 | 87.5 | 81.5 | 74.6 | 76.9 |
| | MATH-500 | 98.1 | 98.8 | 98.2 | 96.7 | 97.3 | 98.0 | 96.2 | 96.0 | 96.8 |
| ***General Coding*** | LiveCodeBench *(24/8~25/5)* | 75.8 | 77.1 | 56.6 | 67.5 | 55.9 | 73.1 | 65.9 | 62.3 | 65.0 |
| | FullStackBench | 69.3 | -- | 70.3 | 69.9 | 70.1 | 69.4 | 62.9 | 67.6 | 68.3 |
| ***Reasoning & Knowledge***| GPQA Diamond | 83.3 | 86.4 | 79.6 | 77.3 | 71.5 | 81.0 | 71.1 | 69.2 | 70.0 |
| | HLE *(no tools)* | 20.3 | 21.6 | 10.7 | 8.2 | 8.6\* | 17.7\* | 7.6\* | 7.2\* | 8.4\* |
| | ZebraLogic | 95.8 | 91.6 | 95.1 | 84.4 | 78.7 | 95.1 | 80.3 | 80.1 | 86.8 |
| | MMLU-Pro | 85.0 | 86.0 | 85.0 | 87.0 | 84.0 | 85.0 | 83.0 | 80.6 | 81.1 |
| ***Software Engineering***| SWE-bench Verified| 69.1 | 67.2 | 72.5 | 47.0 | 49.2 | 57.6 | 34.4 | 55.6 | 56.0 |
| ***Long Context*** | OpenAI-MRCR *(128k)* | 56.5 | 76.8 | 48.9 | 54.3 | 35.8 | 51.5 | 27.7 | 76.1 | 73.4 |
| | OpenAI-MRCR *(1M)* | -- | 58.8 | -- | -- | -- | -- | -- | 58.6 | 56.2 |
| | LongBench-v2 | 58.8 | 65.0 | 55.6 | 52.5 | 58.3 | 52.1 | 50.1 | 61.0 | 61.5 |
| ***Agentic Tool Use***| TAU-bench *(airline)* | 52.0 | 50.0 | 59.6 | 44.0 | -- | 53.5 | 34.7 | 60.0 | 62.0 |
| | TAU-bench *(retail)* | 73.9 | 67.0 | 81.4 | 55.7 | -- | 63.9 | 58.6 | 67.8 | 63.5 |
| ***Factuality*** | SimpleQA | 49.4 | 54.0 | -- | 12.9 | 30.1 | 27.8 | 11.0 | 17.9 | 18.5 |
| ***General Assistant***| MultiChallenge | 56.5 | 51.8 | 45.8 | 43.0 | 40.7 | 45.0 | 40.0 | 44.7 | 44.7 |
\* conducted on the text-only HLE subset.
## 3. Deployment Guide
Download the model from one of the Hugging Face repositories:
- [MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)
- [MiniMax-M1-80k](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k)
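For example, the weights can be fetched programmatically with `huggingface_hub` (shown here for the 80k variant; the local directory is an arbitrary choice):

```python
# Minimal sketch: download the MiniMax-M1-80k weights with huggingface_hub.
# The repo_id matches the 80k repository above; local_dir is an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MiniMaxAI/MiniMax-M1-80k",
    local_dir="./MiniMax-M1-80k",
)
```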
For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/latest/) to serve MiniMax-M1. vLLM provides excellent performance for serving large language models with the following features:
- 🔥 Outstanding serving throughput
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch request processing capability
- ⚙️ Deeply optimized underlying performance
For detailed vLLM deployment instructions, please refer to our [vLLM Deployment Guide](./docs/vllm_deployment_guide.md).
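Once a vLLM OpenAI-compatible server is running (the launch command, parallelism settings, and port are covered in the guide above), you can sanity-check it with the `openai` Python client. The endpoint and model name below are illustrative assumptions:

```python
# Illustrative sketch: query a locally running vLLM OpenAI-compatible server.
# Assumes the server was started per the vLLM Deployment Guide and listens on
# the default http://localhost:8000/v1 endpoint; adjust as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-80k",
    messages=[
        {"role": "user", "content": "Summarize the key ideas behind lightning attention."},
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```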
Alternatively, you can deploy the model directly with Transformers. For detailed instructions, see our [MiniMax-M1 Transformers Deployment Guide](./docs/transformers_deployment_guide.md).
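For reference, a minimal Transformers loading sketch might look like the following. The dtype and device-map choices are illustrative assumptions; the linked guide has the recommended configuration for a model of this size:

```python
# Minimal sketch: load and run MiniMax-M1 with Transformers.
# trust_remote_code is assumed to be required for the custom architecture;
# dtype/device_map settings here are examples, not the recommended production setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M1-80k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```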
## 4. Function Calling
The MiniMax-M1 model supports function calling, enabling it to identify when external functions need to be called and to output the call parameters in a structured format. The [MiniMax-M1 Function Call Guide](./docs/function_call_guide.md) provides detailed instructions on using this feature.
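For illustration only (the authoritative request format and tool schema are specified in the guide above), a tool-augmented request against an OpenAI-compatible endpoint typically looks like this. The endpoint, tool name, and parameter schema below are hypothetical:

```python
# Illustrative sketch of a function-calling request against an OpenAI-compatible
# endpoint serving MiniMax-M1. The weather tool and its schema are hypothetical;
# follow the Function Call Guide for the exact format expected by the model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-80k",
    messages=[{"role": "user", "content": "What's the weather in Shanghai today?"}],
    tools=tools,
)
# If the model decides a call is needed, the structured arguments appear here.
print(response.choices[0].message.tool_calls)
```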
## 5. Chatbot & API
For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and an [online API](https://www.minimax.io/platform/) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP), which offers video generation, image generation, speech synthesis, and voice cloning for developers.
## 6. Contact Us
Contact us at [model@minimaxi.com](mailto:model@minimaxi.com).