MiniMax M2.5 Review: The Cheapest Frontier AI That Earns Its Price
Pricing pressure is finally hitting frontier AI models from an unexpected direction. MiniMax, a Shanghai-based lab founded in 2021, released M2.5 in February 2026 with benchmark scores that rival Claude Opus and GPT-4o and a price tag that is, against Opus's published rates, 50 to 60 times lower. That is not a typo. This is what happens when the Chinese AI ecosystem starts competing on global infrastructure economics.
M2.5 is worth paying attention to for anyone building AI-powered applications, particularly anything involving agentic workflows, code generation, or multi-step task automation. Here is what the data actually shows and where the model has real limitations.
What MiniMax M2.5 Actually Is
M2.5 is a mixture-of-experts reasoning model with 230 billion total parameters, of which roughly 10 billion are active per token. The MoE architecture routes each token through a small subset of the model's experts rather than activating all of them, which is how it achieves fast throughput at low cost without sacrificing capability.
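The routing idea is simple enough to sketch. This toy example is not MiniMax's actual router; the expert count, dimensions, and top-2 selection are illustrative, but the mechanism is the same: score all experts, run only the best few, and mix their outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 experts, only the top-2 run per token.
# Sizes are illustrative, not M2.5's real configuration.
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """Route one token vector through its top-k experts only."""
    logits = x @ router                  # score every expert
    top = np.argsort(logits)[-top_k:]    # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the chosen experts
    # Only k of the n_experts matrices are touched -- the cost saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

The win is that per-token compute scales with the active parameters (here 2 of 8 experts), not the total parameter count.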
Released February 12, 2026, M2.5 improves substantially on its predecessor. SWE-Bench Verified score: 80.2%, up from 74% on M2.1, and currently among the highest reported scores on this standard software engineering benchmark. Context window: 205,000 tokens. Inference speed: 50 tokens per second for the Standard variant, 100 tokens per second for the Lightning variant. Both variants have identical capability; the difference is throughput versus per-token cost.
MiniMax positions M2.5 as a model built for agentic workflows rather than single-turn conversation. The distinction matters. The model was trained across more than 200,000 real-world software environments and more than 10 programming languages, with specific emphasis on multi-step tasks: writing code, running it, checking the output, and revising. It also handles structured document work, including generating and editing Word, Excel, and PowerPoint files, which is less common among pure coding models.
The Pricing Case
Standard API pricing for M2.5 as of March 2026: $0.30 per million input tokens, $1.20 per million output tokens. For Lightning: $0.30 input, $2.40 output. Automatic prompt caching is included in both tiers without manual configuration.
For comparison, Claude Opus 4.6 runs at $15 per million input tokens and $75 per million output tokens. GPT-4o is cheaper than Opus but still several times the cost of M2.5. At these rates, a task that costs $7.50 in Claude Opus output tokens costs roughly $0.12 in M2.5 output. For teams running large-scale agentic pipelines where the model is generating substantial output across many interactions, this changes the cost structure of a product entirely.
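The per-task comparison above is easy to verify from the published rates (Opus figures as quoted in this review):

```python
# Per-million-token output rates quoted in this review (USD).
M25_STANDARD_OUT = 1.20   # MiniMax M2.5 Standard
OPUS_OUT = 75.00          # Claude Opus 4.6

def output_cost(tokens, rate_per_million):
    """USD cost for a given number of output tokens."""
    return tokens / 1_000_000 * rate_per_million

# A task that costs $7.50 in Opus output tokens is 100,000 tokens...
tokens = 7.50 / OPUS_OUT * 1_000_000
# ...which costs about $0.12 at M2.5 Standard's output rate.
m25_cost = output_cost(tokens, M25_STANDARD_OUT)
```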
The model weights are also open. Organizations can download M2.5 from Hugging Face and run it on their own infrastructure. This eliminates per-token costs entirely for teams with the compute budget to self-host, and it removes the dependency on MiniMax's API availability, which matters for enterprise compliance requirements.
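Whether self-hosting actually pays off is a breakeven question. Everything in this sketch is an assumption for illustration: the monthly hosting figure is a placeholder, and real numbers depend on hardware, utilization, and operations overhead.

```python
# All inputs are illustrative assumptions, not vendor quotes.
API_OUT_RATE = 1.20          # M2.5 Standard, USD per 1M output tokens
HOSTING_PER_MONTH = 8_000.0  # assumed GPU cluster cost, USD/month

def breakeven_tokens(hosting_usd, rate_per_million):
    """Monthly output tokens at which self-hosting matches API spend."""
    return hosting_usd / rate_per_million * 1_000_000

tokens = breakeven_tokens(HOSTING_PER_MONTH, API_OUT_RATE)
# Roughly 6.7 billion output tokens per month before self-hosting
# breaks even at these assumed numbers.
```

At M2.5's low per-token prices, self-hosting only wins at very high volume; against Opus-level rates the breakeven volume would be about 60 times smaller.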
Where It Performs
Coding and software engineering are where M2.5 has the strongest documented performance. The 80.2% SWE-Bench score is the headline number, and the benchmark methodology is reasonably rigorous, requiring the model to resolve real open-source GitHub issues by modifying actual codebases, not just generating code snippets.
Tool calling is another genuine strength. On the Berkeley Function Calling Leaderboard, M2.5 scores 76.8%, which the company reports as more than 13 percentage points above comparable frontier models in multi-turn scenarios. For agentic applications that require reliable tool invocation across long conversations, this is a meaningful differentiator.
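Tool calling across providers largely follows the OpenAI-style schema. The tool name and argument shape below are made up for illustration (check MiniMax's docs for its actual request format); the dispatch step is the part an agent loop has to get right regardless of which model emitted the call.

```python
import json

# A tool schema in the widely used OpenAI-style format. The tool
# name and parameters here are hypothetical, not from MiniMax's docs.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub standing in for a real API call.
    return f"Sunny in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Execute one model-emitted tool call against local functions."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated model response fragment, shaped like a real tool_call entry.
fake_call = {"function": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Shanghai"})}}
result = dispatch(fake_call)
```

A high multi-turn function-calling score means fewer malformed entries reaching `dispatch`, which is what makes long agentic conversations reliable.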
The Architect Mode feature lets M2.5 serve as a planning and orchestration layer: it breaks a complex request into subtasks, delegates execution to tools or other model calls, and synthesizes results. This is useful for large software refactors or complex research workflows where single-shot generation is insufficient.
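The plan-delegate-synthesize pattern can be sketched generically. The model calls are stubbed out here, since Architect Mode's real API is MiniMax's own; this only shows the control flow the review describes.

```python
def plan(request: str) -> list[str]:
    """Stub planner: split a request into subtasks.
    In Architect Mode this step would be a model call."""
    return [f"{request}: step {i}" for i in (1, 2, 3)]

def execute(subtask: str) -> str:
    """Stub executor: run one subtask via a tool or model call."""
    return f"done({subtask})"

def synthesize(results: list[str]) -> str:
    """Stub synthesizer: merge subtask results into one answer."""
    return "; ".join(results)

def architect(request: str) -> str:
    # Plan -> delegate each subtask -> synthesize. A real
    # orchestrator would add retries and feed intermediate
    # results back into planning.
    return synthesize([execute(s) for s in plan(request)])

answer = architect("refactor auth module")
```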
On BrowseComp, M2.5 scores 76.3% when context management is enabled, suggesting strong capability for web research and information synthesis tasks. Its Artificial Analysis Intelligence Index score of 42 places it well above the median for open-weight models of comparable size.
What to Be Cautious About
M2.5 is a reasoning model, and reasoning models have a predictable cost: latency. The model thinks before it responds. For conversational applications where users expect near-instant replies, this is a real limitation. Reasoning models work well for background processing and autonomous agent tasks; they work poorly as front-end chat interfaces where perceived speed matters.
The Lightning variant mitigates this somewhat at higher output cost, but neither variant is optimized for low-latency conversational interaction. If you are building a customer-facing chatbot that needs to respond in under two seconds, M2.5 is likely not the right tool regardless of its benchmark scores.
Western market availability is also worth noting. MiniMax is a Chinese company, and while its API is accessible globally, some enterprise procurement and compliance teams will need to evaluate data residency and vendor risk before deploying it in regulated environments. The open-weights option addresses this for teams with the infrastructure to use it.
The Honest Assessment
MiniMax M2.5 is not a replacement for every use case involving Claude or GPT-4o. It is specifically strong for coding, agentic task execution, and structured document work, and specifically weak for low-latency conversational applications. The open-weights availability and pricing gap are real, not marketing.
For teams building AI infrastructure around code generation, automated workflows, or batch processing tasks, M2.5 warrants serious evaluation. The cost differential at scale is not a rounding error. At the published rates, a pipeline that would cost $50,000 per month in Claude Opus API fees works out to roughly $800 to $1,000 in M2.5 fees, or near zero with self-hosting.
That kind of economics reshapes what is financially viable to build. That is the actual story of MiniMax M2.5: not the benchmarks, but the math.
You can find MiniMax's full profile on Chatbot Gallery, alongside our Doubao review for another look at what Chinese AI labs are building. See our Best AI Chatbots of 2026 roundup for broader comparison context.
This article contains no affiliate links. Alex Chen writes about AI platforms and enterprise software for About.chat.