MMLU-Pro
A harder, more reasoning-focused replacement for MMLU. Each question has 10 answer choices instead of 4, and the question set is curated to remove trivially answerable items.
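To put the scores in context, the move from 4 to 10 answer choices lowers the random-guess floor. The sketch below works through that arithmetic, assuming every item has exactly the stated number of options and a single correct answer; `chance_accuracy` is an illustrative helper, not part of any benchmark tooling.

```python
from fractions import Fraction


def chance_accuracy(num_choices: int) -> Fraction:
    """Expected accuracy of uniform random guessing on a single-answer item."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return Fraction(1, num_choices)


# Assumption: every item carries the full option count named in the description.
mmlu_chance = chance_accuracy(4)       # original MMLU: 4 options
mmlu_pro_chance = chance_accuracy(10)  # MMLU-Pro: 10 options

print(f"MMLU chance baseline:     {float(mmlu_chance):.1%}")      # 25.0%
print(f"MMLU-Pro chance baseline: {float(mmlu_pro_chance):.1%}")  # 10.0%
```

Under that assumption, a score on this leaderboard sits above a 10% guessing floor rather than the 25% floor of the original MMLU.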
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT OSS 120B | 90.0% | CoT | Aug 5, 2025 | self reported | primary |
| 2 | Qwen 3.5 27B | 86.1% | — | Feb 24, 2026 | third party | primary verified |
| 3 | Gemma 4 | 85.2% | CoT | Apr 3, 2026 | self reported | primary |
| 4 | DeepSeek V3.2 Exp | 85.0% | CoT | Sep 29, 2025 | self reported | primary |
| 5 | DeepSeek V3.2 | 85.0% | — | Dec 1, 2025 | paper | primary |
| 6 | DeepSeek V3.1 Terminus | 85.0% | — | Sep 22, 2025 | self reported | primary |
| 7 | DeepSeek-R1 | 84.0% | CoT | Jan 21, 2025 | paper | primary |
| 8 | Llama 4 Behemoth | 82.2% | — | Apr 5, 2025 | self reported | primary |
| 9 | Llama 4 Maverick | 80.5% | — | Apr 5, 2025 | self reported | primary |
| 10 | Seed 1.5 | 80.1% | 0-shot · CoT | Jan 22, 2025 | self reported | primary |
| 11 | Grok 3 | 79.9% | — | Feb 19, 2025 | self reported | primary |
| 12 | Grok 3 mini | 78.9% | — | Feb 19, 2025 | self reported | primary |
| 13 | Nemotron 3 Nano | 78.3% | — | Dec 15, 2025 | self reported | primary |
| 14 | Gemma 3 | 78.0% | — | May 20, 2025 | self reported | primary |
| 15 | Phi 4 reasoning plus | 76.0% | — | Jul 8, 2025 | self reported | primary |
| 16 | DeepSeek V3 | 75.9% | — | Dec 26, 2024 | paper | primary |
| 17 | Nemotron 3 Super | 75.7% | 5-shot · CoT | Apr 3, 2026 | self reported | primary |
| 18 | Llama 4 Scout | 74.3% | — | Apr 5, 2025 | self reported | primary |
| 19 | Command A | 69.6% | — | Apr 7, 2025 | paper | primary |
| 20 | Llama 3.3 | 68.9% | 5-shot · CoT | Dec 6, 2024 | self reported | primary |
| 21 | Mistral Small 3 | 66.3% | 5-shot · CoT | Jan 30, 2025 | self reported | primary |
| 22 | Claude Haiku 3.5 | 41.6% | 0-shot · CoT | Oct 22, 2024 | self reported | primary |
