MMMU-Pro
A harder variant of MMMU: it filters out items that can be solved from the text alone and adds a vision-only setting in which the question itself is rendered into the image.
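The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the function name `filter_text_only_solvable`, the `max_solvers` threshold, and the sample records are all hypothetical, and the sketch assumes each item has already been attempted by several text-only models without access to the image.

```python
from typing import Dict, List


def filter_text_only_solvable(
    text_only_correct: Dict[str, List[bool]],
    max_solvers: int = 0,
) -> List[str]:
    """Keep item IDs that at most `max_solvers` text-only models answered correctly.

    `text_only_correct` maps an item ID to one boolean per text-only model,
    True meaning the model got the item right without seeing the image.
    The threshold is an assumption made for illustration.
    """
    kept = []
    for item_id, results in text_only_correct.items():
        # sum() counts the True entries, i.e. the models that solved the item blind
        if sum(results) <= max_solvers:
            kept.append(item_id)
    return kept


# Hypothetical records: three text-only models per item.
records = {
    "q1": [True, True, False],    # solved blind by two models, dropped
    "q2": [False, False, False],  # never solved blind, kept
    "q3": [False, True, False],   # solved blind by one model, kept when max_solvers=1
}
kept_ids = filter_text_only_solvable(records, max_solvers=1)
```

A stricter threshold (`max_solvers=0`) keeps only items that no text-only model could answer, at the cost of a smaller benchmark.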
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.4 | 81.2% | — | Mar 5, 2026 | self reported | primary |
| 2 | Gemini 3 Flash (Thinking) | 81.2% | — | Dec 17, 2025 | self reported | primary |
| 3 | Gemini 3 Pro | 81.0% | CoT | Nov 18, 2025 | self reported | primary |
| 4 | Gemini 3.1 Pro | 80.5% | CoT | Feb 19, 2026 | self reported | primary |
| 5 | Kimi K2.6 | 79.4% | — | Apr 20, 2026 | self reported | primary |
| 6 | GPT 5 (Thinking) | 78.4% | — | Aug 7, 2025 | self reported | primary |
| 7 | Gemma 4 | 76.9% | — | Apr 3, 2026 | self reported | primary |
| 8 | GPT 5.5 Instant | 76.0% | 0-shot | May 5, 2026 | self reported | primary |
| 9 | Qwen 3.5 35B A3B | 75.1% | — | Feb 15, 2025 | third party | primary, verified |
| 10 | Claude Sonnet 4.6 | 74.5% | — | Feb 17, 2026 | self reported | primary |
| 11 | Gemini 2.5 Pro (Thinking) | 68.0% | — | Dec 17, 2025 | self reported | primary |
| 12 | Gemini 2.5 Flash (Thinking) | 66.7% | — | Dec 17, 2025 | self reported | primary |
| 13 | GPT 5 | 62.7% | — | Aug 7, 2025 | self reported | primary |
| 14 | Seed 1.5 | 59.3% | — | Jan 22, 2025 | self reported | primary |
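One way to derive a frontier-over-time view from the table is to sort entries by eval date and keep only those that beat every earlier score. The sketch below does this for a subset of the rows above. The `frontier` helper and the strict-improvement rule (a tie does not extend the frontier) are assumptions made for illustration, not necessarily how the page computes it.

```python
from datetime import datetime

# (model, score in percent, eval date), copied from a subset of the table rows
rows = [
    ("GPT 5.4", 81.2, "Mar 5, 2026"),
    ("Gemini 3 Flash (Thinking)", 81.2, "Dec 17, 2025"),
    ("Gemini 3 Pro", 81.0, "Nov 18, 2025"),
    ("GPT 5 (Thinking)", 78.4, "Aug 7, 2025"),
    ("Qwen 3.5 35B A3B", 75.1, "Feb 15, 2025"),
    ("Seed 1.5", 59.3, "Jan 22, 2025"),
]


def frontier(rows):
    """Return the entries that strictly beat every earlier-dated score."""
    # Parse dates such as "Mar 5, 2026" so that sorting is chronological
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[2], "%b %d, %Y"))
    best = float("-inf")
    result = []
    for model, score, date in ordered:
        if score > best:  # strict: a tie with the current best is not a new record
            best = score
            result.append((model, score, date))
    return result


for model, score, date in frontier(rows):
    print(f"{date}: {model} ({score}%)")
```

Under the strict rule, GPT 5.4 does not appear in the output because it ties the 81.2% that Gemini 3 Flash (Thinking) reached earlier. Changing `>` to `>=` would list it as well.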
