MMMU-Pro

A harder variant of MMMU: it filters out questions that can be solved from the text alone and adds a vision-only setting in which the question itself is rendered into the image.

Category: Multimodal
Metric: multimodal accuracy (max 100.0%)
Released: Sep 2024
Results: 14
Models scored: 14
Top score: 81.2% (GPT 5.4, tied with Gemini 3 Flash (Thinking))
Median score: 76.5%

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. Y-axis: score, 0 to 100%; x-axis: eval date, Jan 2025 to May 2026.]
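The running-best line can be approximated from the "All results" table. Below is a minimal Python sketch, assuming results are ordered by eval date and that a result only moves the frontier when it strictly beats the previous best (a tie does not count). The page does not state its exact rules, and the function name `running_frontier` is hypothetical; the scores and dates are copied from the table.

```python
from datetime import date

# (model, score in %, eval date), copied from the "All results" table
RESULTS = [
    ("GPT 5.4", 81.2, date(2026, 3, 5)),
    ("Gemini 3 Flash (Thinking)", 81.2, date(2025, 12, 17)),
    ("Gemini 3 Pro", 81.0, date(2025, 11, 18)),
    ("Gemini 3.1 Pro", 80.5, date(2026, 2, 19)),
    ("Kimi K2.6", 79.4, date(2026, 4, 20)),
    ("GPT 5 (Thinking)", 78.4, date(2025, 8, 7)),
    ("Gemma 4", 76.9, date(2026, 4, 3)),
    ("GPT 5.5 Instant", 76.0, date(2026, 5, 5)),
    ("Qwen 3.5 35B A3B", 75.1, date(2025, 2, 15)),
    ("Claude Sonnet 4.6", 74.5, date(2026, 2, 17)),
    ("Gemini 2.5 Pro (Thinking)", 68.0, date(2025, 12, 17)),
    ("Gemini 2.5 Flash (Thinking)", 66.7, date(2025, 12, 17)),
    ("GPT 5", 62.7, date(2025, 8, 7)),
    ("Seed 1.5", 59.3, date(2025, 1, 22)),
]


def running_frontier(results):
    """Return (date, best_so_far, model) for each result that raises the best score.

    Hypothetical helper: sorts by eval date (stable, so same-day rows keep
    table order) and records a point only on a strict improvement.
    """
    frontier = []
    best = float("-inf")
    for model, score, when in sorted(results, key=lambda r: r[2]):
        if score > best:  # ties do not move the frontier
            best = score
            frontier.append((when, best, model))
    return frontier


for when, best, model in running_frontier(RESULTS):
    print(f"{when.isoformat()}  {best:.1f}%  {model}")
```

With the listed dates, the frontier steps through Seed 1.5 (59.3%), Qwen 3.5 35B A3B (75.1%), GPT 5 (Thinking) (78.4%), Gemini 3 Pro (81.0%) and Gemini 3 Flash (Thinking) (81.2%). GPT 5.4 later matches 81.2% but, under the strict-improvement assumption, does not raise the line.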

All results

Showing one canonical row per model.
#   Model                        Score  Conditions  Eval date     Source         Flags
1   GPT 5.4                      81.2%  -           Mar 5, 2026   self-reported  primary
2   Gemini 3 Flash (Thinking)    81.2%  -           Dec 17, 2025  self-reported  primary
3   Gemini 3 Pro                 81.0%  CoT         Nov 18, 2025  self-reported  primary
4   Gemini 3.1 Pro               80.5%  CoT         Feb 19, 2026  self-reported  primary
5   Kimi K2.6                    79.4%  -           Apr 20, 2026  self-reported  primary
6   GPT 5 (Thinking)             78.4%  -           Aug 7, 2025   self-reported  primary
7   Gemma 4                      76.9%  -           Apr 3, 2026   self-reported  primary
8   GPT 5.5 Instant              76.0%  0-shot      May 5, 2026   self-reported  primary
9   Qwen 3.5 35B A3B             75.1%  -           Feb 15, 2025  third-party    primary, verified
10  Claude Sonnet 4.6            74.5%  -           Feb 17, 2026  self-reported  primary
11  Gemini 2.5 Pro (Thinking)    68.0%  -           Dec 17, 2025  self-reported  primary
12  Gemini 2.5 Flash (Thinking)  66.7%  -           Dec 17, 2025  self-reported  primary
13  GPT 5                        62.7%  -           Aug 7, 2025   self-reported  primary
14  Seed 1.5                     59.3%  -           Jan 22, 2025  self-reported  primary