
MMLU

Massive Multitask Language Understanding

Multiple-choice questions across 57 academic subjects (humanities, STEM, social sciences, professional). Standard 5-shot accuracy. Largely saturated by frontier models.
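As a rough illustration of what "5-shot accuracy" means here, the sketch below builds a prompt from up to five solved demonstration questions followed by the unanswered test question, then scores predicted answer letters against the gold letters. The prompt layout, the four-letter choice set, and the helper names (`format_question`, `build_prompt`, `accuracy`) are assumptions for illustration only, not the harness used for any result on this page.

```python
LETTERS = "ABCD"  # assumed answer labels for a four-option question


def format_question(question, choices, answer=None):
    """Render one question; include the answer letter only for demonstrations."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Answer:" + (f" {answer}" if answer is not None else ""))
    return "\n".join(lines)


def build_prompt(shots, question, choices, k=5):
    """Prepend up to k solved examples (the "shots") to the unanswered test question."""
    demos = [format_question(q, c, a) for q, c, a in shots[:k]]
    return "\n\n".join(demos + [format_question(question, choices)])


def accuracy(predictions, gold):
    """Percentage of predicted letters that exactly match the gold letters."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: one demonstration, one test question, four scored answers.
shots = [("2 + 2 = ?", ["3", "4", "5", "6"], "B")]
prompt = build_prompt(shots, "3 + 3 = ?", ["5", "6", "7", "8"])
score = accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"])
```

A 0-shot run would pass an empty `shots` list. Chain-of-thought (CoT) runs use a different prompt style, which this sketch does not cover.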

Knowledge · Text · Metric: accuracy · Max: 100.0% · Released: Sep 2020 · Saturated · Possibly contaminated
Results: 22
Models scored: 22
Top: 90.2% (GPT 4.1)
Median: 79.1%

Best results

Top primary scores; one row per model.
#1 90.2%
#2 88.7%
#3 88.6%
#6 86.4%
#8 86.0%
#9 85.9%
#10 85.5%

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y-axis: score, 0 to 100. X-axis: evaluation date, Jul 2023 to Apr 2026.]
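The "running best" line can be reproduced by sorting results by evaluation date and keeping a cumulative maximum. The sketch below is a minimal version that uses a few primary rows from the results table on this page. Whether the chart is drawn from exactly these rows is an assumption, and `running_best` is a hypothetical helper name.

```python
from datetime import date


def running_best(results):
    """Return (eval_date, best_so_far) pairs in chronological order."""
    best = float("-inf")
    frontier = []
    for eval_date, score in sorted(results, key=lambda r: r[0]):
        best = max(best, score)  # the frontier never decreases
        frontier.append((eval_date, best))
    return frontier


# (eval date, score in %) for a subset of primary results listed on this page.
results = [
    (date(2024, 1, 1), 86.4),   # GPT-4
    (date(2023, 7, 19), 68.9),  # LLaMA 2
    (date(2025, 4, 14), 90.2),  # GPT 4.1
    (date(2023, 9, 1), 60.1),   # Mistral 7B
    (date(2024, 3, 4), 86.8),   # Claude Opus 3
    (date(2023, 12, 1), 70.6),  # Mixtral 8x7B
]

frontier = running_best(results)
for eval_date, best in frontier:
    print(eval_date.isoformat(), best)
```

A result that scores below the current best (Mistral 7B in this sample) still appears as a dot on the chart but leaves the line flat.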

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | Claude Sonnet 3.5 | 90.4% | 0-shot · standard | Jun 20, 2024 | self reported | |
| 2 | GPT 4.1 | 90.2% | | Apr 14, 2025 | self reported | primary |
| 3 | GPT-4o | 88.7% | | Apr 16, 2025 | self reported | primary |
| 4 | Seed 1.5 | 88.6% | | Jan 22, 2025 | self reported | primary |
| 5 | Nova Premier | 87.4% | | Apr 30, 2025 | self reported | primary |
| 6 | Claude Opus 3 | 86.8% | | Mar 4, 2024 | self reported | primary |
| 7 | GPT-4 | 86.4% | 5-shot | Jan 1, 2024 | paper | primary |
| 8 | Nemotron 3 Super | 86.0% | 5-shot | Apr 3, 2026 | self reported | primary |
| 9 | Llama 3.3 | 86.0% | 0-shot · CoT | Dec 6, 2024 | self reported | primary |
| 10 | Nova Pro | 85.9% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 11 | Gemini 1.5 | 85.9% | 5-shot · standard | May 1, 2024 | self reported | |
| 12 | Command A | 85.5% | | Apr 7, 2025 | self reported | primary |
| 13 | Nova Lite | 80.5% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 14 | Claude 2 | 78.5% | 5-shot · CoT · standard | Jul 11, 2023 | self reported | |
| 15 | Nova Micro | 77.6% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 16 | Command R Plus | 75.7% | | Apr 4, 2024 | self reported | primary |
| 17 | Claude Haiku 3 | 75.2% | 5-shot · standard | Mar 4, 2024 | self reported | |
| 18 | DBRX Instruct | 73.7% | 5-shot | Mar 27, 2024 | self reported | primary |
| 19 | Mixtral 8x7B | 70.6% | | Dec 1, 2023 | paper | primary |
| 20 | Mixtral 8x22B | 70.6% | | Jan 8, 2024 | paper | primary |
| 21 | GPT 3.5 | 70.0% | 5-shot · standard | Mar 14, 2023 | self reported | |
| 22 | Pixtral 12B | 69.2% | 5-shot | Oct 10, 2024 | self reported | primary |
| 23 | LLaMA 2 | 68.9% | 5-shot | Jul 19, 2023 | paper | primary, verified |
| 24 | Mistral NeMo | 68.0% | 5-shot | Jul 18, 2024 | self reported | primary |
| 25 | Llama 3.2 | 63.4% | | Sep 25, 2024 | self reported | primary |
| 26 | Mistral 7B | 60.1% | | Sep 1, 2023 | paper | primary |
| 27 | Gemma 2 | 51.3% | 5-shot | Feb 25, 2025 | self reported | primary |