MMLU

Massive Multitask Language Understanding

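Four-option multiple-choice questions across 57 academic subjects, spanning the humanities, STEM, the social sciences, and professional fields. The standard protocol reports 5-shot accuracy, although some results below were obtained under other conditions (for example 0-shot with chain-of-thought). Frontier models have largely saturated the benchmark.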
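A minimal Python sketch of how a 5-shot MMLU prompt can be assembled and scored. The template follows the commonly used MMLU layout: a subject header, lettered options A to D, and a trailing "Answer:". Evaluation harnesses word it differently, so treat the exact text as an assumption. `Item`, `build_prompt`, and `accuracy` are hypothetical helpers, not part of any benchmark library.

```python
"""Sketch of MMLU-style k-shot prompting and accuracy scoring.

Assumption: the template mirrors the commonly used MMLU format (subject
header, question, lettered options A-D, "Answer:"); exact wording varies
between evaluation harnesses.
"""
from dataclasses import dataclass

LETTERS = "ABCD"


@dataclass
class Item:
    question: str
    choices: list[str]  # exactly four options in MMLU
    answer: int         # index of the correct option (0-3)


def format_item(item: Item, with_answer: bool) -> str:
    lines = [item.question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, item.choices)]
    # Solved examples show the gold letter; the test question leaves it open.
    lines.append("Answer:" + (f" {LETTERS[item.answer]}" if with_answer else ""))
    return "\n".join(lines)


def build_prompt(subject: str, dev_items: list[Item], test_item: Item, k: int = 5) -> str:
    # k solved examples (the "shots") precede the unsolved test question.
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    shots = [format_item(ex, with_answer=True) for ex in dev_items[:k]]
    return header + "\n\n".join(shots + [format_item(test_item, with_answer=False)])


def accuracy(predicted: list[str], gold: list[Item]) -> float:
    # Percentage of questions whose predicted letter matches the gold letter.
    if len(predicted) != len(gold):
        raise ValueError("one prediction per question is required")
    correct = sum(p == LETTERS[g.answer] for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    demo = Item("What is 2 + 2?", ["3", "4", "5", "6"], answer=1)  # hypothetical item
    print(build_prompt("elementary mathematics", [demo] * 5, demo))
    print(accuracy(["B"], [demo]))  # 100.0
```

Five solved examples from a subject's dev split precede each test question. The model's chosen letter is then compared with the gold answer, and the reported score is the percentage answered correctly.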

Knowledge · Text · Accuracy (max 100.0%) · Released Sep 2020 · Saturated · Possibly contaminated
Results: 22
Models scored: 22
Top score: 90.2% (GPT 4.1)
Median: 79.1%
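The summary figures can be checked against the table. The Python sketch below uses only the standard library. It recomputes the count, top score, and median from the 22 primary scores listed under "All results". Rounding the median half-up to one decimal is an assumption about how the page displays it.

```python
"""Recompute the summary figures from the primary scores in the table."""
from decimal import ROUND_HALF_UP, Decimal
from statistics import median

# The 22 primary scores (%) from "All results", highest first.
SCORES = [Decimal(s) for s in (
    "90.2", "88.7", "88.6", "87.4", "86.8", "86.4", "86.0", "86.0", "85.9", "85.5",
    "80.5", "77.6", "75.7", "73.7", "70.6", "70.6", "69.2", "68.9", "68.0", "63.4",
    "60.1", "51.3",
)]


def summarize(scores: list[Decimal]) -> dict[str, object]:
    # Decimal keeps the arithmetic exact. With binary floats, (80.5 + 77.6) / 2
    # lands just below 79.05 and would round to 79.0 instead of 79.1.
    return {
        "count": len(scores),
        "top": max(scores),
        # Even count: median() averages the 11th and 12th values (80.5, 77.6).
        "median": median(scores).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP),
    }


print(summarize(SCORES))
# {'count': 22, 'top': Decimal('90.2'), 'median': Decimal('79.1')}
```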

Best results

Chart: top primary scores, one row per model.
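The ranking shows one row per model, even when a model was evaluated under several configurations. The page does not document its selection rule. The sketch below assumes the canonical row is the highest-scoring result flagged as primary. The model names and scores in it are hypothetical placeholders, not entries from this leaderboard.

```python
"""Sketch: collapse several configurations into one canonical row per model.

Assumptions (not documented on this page): each result carries a "primary"
flag, the canonical row is the best-scoring primary result, and models with
no primary result are dropped. Names and scores below are hypothetical.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float      # accuracy in percent
    conditions: str   # e.g. "5-shot" or "0-shot · CoT"; "" if not stated
    primary: bool


def canonical_rows(results: list[Result]) -> list[Result]:
    best: dict[str, Result] = {}
    for r in results:
        if not r.primary:
            continue  # non-primary configurations never become canonical
        current = best.get(r.model)
        if current is None or r.score > current.score:
            best[r.model] = r
    # Highest score first, as in the "Best results" ranking.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


results = [
    Result("model-a", 71.0, "5-shot", primary=True),
    Result("model-a", 74.5, "0-shot · CoT", primary=False),
    Result("model-b", 80.0, "5-shot", primary=True),
    Result("model-c", 65.0, "", primary=False),
]
for rank, row in enumerate(canonical_rows(results), start=1):
    print(rank, row.model, f"{row.score:.1f}%")
```

Under these assumptions, model-a is ranked by its primary 71.0% result, not its higher non-primary one. model-c disappears because it has no primary result.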

Frontier over time

Chart: best score over time, Jul 2023 to Apr 2026 (y-axis: score, 0 to 100%). Each dot is one model result; the line traces the running best score.
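The running best score can be recomputed from the table. Sort the results by evaluation date and keep each one that beats every earlier result. The Python sketch below does this with the dates and scores from "All results". It assumes only a strictly higher score counts as a new record, so ties do not move the line.

```python
"""Recompute the running-best ("frontier") line from the results table."""
from datetime import date

# (eval date, model, primary score %) from the "All results" table.
RESULTS = [
    ("2025-04-14", "GPT 4.1", 90.2), ("2025-04-16", "GPT-4o", 88.7),
    ("2025-01-22", "Seed 1.5", 88.6), ("2025-04-30", "Nova Premier", 87.4),
    ("2024-03-04", "Claude Opus 3", 86.8), ("2024-01-01", "GPT-4", 86.4),
    ("2026-04-03", "Nemotron 3 Super", 86.0), ("2024-12-06", "Llama 3.3", 86.0),
    ("2024-12-03", "Nova Pro", 85.9), ("2025-04-07", "Command A", 85.5),
    ("2024-12-03", "Nova Lite", 80.5), ("2024-12-03", "Nova Micro", 77.6),
    ("2024-04-04", "Command R Plus", 75.7), ("2024-03-27", "DBRX Instruct", 73.7),
    ("2023-12-01", "Mixtral 8x7B", 70.6), ("2024-01-08", "Mixtral 8x22B", 70.6),
    ("2024-10-10", "Pixtral 12B", 69.2), ("2023-07-19", "LLaMA 2", 68.9),
    ("2024-07-18", "Mistral NeMo", 68.0), ("2024-09-25", "Llama 3.2", 63.4),
    ("2023-09-01", "Mistral 7B", 60.1), ("2025-02-25", "Gemma 2", 51.3),
]


def frontier(results: list[tuple[str, str, float]]) -> list[tuple[date, str, float]]:
    """Return the results that set a new best score, in date order.

    Assumption: a result must strictly exceed the previous best to count.
    """
    dated = sorted((date.fromisoformat(d), model, score) for d, model, score in results)
    best = float("-inf")
    records = []
    for when, model, score in dated:
        if score > best:
            best = score
            records.append((when, model, score))
    return records


for when, model, score in frontier(RESULTS):
    print(when, model, f"{score:.1f}%")
```

On this table the sketch reports six records, in order: LLaMA 2, Mixtral 8x7B, GPT-4, Claude Opus 3, Seed 1.5, and GPT 4.1.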

All results

One canonical row is shown per model. A dash in the Conditions column means no conditions were listed.

#   Model              Score   Conditions     Eval date     Source          Flags
1   GPT 4.1            90.2%   -              14 Apr 2025   Self-reported   Primary
2   GPT-4o             88.7%   -              16 Apr 2025   Self-reported   Primary
3   Seed 1.5           88.6%   -              22 Jan 2025   Self-reported   Primary
4   Nova Premier       87.4%   -              30 Apr 2025   Self-reported   Primary
5   Claude Opus 3      86.8%   -              04 Mar 2024   Self-reported   Primary
6   GPT-4              86.4%   5-shot         01 Jan 2024   Paper           Primary
7   Nemotron 3 Super   86.0%   5-shot         03 Apr 2026   Self-reported   Primary
8   Llama 3.3          86.0%   0-shot · CoT   06 Dec 2024   Self-reported   Primary
9   Nova Pro           85.9%   0-shot · CoT   03 Dec 2024   Self-reported   Primary
10  Command A          85.5%   -              07 Apr 2025   Self-reported   Primary
11  Nova Lite          80.5%   0-shot · CoT   03 Dec 2024   Self-reported   Primary
12  Nova Micro         77.6%   0-shot · CoT   03 Dec 2024   Self-reported   Primary
13  Command R Plus     75.7%   -              04 Apr 2024   Self-reported   Primary
14  DBRX Instruct      73.7%   5-shot         27 Mar 2024   Self-reported   Primary
15  Mixtral 8x7B       70.6%   -              01 Dec 2023   Paper           Primary
16  Mixtral 8x22B      70.6%   -              08 Jan 2024   Paper           Primary
17  Pixtral 12B        69.2%   5-shot         10 Oct 2024   Self-reported   Primary
18  LLaMA 2            68.9%   5-shot         19 Jul 2023   Paper           Primary, Verified
19  Mistral NeMo       68.0%   5-shot         18 Jul 2024   Self-reported   Primary
20  Llama 3.2          63.4%   -              25 Sep 2024   Self-reported   Primary
21  Mistral 7B         60.1%   -              01 Sep 2023   Paper           Primary
22  Gemma 2            51.3%   5-shot         25 Feb 2025   Self-reported   Primary
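The rows mix evaluation conditions: 5-shot, 0-shot with chain-of-thought, or unstated. Scores are therefore most directly comparable within one condition. The Python sketch below groups the table by its Conditions column and reports the top result in each group. Treating a blank cell as "not stated" is an assumption.

```python
"""Group the results table by its Conditions column."""
from collections import defaultdict

# (model, primary score %, conditions) from "All results"; "" = not stated.
RESULTS = [
    ("GPT 4.1", 90.2, ""), ("GPT-4o", 88.7, ""), ("Seed 1.5", 88.6, ""),
    ("Nova Premier", 87.4, ""), ("Claude Opus 3", 86.8, ""), ("GPT-4", 86.4, "5-shot"),
    ("Nemotron 3 Super", 86.0, "5-shot"), ("Llama 3.3", 86.0, "0-shot · CoT"),
    ("Nova Pro", 85.9, "0-shot · CoT"), ("Command A", 85.5, ""),
    ("Nova Lite", 80.5, "0-shot · CoT"), ("Nova Micro", 77.6, "0-shot · CoT"),
    ("Command R Plus", 75.7, ""), ("DBRX Instruct", 73.7, "5-shot"),
    ("Mixtral 8x7B", 70.6, ""), ("Mixtral 8x22B", 70.6, ""),
    ("Pixtral 12B", 69.2, "5-shot"), ("LLaMA 2", 68.9, "5-shot"),
    ("Mistral NeMo", 68.0, "5-shot"), ("Llama 3.2", 63.4, ""),
    ("Mistral 7B", 60.1, ""), ("Gemma 2", 51.3, "5-shot"),
]


def group_by_condition(results: list[tuple[str, float, str]]) -> dict[str, list[tuple[float, str]]]:
    groups: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for model, score, cond in results:
        # Store (score, model) so max() picks the highest score in a group.
        groups[cond or "not stated"].append((score, model))
    return dict(groups)


for cond, rows in group_by_condition(RESULTS).items():
    score, model = max(rows)  # highest score within this condition
    print(f"{cond}: {len(rows)} results, best {model} at {score:.1f}%")
```

This splits the table into 11 results with no stated conditions, 7 at 5-shot, and 4 at 0-shot with chain-of-thought. The best 5-shot result is GPT-4 at 86.4%.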