MMLU

Massive Multitask Language Understanding

Multiple-choice questions across 57 academic subjects (humanities, STEM, social sciences, and professional fields). The standard metric is 5-shot accuracy. Largely saturated by frontier models.
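As a rough illustration of what "5-shot accuracy" involves, the sketch below builds a few-shot prompt and scores predictions. The header sentence and the `Answer:` layout follow a commonly used convention and are assumptions here, as are the function names `format_example`, `build_few_shot_prompt`, and `accuracy`; this page does not specify the exact template any given model was evaluated with.

```python
from typing import Optional, Sequence, Tuple

CHOICE_LABELS = ("A", "B", "C", "D")  # MMLU questions have four options


def format_example(question: str, choices: Sequence[str],
                   answer: Optional[str] = None) -> str:
    """Render one question; the answer is left blank for the test item."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, choices)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(subject: str,
                          shots: Sequence[Tuple[str, Sequence[str], str]],
                          test_question: str,
                          test_choices: Sequence[str]) -> str:
    """Concatenate k solved examples (k=5 in the standard setting) and one unsolved item."""
    # Assumed header wording; not taken from this page.
    header = f"The following are multiple choice questions (with answers) about {subject}."
    blocks = [format_example(q, c, a) for q, c, a in shots]
    blocks.append(format_example(test_question, test_choices))
    return header + "\n\n" + "\n\n".join(blocks)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predicted letters that match the gold letters."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

Entries in the results table marked "0-shot · CoT" were run under a different setup, so their scores are not strictly comparable with 5-shot numbers.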

Knowledge · Text · Accuracy · Max 100.0% · Released Sep 2020 · Saturated · Possibly contaminated

Results

Models scored: 22

Top: GPT 4.1 (90.2%)

Median: 79.1%

Best results

Top primary scores; one row per model.

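#	Model	Score
1	GPT 4.1	90.2%
2	GPT-4o	88.7%
3	Seed 1.5	88.6%
4	Nova Premier	87.4%
5	Claude Opus 3	86.8%
6	GPT-4	86.4%
7	Nemotron 3 Super	86.0%
8	Llama 3.3	86.0%
9	Nova Pro	85.9%
10	Command A	85.5%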

Frontier over time

Each dot is one model result; the line traces the running best score.
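The "running best" line can be reproduced by sorting results by evaluation date and taking a cumulative maximum. The sketch below does this for a subset of rows copied from the All results table; the function name `running_best` and the tuple layout are illustrative choices, not part of the site.

```python
from datetime import date
from itertools import accumulate

# (model, eval date, score %) for a subset of rows from the All results table
results = [
    ("GPT 4.1", date(2025, 4, 14), 90.2),
    ("Claude Opus 3", date(2024, 3, 4), 86.8),
    ("GPT-4", date(2024, 1, 1), 86.4),
    ("Seed 1.5", date(2025, 1, 22), 88.6),
    ("LLaMA 2", date(2023, 7, 19), 68.9),
    ("Mistral 7B", date(2023, 9, 1), 60.1),
]


def running_best(rows):
    """Return (eval date, best score so far) pairs in chronological order."""
    ordered = sorted(rows, key=lambda row: row[1])
    best_so_far = accumulate((score for _, _, score in ordered), max)
    return [(when, best) for (_, when, _), best in zip(ordered, best_so_far)]


frontier = running_best(results)
for when, best in frontier:
    print(when.isoformat(), best)
```

A result that falls below the current best (Mistral 7B in this subset) still appears as a dot but leaves the frontier line flat.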

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	GPT 4.1	90.2%	—	14 Apr 2025	Self-reported	Primary
2	GPT-4o	88.7%	—	16 Apr 2025	Self-reported	Primary
3	Seed 1.5	88.6%	—	22 Jan 2025	Self-reported	Primary
4	Nova Premier	87.4%	—	30 Apr 2025	Self-reported	Primary
5	Claude Opus 3	86.8%	—	04 Mar 2024	Self-reported	Primary
6	GPT-4	86.4%	5-shot	01 Jan 2024	Paper	Primary
7	Nemotron 3 Super	86.0%	5-shot	03 Apr 2026	Self-reported	Primary
8	Llama 3.3	86.0%	0-shot · CoT	06 Dec 2024	Self-reported	Primary
9	Nova Pro	85.9%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
10	Command A	85.5%	—	07 Apr 2025	Self-reported	Primary
11	Nova Lite	80.5%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
12	Nova Micro	77.6%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
13	Command R Plus	75.7%	—	04 Apr 2024	Self-reported	Primary
14	DBRX Instruct	73.7%	5-shot	27 Mar 2024	Self-reported	Primary
15	Mixtral 8x7B	70.6%	—	01 Dec 2023	Paper	Primary
16	Mixtral 8x22B	70.6%	—	08 Jan 2024	Paper	Primary
17	Pixtral 12B	69.2%	5-shot	10 Oct 2024	Self-reported	Primary
18	LLaMA 2	68.9%	5-shot	19 Jul 2023	Paper	Primary, Verified
19	Mistral NeMo	68.0%	5-shot	18 Jul 2024	Self-reported	Primary
20	Llama 3.2	63.4%	—	25 Sep 2024	Self-reported	Primary
21	Mistral 7B	60.1%	—	01 Sep 2023	Paper	Primary
22	Gemma 2	51.3%	5-shot	25 Feb 2025	Self-reported	Primary
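
The summary figures in the Results section can be checked against the table. The sketch below recomputes the top score and the median from the 22 scores listed above. With an even number of rows the median is the mean of the 11th and 12th scores, (80.5 + 77.6) / 2 = 79.05, which the page displays as 79.1%. The variable names are illustrative.

```python
from statistics import median

# Primary scores (%) from the All results table, in rank order
scores = [
    90.2, 88.7, 88.6, 87.4, 86.8, 86.4, 86.0, 86.0, 85.9, 85.5, 80.5,
    77.6, 75.7, 73.7, 70.6, 70.6, 69.2, 68.9, 68.0, 63.4, 60.1, 51.3,
]

n_models = len(scores)
top_score = max(scores)
median_score = median(scores)  # mean of the two middle values for even n

print(n_models, top_score, median_score)
```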
