MMLU-Pro

#	Model	Score	Conditions	Eval date	Source	Flags
1	GPT OSS 120B	90.0%	CoT	05 Aug 2025	Self-reported	Primary
2	Qwen 3.6 27B	86.2%	standard	—	Self-reported	Primary
3	Qwen 3.5 27B	86.1%	—	24 Feb 2026	Third-party	Primary, Verified
4	Gemma 4	85.2%	CoT	03 Apr 2026	Self-reported	Primary
5	DeepSeek V3.2 Exp	85.0%	CoT	29 Sep 2025	Self-reported	Primary
6	DeepSeek 3.2	85.0%	—	01 Dec 2025	Paper	Primary
7	DeepSeek V3.1 Terminus	85.0%	—	22 Sep 2025	Self-reported	Primary
8	DeepSeek-R1	84.0%	CoT	21 Jan 2025	Paper	Primary
9	Trinity Large Thinking	83.4%	0-shot · standard	01 Apr 2026	Self-reported	Primary
10	Llama 4 Behemoth	82.2%	—	05 Apr 2025	Self-reported	Primary
11	Llama 4 Maverick	80.5%	—	05 Apr 2025	Self-reported	Primary
12	Seed 1.5	80.1%	0-shot · CoT	22 Jan 2025	Self-reported	Primary
13	Grok 3	79.9%	—	19 Feb 2025	Self-reported	Primary
14	Grok 3	79.9%	—	19 Feb 2025	Self-reported	Primary
15	Grok 3 mini	78.9%	—	19 Feb 2025	Self-reported	Primary
16	Nemotron 3 Nano	78.3%	—	15 Dec 2025	Self-reported	Primary
17	Nemotron 3	78.3%	standard	15 Dec 2025	Self-reported	Primary
18	Gemma 3	78.0%	—	20 May 2025	Self-reported	Primary
19	Phi 4 reasoning plus	76.0%	—	08 Jul 2025	Self-reported	Primary
20	DeepSeek V3	75.9%	—	26 Dec 2024	Paper	Primary
21	Nemotron 3 Super	75.7%	5-shot · CoT	03 Apr 2026	Self-reported	Primary
22	Llama 4 Scout	74.3%	—	05 Apr 2025	Self-reported	Primary
23	Command A	69.6%	—	07 Apr 2025	Paper	Primary
24	Llama 3.3	68.9%	5-shot · CoT	06 Dec 2024	Self-reported	Primary
25	Mistral Small 3	66.3%	5-shot · CoT	30 Jan 2025	Self-reported	Primary
26	Claude Haiku 3.5	41.6%	0-shot · CoT	22 Oct 2024	Self-reported	Primary
