TAAFT


MATH (Hendrycks)

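MATH consists of 12.5k competition mathematics problems (AMC, AIME, USAMO style), each labeled with a difficulty level from 1 to 5. Results are reported either as overall accuracy (%) or broken down by level. The easier levels are now saturated; Level 5 still discriminates between models.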
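The two reporting styles differ only in how graded answers are pooled. Below is a minimal sketch, assuming every problem has already been graded as correct or incorrect and tagged with its level. The records are hypothetical, and the grading step itself (matching a model's final answer against the reference) is not shown.

```python
from collections import defaultdict

# Hypothetical graded results: (difficulty level 1-5, answer judged correct?).
# Illustrative only; not taken from any real model run.
graded = [
    (1, True), (1, True), (2, True), (2, False),
    (3, True), (4, False), (5, True), (5, False),
]

def accuracy_report(results):
    """Return overall accuracy and per-level accuracy, both as percentages."""
    per_level = defaultdict(lambda: [0, 0])  # level -> [correct, total]
    for level, correct in results:
        per_level[level][0] += int(correct)
        per_level[level][1] += 1
    total_correct = sum(c for c, _ in per_level.values())
    total = sum(n for _, n in per_level.values())
    overall = 100.0 * total_correct / total  # pooled over all problems
    by_level = {lvl: 100.0 * c / n for lvl, (c, n) in sorted(per_level.items())}
    return overall, by_level

overall, by_level = accuracy_report(graded)
print(f"overall: {overall:.1f}%")
for level, acc in by_level.items():
    print(f"Level {level}: {acc:.1f}%")
```

In this sketch the overall figure is pooled across problems (5 of 8 correct, 62.5%), which is not the same as averaging the five per-level figures (60.0%) when levels contain different numbers of problems.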

Math · Text · Metric: accuracy (max 100.0%) · Released Mar 2021 · Saturated · Possibly contaminated
17 results · 16 models scored · Top: 88.6% (Seed 1.5) · Median: 69.2%
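The summary figures can be checked against the results table. This is a minimal sketch that uses the 17 scores as they appear in the All results table; Gemma 2 is listed twice there, so there are 17 results but only 16 distinct models.

```python
from statistics import median

# (model, score %) pairs copied from the "All results" table on this page.
# Gemma 2 appears twice (two configurations): 17 results, 16 distinct models.
results = [
    ("Seed 1.5", 88.6), ("Nemotron 3 Super", 84.8), ("Command A", 80.0),
    ("Llama 3.3", 77.0), ("Nova Pro", 76.6), ("GPT-4o", 76.6),
    ("Nova Lite", 73.3), ("Nova Micro", 69.3), ("Claude Haiku 3.5", 69.2),
    ("Claude Opus 3", 60.1), ("Pixtral 12B", 48.1), ("Llama 3.2", 48.0),
    ("Mixtral 8x22B", 28.4), ("Mixtral 8x7B", 28.4), ("Gemma 2", 15.0),
    ("Gemma 2", 15.0), ("Mistral 7B", 13.1),
]

n_results = len(results)                          # one entry per table row
n_models = len({model for model, _ in results})   # duplicates collapse in a set
top_model, top_score = max(results, key=lambda r: r[1])
median_score = median(score for _, score in results)  # middle (9th) of 17 values

print(n_results, n_models, top_model, top_score, median_score)
# prints: 17 16 Seed 1.5 88.6 69.2
```

The 69.2% median appears to be taken over all 17 results. Taking one row per model (16 values) would instead average the 8th and 9th scores, giving 69.25%.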

Best results

Top primary scores; one row per model. (Bar chart; the same scores appear in the All results table.)

Frontier over time

Each dot is one model result; the line traces the running best score.
(Chart: best score over time; score axis 0 to 100%, date axis Sep 2023 to Apr 2026.)
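The page does not say exactly how the frontier line is drawn. The sketch below assumes it is a strict running maximum over results ordered by eval date, so a result only adds a point if it beats every earlier one (ties do not count). The data is the All results table.

```python
from datetime import datetime

# (model, score %, eval date) rows from the "All results" table, in table order.
rows = [
    ("Seed 1.5", 88.6, "22 Jan 2025"), ("Nemotron 3 Super", 84.8, "03 Apr 2026"),
    ("Command A", 80.0, "07 Apr 2025"), ("Llama 3.3", 77.0, "06 Dec 2024"),
    ("Nova Pro", 76.6, "03 Dec 2024"), ("GPT-4o", 76.6, "16 Apr 2025"),
    ("Nova Lite", 73.3, "03 Dec 2024"), ("Nova Micro", 69.3, "03 Dec 2024"),
    ("Claude Haiku 3.5", 69.2, "22 Oct 2024"), ("Claude Opus 3", 60.1, "22 Oct 2024"),
    ("Pixtral 12B", 48.1, "10 Oct 2024"), ("Llama 3.2", 48.0, "25 Oct 2024"),
    ("Mixtral 8x22B", 28.4, "08 Jan 2024"), ("Mixtral 8x7B", 28.4, "01 Dec 2023"),
    ("Gemma 2", 15.0, "25 Feb 2025"), ("Gemma 2", 15.0, "25 Feb 2025"),
    ("Mistral 7B", 13.1, "01 Sep 2023"),
]

def running_frontier(rows):
    """Return the results that set a new best score, in chronological order."""
    # sorted() is stable, so results sharing a date keep their table order.
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[2], "%d %b %Y"))
    frontier, best = [], float("-inf")
    for model, score, date in ordered:
        if score > best:  # strictly better than every earlier result
            best = score
            frontier.append((date, model, score))
    return frontier

for date, model, score in running_frontier(rows):
    print(f"{date}  {model:<18} {score:.1f}%")
```

On this table the line has seven steps, from Mistral 7B (13.1%) to Seed 1.5 (88.6%, 22 Jan 2025). Later results such as Command A, GPT-4o, and Nemotron 3 Super fall below it and do not move the line.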

All results

Showing one canonical row per model.
#   Model             Score   Conditions    Eval date    Source         Flags
1   Seed 1.5          88.6%   -             22 Jan 2025  Self-reported  Primary
2   Nemotron 3 Super  84.8%   4-shot        03 Apr 2026  Self-reported  Primary
3   Command A         80.0%   -             07 Apr 2025  Self-reported  Primary
4   Llama 3.3         77.0%   0-shot · CoT  06 Dec 2024  Self-reported  Primary
5   Nova Pro          76.6%   0-shot · CoT  03 Dec 2024  Self-reported  Primary
6   GPT-4o            76.6%   -             16 Apr 2025  Self-reported  Primary
7   Nova Lite         73.3%   0-shot · CoT  03 Dec 2024  Self-reported  Primary
8   Nova Micro        69.3%   0-shot · CoT  03 Dec 2024  Self-reported  Primary
9   Claude Haiku 3.5  69.2%   0-shot · CoT  22 Oct 2024  Self-reported  Primary
10  Claude Opus 3     60.1%   0-shot · CoT  22 Oct 2024  Self-reported  Primary
11  Pixtral 12B       48.1%   Maj@1         10 Oct 2024  Self-reported  Primary
12  Llama 3.2         48.0%   0-shot · CoT  25 Oct 2024  Self-reported  Primary
13  Mixtral 8x22B     28.4%   -             08 Jan 2024  Paper          Primary
14  Mixtral 8x7B      28.4%   -             01 Dec 2023  Self-reported  Primary
15  Gemma 2           15.0%   -             25 Feb 2025  Self-reported  Primary
16  Gemma 2           15.0%   4-shot        25 Feb 2025  Self-reported  Primary
17  Mistral 7B        13.1%   -             01 Sep 2023  Self-reported  Primary
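The table is described as one canonical row per model, yet Gemma 2 appears twice (with and without 4-shot). The page does not state how a canonical row is chosen. The sketch below assumes a simple rule: keep each model's highest score, with the earlier table row winning ties.

```python
# (model, score %, conditions) rows from the "All results" table; "" = no conditions listed.
rows = [
    ("Seed 1.5", 88.6, ""), ("Nemotron 3 Super", 84.8, "4-shot"),
    ("Command A", 80.0, ""), ("Llama 3.3", 77.0, "0-shot · CoT"),
    ("Nova Pro", 76.6, "0-shot · CoT"), ("GPT-4o", 76.6, ""),
    ("Nova Lite", 73.3, "0-shot · CoT"), ("Nova Micro", 69.3, "0-shot · CoT"),
    ("Claude Haiku 3.5", 69.2, "0-shot · CoT"), ("Claude Opus 3", 60.1, "0-shot · CoT"),
    ("Pixtral 12B", 48.1, "Maj@1"), ("Llama 3.2", 48.0, "0-shot · CoT"),
    ("Mixtral 8x22B", 28.4, ""), ("Mixtral 8x7B", 28.4, ""),
    ("Gemma 2", 15.0, ""), ("Gemma 2", 15.0, "4-shot"), ("Mistral 7B", 13.1, ""),
]

def canonical_rows(rows):
    """Keep one row per model: its highest score, with the earlier row winning ties.

    Assumed rule for illustration; the page does not state how it picks a canonical row.
    """
    best = {}  # model -> chosen row; dicts preserve first-insertion order
    for row in rows:
        model, score, _ = row
        if model not in best or score > best[model][1]:
            best[model] = row
    # Highest score first; sorted() stays stable with reverse=True, so ties keep table order.
    return sorted(best.values(), key=lambda r: r[1], reverse=True)

for rank, (model, score, cond) in enumerate(canonical_rows(rows), start=1):
    print(f"{rank:>2}  {model:<18} {score:.1f}%  {cond or '-'}")
```

Under this rule the two Gemma 2 rows collapse into one (the row with no listed conditions), leaving 16 rows, one per model.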