MATH
12.5k competition mathematics problems (AMC, AIME, USAMO style). Scores are reported as overall accuracy (%) or split by difficulty (Levels 1 to 5). The easier levels are now saturated; Level 5 still discriminates between models.
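As a rough illustration of the two reporting modes (overall % versus a per-level split), here is a minimal sketch. It assumes each graded problem is available as a `(level, is_correct)` pair; the function name `accuracy_by_level` and the toy data are hypothetical and not part of any official MATH evaluation harness.

```python
from collections import defaultdict


def accuracy_by_level(results):
    """Return (overall %, {level: %}) from (level, is_correct) pairs.

    `results` is an assumed input format: level is an int in 1..5 and
    is_correct is a bool produced by whatever grader is in use.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, is_correct in results:
        totals[level] += 1
        correct[level] += int(is_correct)

    # Per-level accuracy, only for levels that actually appear in the data.
    per_level = {
        lvl: 100.0 * correct[lvl] / totals[lvl] for lvl in sorted(totals)
    }
    # Overall accuracy is pooled over all problems, not a mean of level means.
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    return overall, per_level


# Hypothetical toy data, not real benchmark results.
toy_results = [
    (1, True), (1, True),
    (3, True), (3, False),
    (5, False), (5, False), (5, True), (5, False),
]

overall, per_level = accuracy_by_level(toy_results)
print(f"overall: {overall:.1f}%")
for lvl, acc in per_level.items():
    print(f"level {lvl}: {acc:.1f}%")
```

On this toy data the overall score is 50.0% while Level 5 alone is 25.0%, which is the kind of gap a single headline number can hide.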
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Seed 1.5 | 88.6% | — | 22 Jan 2025 | Self-reported | Primary |
| 2 | Nemotron 3 Super | 84.8% | 4-shot | 03 Apr 2026 | Self-reported | Primary |
| 3 | Command A | 80.0% | — | 07 Apr 2025 | Self-reported | Primary |
| 4 | Llama 3.3 | 77.0% | 0-shot · CoT | 06 Dec 2024 | Self-reported | Primary |
| 5 | Nova Pro | 76.6% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 6 | GPT-4o | 76.6% | — | 16 Apr 2025 | Self-reported | Primary |
| 7 | Nova Lite | 73.3% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 8 | Nova Micro | 69.3% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 9 | Claude Haiku 3.5 | 69.2% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
| 10 | Claude Opus 3 | 60.1% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
| 11 | Pixtral 12B | 48.1% | Maj@1 | 10 Oct 2024 | Self-reported | Primary |
| 12 | Llama 3.2 | 48.0% | 0-shot · CoT | 25 Oct 2024 | Self-reported | Primary |
| 13 | Mixtral 8x22B | 28.4% | — | 08 Jan 2024 | Paper | Primary |
| 14 | Mixtral 8x7B | 28.4% | — | 01 Dec 2023 | Self-reported | Primary |
| 15 | Gemma 2 | 15.0% | — | 25 Feb 2025 | Self-reported | Primary |
| 16 | Gemma 2 | 15.0% | 4-shot | 25 Feb 2025 | Self-reported | Primary |
| 17 | Mistral 7B | 13.1% | — | 01 Sep 2023 | Self-reported | Primary |
