MATH-500
A 500-question subset of the MATH benchmark, popularised by OpenAI's o-series releases. It is widely reported as the standard 'MATH' number on modern leaderboards.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1 | 97.3% | CoT | 21 Jan 2025 | Paper | Primary |
| 2 | Claude Sonnet 3.7 (Thinking) | 96.2% | — | 24 Feb 2025 | Self-reported | Primary |
| 3 | Llama 4 Behemoth | 95.0% | — | 05 Apr 2025 | Self-reported | Primary |
| 4 | DeepSeek V3 | 90.2% | — | 26 Dec 2024 | Paper | Primary |
| 5 | Claude Sonnet 3.7 | 82.2% | — | 24 Feb 2025 | Self-reported | Primary |
| 6 | Nova Premier | 82.0% | — | 30 Apr 2025 | Self-reported | Primary |
