MATH-500
A 500-question subset of the MATH benchmark, popularised by OpenAI's o-series releases. It is widely reported as the standard 'MATH' number on modern leaderboards.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1 | 97.3% | CoT | Jan 21, 2025 | paper | primary |
| 2 | Claude Sonnet 3.7 (Thinking) | 96.2% | — | Feb 24, 2025 | self reported | primary |
| 3 | Llama 4 Behemoth | 95.0% | — | Apr 5, 2025 | self reported | primary |
| 4 | DeepSeek V3 | 90.2% | — | Dec 26, 2024 | paper | primary |
| 5 | Claude Sonnet 3.7 | 82.2% | — | Feb 24, 2025 | self reported | primary |
| 6 | Nova Premier | 82.0% | — | Apr 30, 2025 | self reported | primary |
