AIME 2024
30 problems from AIME I and II 2024. A standard high-school competition math eval until AIME 2025 superseded it as the primary signal.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | o4 mini | 93.4% | — | 16 Apr 2025 | Self-reported | Primary |
| 2 | o3 | 91.6% | — | 16 Apr 2025 | Self-reported | Primary |
| 3 | Qwen3 235B A22B | 85.7% | — | 28 Apr 2025 | Self-reported | Primary |
| 4 | Phi 4 reasoning plus | 81.3% | CoT | 08 Jul 2025 | Self-reported | Primary |
| 5 | Qwen3-30B-A3B | 80.4% | — | 28 Apr 2025 | Paper | Primary |
| 6 | Qwen3 30B A3B | 80.4% | — | 28 Apr 2025 | Self-reported | Primary |
| 7 | DeepSeek-R1 | 79.8% | CoT | 21 Jan 2025 | Paper | Primary |
| 8 | o1 | 74.3% | — | 16 Apr 2025 | Self-reported | Primary |
| 9 | Magistral Medium | 73.6% | CoT | 10 Jun 2025 | Self-reported | Primary |
| 10 | Kimi K2 | 69.6% | — | 11 Jul 2025 | Self-reported | Primary Verified |
| 11 | Claude Sonnet 3.7 (Thinking) | 61.3% | — | 24 Feb 2025 | Self-reported | Primary |
| 12 | Nemotron 3 Super | 53.3% | pass@32 | 03 Apr 2026 | Self-reported | Primary |
| 13 | Grok 3 | 52.2% | — | 19 Feb 2025 | Self-reported | Primary |
| 14 | Grok 3 | 52.2% | — | 19 Feb 2025 | Self-reported | Primary |
| 15 | GPT 4.1 | 48.1% | — | 14 Apr 2025 | Self-reported | Primary |
| 16 | Grok 3 mini | 39.7% | — | 19 Feb 2025 | Self-reported | Primary |
| 17 | DeepSeek V3 | 39.2% | — | 26 Dec 2024 | Paper | Primary |
| 18 | Claude Sonnet 3.7 | 23.3% | — | 24 Feb 2025 | Self-reported | Primary |
| 19 | Claude Haiku 3.5 | 5.30% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
| 20 | Claude Haiku 3 | 0.80% | 0-shot · CoT · standard | 22 Oct 2024 | Self-reported | — |
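
Because the set has only 30 problems, a single graded run can only produce scores that are multiples of 1/30 (about 3.33 percentage points). The sketch below converts a score from the table into its equivalent number of problems and checks whether it matches a whole number of problems. The function names and the 0.05-point tolerance (chosen because scores are shown to one decimal place) are illustrative assumptions, not part of the leaderboard. A score that does not match a whole number is consistent with averaging over several runs, but the table does not state how each score was computed.

```python
TOTAL_PROBLEMS = 30  # AIME I (15 problems) + AIME II (15 problems)


def problems_equivalent(score_percent: float, total: int = TOTAL_PROBLEMS) -> float:
    """Convert a percentage score into the equivalent number of problems solved."""
    return score_percent / 100.0 * total


def is_whole_problem_score(
    score_percent: float, total: int = TOTAL_PROBLEMS, tol: float = 0.05
) -> bool:
    """Return True if the score equals k/total for some integer k.

    tol is in percentage points and is a hypothetical allowance for the
    one-decimal rounding used in the table.
    """
    k = round(problems_equivalent(score_percent, total))
    return abs(k / total * 100.0 - score_percent) <= tol


if __name__ == "__main__":
    # A few scores copied from the table above.
    for model, score in [("o4 mini", 93.4), ("Nemotron 3 Super", 53.3), ("DeepSeek-R1", 79.8)]:
        print(model, round(problems_equivalent(score), 2), is_whole_problem_score(score))
```

Under these assumptions, 53.3% corresponds to 16 of 30 problems, while 93.4% and 79.8% fall between whole-problem counts.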
