ARC Challenge
Grade-school science multiple-choice questions from the hard ("Challenge") subset of the AI2 Reasoning Challenge (ARC). Frontier models have saturated it, but it still appears in many evaluation harnesses.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Gemma 2 | 554.0% | — | Feb 25, 2025 | self reported | primary |
| 2 | Claude Opus 3 | 96.4% | 25-shot | Oct 22, 2024 | self reported | primary |
| 3 | Nemotron 3 Super | 96.1% | 25-shot | Apr 3, 2026 | self reported | primary |
| 4 | Nova Pro | 94.8% | 0-shot | Dec 3, 2024 | self reported | primary |
| 5 | Nova Lite | 92.4% | 0-shot | Dec 3, 2024 | self reported | primary |
| 6 | Claude 2 | 91.0% | 5-shot · standard | Jul 11, 2023 | self reported | |
| 7 | Nova Micro | 90.2% | 0-shot | Dec 3, 2024 | self reported | primary |
| 8 | Claude Haiku 3 | 89.2% | 25-shot · standard | Mar 4, 2024 | self reported | |
| 9 | GPT 3.5 | 85.2% | 25-shot · standard | Mar 14, 2023 | self reported | |
| 10 | Llama 3.2 | 78.6% | 0-shot | Oct 22, 2024 | self reported | primary |
| 11 | Mixtral 8x7B | 59.7% | — | Dec 1, 2023 | self reported | primary |
| 12 | Mixtral 8x7B | 59.7% | — | Jan 8, 2024 | self reported | primary |
| 13 | Mistral 7B | 55.6% | — | Sep 1, 2023 | self reported | primary |
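
Scores in this table are percentages, so any value outside 0 to 100 most likely signals a transcription or scaling error rather than a real result. The following is a minimal sketch for flagging such entries, assuming the rows are available as (model, score string) pairs. The helper names `parse_score` and `out_of_range` are hypothetical and not part of any evaluation harness.

```python
def parse_score(text: str) -> float:
    """Convert a score cell such as '96.4%' to a float."""
    return float(text.strip().rstrip("%"))


def out_of_range(rows):
    """Return (model, score) pairs whose score falls outside [0, 100]."""
    flagged = []
    for model, score_text in rows:
        score = parse_score(score_text)
        if not 0.0 <= score <= 100.0:
            flagged.append((model, score))
    return flagged


# A few rows copied from the table above (model, score cell).
rows = [
    ("Gemma 2", "554.0%"),
    ("Claude Opus 3", "96.4%"),
    ("Nemotron 3 Super", "96.1%"),
    ("Mistral 7B", "55.6%"),
]

flagged = out_of_range(rows)
print(flagged)
```

A check like this only reports suspicious rows. It does not guess the intended value, which has to be confirmed against the original source.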
