MMLU
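Multiple-choice questions across 57 academic subjects spanning the humanities, STEM, social sciences, and professional fields. The standard metric is 5-shot accuracy, although some results listed here were reported under other conditions, such as 0-shot with chain-of-thought (CoT). The benchmark is largely saturated by frontier models.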
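As an illustration of what "5-shot accuracy" means in practice, here is a minimal sketch: five solved exemplars are prepended to each test question, the model's answer letter is compared with the gold letter, and the score is the fraction correct. The prompt layout, the function names (`format_question`, `build_prompt`, `accuracy`), and the `predict` callable are assumptions made for this example; they are not the harness used by any of the sources in the table, and actual harnesses differ in prompt wording and answer extraction.

```python
from typing import Callable, Optional, Sequence, Tuple

# MMLU questions have four answer options, labeled A-D.
CHOICES = "ABCD"

# (question, options, gold answer letter) -- a hypothetical item layout.
Item = Tuple[str, Sequence[str], str]


def format_question(question: str, options: Sequence[str],
                    answer: Optional[str] = None) -> str:
    """Render one question; the answer letter is filled in only for exemplars."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_prompt(exemplars: Sequence[Item], question: str,
                 options: Sequence[str], k: int = 5) -> str:
    """Prepend k solved exemplars to the unanswered test question."""
    shots = [format_question(q, o, a) for q, o, a in exemplars[:k]]
    return "\n\n".join(shots + [format_question(question, options)])


def accuracy(predict: Callable[[str], str], exemplars: Sequence[Item],
             test_items: Sequence[Item], k: int = 5) -> float:
    """Fraction of test items whose predicted letter matches the gold letter."""
    correct = 0
    for question, options, gold in test_items:
        prompt = build_prompt(exemplars, question, options, k)
        # Assumption: the first character of the model output is its answer letter.
        if predict(prompt).strip().upper()[:1] == gold:
            correct += 1
    return correct / len(test_items)
```

Here `predict` stands in for any function that maps a prompt string to a model completion. Setting `k=0` yields a 0-shot prompt, although the 0-shot CoT results in the table would also need a reasoning instruction and a different answer-extraction step.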
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT-4.1 | 90.2% | — | Apr 14, 2025 | self reported | primary |
| 2 | GPT-4o | 88.7% | — | Apr 16, 2025 | self reported | primary |
| 3 | Seed 1.5 | 88.6% | — | Jan 22, 2025 | self reported | primary |
| 4 | Nova Premier | 87.4% | — | Apr 30, 2025 | self reported | primary |
| 5 | Claude Opus 3 | 86.8% | — | Mar 4, 2024 | self reported | primary |
| 6 | GPT-4 | 86.4% | 5-shot | Jan 1, 2024 | paper | primary |
| 7 | Nemotron 3 Super | 86.0% | 5-shot | Apr 3, 2026 | self reported | primary |
| 8 | Llama 3.3 | 86.0% | 0-shot · CoT | Dec 6, 2024 | self reported | primary |
| 9 | Nova Pro | 85.9% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 10 | Command A | 85.5% | — | Apr 7, 2025 | self reported | primary |
| 11 | Nova Lite | 80.5% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 12 | Nova Micro | 77.6% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 13 | Command R Plus | 75.7% | — | Apr 4, 2024 | self reported | primary |
| 14 | DBRX Instruct | 73.7% | 5-shot | Mar 27, 2024 | self reported | primary |
| 15 | Mixtral 8x7B | 70.6% | — | Dec 1, 2023 | paper | primary |
| 16 | Mixtral 8x22B | 70.6% | — | Jan 8, 2024 | paper | primary |
| 17 | Pixtral 12B | 69.2% | 5-shot | Oct 10, 2024 | self reported | primary |
| 18 | LLaMA 2 | 68.9% | 5-shot | Jul 19, 2023 | paper | primary verified |
| 19 | Mistral NeMo | 68.0% | 5-shot | Jul 18, 2024 | self reported | primary |
| 20 | Llama 3.2 | 63.4% | — | Sep 25, 2024 | self reported | primary |
| 21 | Mistral 7B | 60.1% | — | Sep 1, 2023 | paper | primary |
| 22 | Gemma 2 | 51.3% | 5-shot | Feb 25, 2025 | self reported | primary |
