GPQA Diamond
GPQA is a set of PhD-level multiple-choice questions in biology, physics, and chemistry, written by domain experts so that skilled non-experts struggle to answer them correctly even with unrestricted web search. Diamond is its hardest curated subset.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.4 Pro | 94.4% | CoT | Mar 5, 2026 | self reported | primary |
| 2 | Gemini 3.1 Pro | 94.3% | CoT | Feb 19, 2026 | self reported | primary |
| 3 | Claude Opus 4.7 | 94.2% | — | Apr 16, 2026 | self reported | primary |
| 4 | Gemini 3 Deep Think | 93.8% | CoT | Feb 12, 2026 | self reported | primary |
| 5 | GPT 5.5 | 93.6% | CoT | Apr 23, 2026 | self reported | primary |
| 6 | GPT 5.2 Pro | 93.2% | CoT | Dec 11, 2025 | self reported | primary |
| 7 | GPT 5.4 | 92.8% | CoT | Mar 5, 2026 | self reported | primary |
| 8 | GPT 5.3 Codex | 92.6% | — | Mar 5, 2026 | self reported | primary |
| 9 | GPT 5.2 Thinking | 92.4% | CoT | Dec 11, 2025 | self reported | primary |
| 10 | Gemini 3 Pro | 91.9% | CoT | Nov 18, 2025 | self reported | primary |
| 11 | Claude Opus 4.6 | 91.3% | — | Feb 5, 2026 | self reported | primary |
| 12 | Kimi K2.6 | 90.5% | CoT | Apr 20, 2026 | self reported | primary |
| 13 | Gemini 3 Flash | 90.4% | CoT | Dec 17, 2025 | self reported | primary |
| 14 | Gemini 3 Flash (Thinking) | 90.4% | — | Dec 17, 2025 | self reported | primary |
| 15 | Claude Sonnet 4.6 | 89.9% | — | Feb 17, 2026 | self reported | primary |
| 16 | Muse Spark | 89.5% | — | Apr 8, 2026 | self reported | primary |
| 17 | Grok 4 Heavy | 88.4% | CoT | Jul 9, 2025 | self reported | primary |
| 18 | GPT 5.1 | 88.1% | — | Nov 13, 2025 | self reported | primary |
| 19 | GPT 5.1 Thinking | 88.1% | CoT | Nov 12, 2025 | self reported | primary |
| 20 | GPT 5.4 Mini | 88.0% | CoT | Mar 17, 2026 | self reported | primary |
| 21 | Grok 4 | 87.5% | CoT | Jul 9, 2025 | self reported | primary |
| 22 | Claude Opus 4.5 | 87.0% | — | Nov 24, 2025 | self reported | primary |
| 23 | Qwen 3.5 122B A10B | 86.6% | — | Apr 24, 2026 | third party | primary verified |
| 24 | Gemini 2.5 Pro (Thinking) | 86.4% | — | Dec 17, 2025 | self reported | primary |
| 25 | GLM-5.1 | 86.2% | CoT | Apr 8, 2026 | self reported | primary |
| 26 | GLM 5 | 86.0% | CoT | Feb 12, 2026 | self reported | primary |
| 27 | GPT 5 (Thinking) | 85.7% | — | Aug 7, 2025 | self reported | primary |
| 28 | Qwen 3.5 27B | 85.5% | — | Feb 24, 2026 | third party | primary verified |
| 29 | Grok 3 Think | 84.6% | CoT | Feb 19, 2025 | self reported | primary |
| 30 | Gemma 4 | 84.3% | CoT | Apr 3, 2026 | self reported | primary |
| 31 | Qwen 3.5 35B A3B | 84.2% | — | Feb 15, 2025 | third party | primary verified |
| 32 | Gemini 2.5 Pro | 84.0% | CoT | Mar 25, 2025 | self reported | primary |
| 33 | Claude Sonnet 4.5 | 83.4% | CoT | Sep 29, 2025 | self reported | primary |
| 34 | o3 | 83.3% | — | Apr 16, 2025 | self reported | primary |
| 35 | GPT 5.4 Nano | 82.8% | CoT | Mar 17, 2026 | self reported | primary |
| 36 | Gemini 2.5 Flash (Thinking) | 82.8% | — | Dec 17, 2025 | self reported | primary |
| 37 | DeepSeek 3.2 | 82.4% | — | Dec 1, 2025 | paper | primary |
| 38 | GLM 4.6 | 81.0% | CoT | Sep 30, 2025 | self reported | primary |
| 39 | Opus 4.1 Thinking | 80.9% | CoT | Aug 5, 2025 | self reported | primary |
| 40 | DeepSeek V3.1 Terminus | 80.7% | — | Sep 22, 2025 | self reported | primary |
| 41 | GPT OSS 120B | 80.1% | CoT | Aug 5, 2025 | self reported | primary |
| 42 | DeepSeek V3.2 Exp | 79.9% | CoT | Sep 29, 2025 | self reported | primary |
| 43 | Claude Sonnet 3.7 (Thinking) | 78.2% | — | Feb 24, 2025 | self reported | primary |
| 44 | o1 | 78.0% | — | Apr 16, 2025 | self reported | primary |
| 45 | GPT 5 | 77.8% | — | Aug 7, 2025 | self reported | primary |
| 46 | Llama 3.1 Nemotron Ultra | 76.0% | — | Apr 8, 2025 | self reported | primary |
| 47 | Claude Sonnet 4 | 75.4% | — | May 22, 2025 | self reported | primary |
| 48 | Grok 3 | 75.4% | — | Feb 19, 2025 | self reported | primary |
| 49 | Grok 3 | 75.4% | — | Feb 19, 2025 | self reported | primary |
| 50 | Kimi K2 Instruct | 75.1% | — | Jul 2, 2025 | paper | primary |
| 51 | Nemotron 3 Nano | 75.0% | — | Dec 15, 2025 | self reported | primary |
| 52 | Llama 4 Behemoth | 73.7% | — | Apr 5, 2025 | self reported | primary |
| 53 | Claude Haiku 4.5 | 73.0% | — | Oct 15, 2025 | self reported | primary |
| 54 | Claude Haiku 4.5 | 73.0% | — | Oct 15, 2025 | self reported | primary |
| 55 | Gemma 3 | 72.6% | — | May 20, 2025 | self reported | primary |
| 56 | DeepSeek-R1 | 71.5% | CoT | Jan 21, 2025 | paper | primary |
| 57 | R1 1776 | 71.5% | — | Feb 18, 2025 | self reported | primary |
| 58 | Magistral Medium | 70.8% | CoT | Jun 10, 2025 | self reported | primary |
| 59 | Llama 4 Maverick | 69.8% | — | Apr 5, 2025 | self reported | primary |
| 60 | Phi 4 reasoning plus | 69.3% | — | Jul 8, 2026 | self reported | primary |
| 61 | GPT 4.1 | 66.3% | — | Apr 14, 2025 | self reported | primary |
| 62 | Grok 3 mini | 66.2% | — | Feb 19, 2025 | self reported | primary |
| 63 | Qwen3-30B-A3B | 65.8% | CoT | Apr 28, 2025 | self reported | primary |
| 64 | Qwen3 30B A3B | 65.8% | — | Apr 28, 2025 | self reported | primary |
| 65 | Claude Haiku 3.5 | 65.0% | 0-shot · CoT | Oct 22, 2024 | self reported | primary |
| 66 | Seed 1.5 | 65.0% | 0-shot · CoT | Jan 22, 2025 | self reported | primary |
| 67 | Gemini 2.5 Flash-Lite | 64.6% | — | Sep 26, 2025 | self reported | primary |
| 68 | Claude Sonnet 3.7 | 62.3% | — | Feb 24, 2025 | self reported | primary |
| 69 | Nemotron 3 Super | 60.0% | 5-shot · CoT | Apr 3, 2026 | self reported | primary |
| 70 | DeepSeek V3 | 59.1% | — | Dec 26, 2024 | paper | primary |
| 71 | Llama 4 Scout | 57.2% | — | Apr 5, 2025 | self reported | primary |
| 72 | GPT-4o | 53.6% | — | Apr 16, 2025 | self reported | primary |
| 73 | Command A | 50.8% | — | Apr 7, 2025 | paper | primary |
| 74 | Command A | 50.8% | — | Apr 7, 2025 | self reported | primary |
| 75 | Llama 3.3 | 50.5% | 0-shot · CoT | Dec 6, 2025 | self reported | primary |
| 76 | GPT-4 Turbo | 50.4% | — | Jan 1, 2024 | paper | primary |
| 77 | Claude Opus 3 | 50.4% | — | Mar 4, 2024 | self reported | primary |
| 78 | Nova Pro | 46.9% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 79 | Mistral Large 3 | 43.9% | 5-shot | Dec 2, 2025 | self reported | primary |
| 80 | Nova Lite | 42.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 81 | Nova Micro | 40.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 82 | Llama 3.2 | 32.8% | 0-shot | Oct 25, 2024 | self reported | primary |
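
A frontier-over-time view (the best score seen up to each evaluation date) can be derived from the table above. The following is a minimal sketch, not the site's actual method: the `rows` list is a hand-copied subset of the table, and the function name `frontier` and the tie-breaking rule (a later score must strictly exceed the running best) are assumptions made for illustration.

```python
from datetime import date

# A hand-copied subset of (model, score in %, eval date) rows from the table.
# The full table would be handled the same way.
rows = [
    ("GPT-4 Turbo", 50.4, date(2024, 1, 1)),
    ("Claude Opus 3", 50.4, date(2024, 3, 4)),
    ("Claude Haiku 3.5", 65.0, date(2024, 10, 22)),
    ("DeepSeek-R1", 71.5, date(2025, 1, 21)),
    ("Grok 3 Think", 84.6, date(2025, 2, 19)),
    ("Grok 4 Heavy", 88.4, date(2025, 7, 9)),
    ("Gemini 3 Pro", 91.9, date(2025, 11, 18)),
    ("GPT 5.2 Pro", 93.2, date(2025, 12, 11)),
    ("Gemini 3 Deep Think", 93.8, date(2026, 2, 12)),
    ("Gemini 3.1 Pro", 94.3, date(2026, 2, 19)),
    ("GPT 5.4 Pro", 94.4, date(2026, 3, 5)),
    ("Claude Opus 4.7", 94.2, date(2026, 4, 16)),
]


def frontier(rows):
    """Return the rows whose score strictly beats every earlier-dated score."""
    best = float("-inf")
    result = []
    # Sort by date; on the same date, consider the higher score first.
    for model, score, when in sorted(rows, key=lambda r: (r[2], -r[1])):
        if score > best:  # assumption: ties do not advance the frontier
            best = score
            result.append((model, score, when))
    return result


if __name__ == "__main__":
    for model, score, when in frontier(rows):
        print(f"{when.isoformat()}  {score:5.1f}%  {model}")
```

Because the listed eval dates are taken as given, any row with a mistaken or report-time date would shift where it lands on the frontier, so the output is only as reliable as the date column.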
