
GPQA Diamond

Graduate-Level Google-Proof Q&A — Diamond subset

PhD-level multiple-choice questions in biology, physics, and chemistry, written by domain experts so that non-experts struggle to answer them correctly even with web search. Diamond is the hardest curated subset.

Knowledge · Text · Metric: accuracy (max 100.0%) · Released Nov 2023

Results: 82
Models scored: 79
Top score: 94.4% (GPT 5.4 Pro)
Median: 80.0%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y axis: score, 0 to 100. X axis: eval date, Jan 2024 to Jul 2026.]
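
The running best described above can be sketched as a cumulative maximum over results sorted by eval date. This is a minimal illustration, not the site's actual charting code: the `results` list is a hand-picked sample of rows from the results table, and `running_best` is a hypothetical helper name.

```python
from datetime import date
from itertools import accumulate

# Sample of (eval date, model, score %) rows copied from the results table.
# Deliberately unsorted to show that ordering by date matters.
results = [
    (date(2025, 2, 24), "Claude Sonnet 3.7", 62.3),
    (date(2024, 1, 1), "GPT-4 Turbo", 50.4),
    (date(2025, 2, 19), "Grok 3 Think", 84.6),
    (date(2024, 10, 22), "Claude Haiku 3.5", 65.0),
    (date(2025, 1, 21), "DeepSeek-R1", 71.5),
    (date(2024, 6, 20), "Claude Sonnet 3.5", 59.4),
]


def running_best(rows):
    """Return (eval date, best score so far) pairs in chronological order."""
    ordered = sorted(rows, key=lambda row: row[0])
    # accumulate with max yields the cumulative maximum of the scores.
    best_so_far = accumulate((row[2] for row in ordered), max)
    return [(row[0], best) for row, best in zip(ordered, best_so_far)]


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), best)
```

A result that scores below the current best (Claude Sonnet 3.7 in this sample) still appears as a dot, but it leaves the frontier line flat.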

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.4 Pro | 94.4% | CoT | Mar 5, 2026 | self reported | primary |
| 2 | Gemini 3.1 Pro | 94.3% | CoT | Feb 19, 2026 | self reported | primary |
| 3 | Claude Opus 4.7 | 94.2% | | Apr 16, 2026 | self reported | primary |
| 4 | Gemini 3 Deep Think | 93.8% | CoT | Feb 12, 2026 | self reported | primary |
| 5 | GPT 5.5 | 93.6% | CoT | Apr 23, 2026 | self reported | primary |
| 6 | GPT 5.2 Pro | 93.2% | CoT | Dec 11, 2025 | self reported | primary |
| 7 | GPT 5.4 | 92.8% | CoT | Mar 5, 2026 | self reported | primary |
| 8 | GPT 5.3 Codex | 92.6% | | Mar 5, 2026 | self reported | primary |
| 9 | GPT 5.2 Thinking | 92.4% | CoT | Dec 11, 2025 | self reported | primary |
| 10 | Qwen 3.7 Max | 92.4% | 0-shot · CoT · standard | May 20, 2026 | self reported | |
| 11 | Gemini 3 Pro | 91.9% | CoT | Nov 18, 2025 | self reported | primary |
| 12 | Claude Opus 4.6 | 91.3% | | Feb 5, 2026 | self reported | primary |
| 13 | Kimi K2.6 | 90.5% | CoT | Apr 20, 2026 | self reported | primary |
| 14 | Gemini 3 Flash | 90.4% | CoT | Dec 17, 2025 | self reported | primary |
| 15 | Gemini 3 Flash (Thinking) | 90.4% | | Dec 17, 2025 | self reported | primary |
| 16 | Claude Sonnet 4.6 | 89.9% | | Feb 17, 2026 | self reported | primary |
| 17 | Muse Spark | 89.5% | | Apr 8, 2026 | self reported | primary |
| 18 | Grok 4 Heavy | 88.4% | CoT | Jul 9, 2025 | self reported | primary |
| 19 | GPT 5.1 | 88.1% | | Nov 13, 2025 | self reported | primary |
| 20 | GPT 5.1 Thinking | 88.1% | CoT | Nov 12, 2025 | self reported | primary |
| 21 | GPT 5.4 Mini | 88.0% | CoT | Mar 17, 2026 | self reported | primary |
| 22 | Grok 4 | 87.5% | CoT | Jul 9, 2025 | self reported | primary |
| 23 | Claude Opus 4.5 | 87.0% | | Nov 24, 2025 | self reported | primary |
| 24 | Qwen 3.5 122B A10B | 86.6% | | Apr 24, 2026 | third party | primary verified |
| 25 | Gemini 2.5 Pro (Thinking) | 86.4% | | Dec 17, 2025 | self reported | primary |
| 26 | GLM-5.1 | 86.2% | CoT | Apr 8, 2026 | self reported | primary |
| 27 | GLM 5 | 86.0% | CoT | Feb 12, 2026 | self reported | primary |
| 28 | GPT 5 (Thinking) | 85.7% | | Aug 7, 2025 | self reported | primary |
| 29 | Qwen 3.5 27B | 85.5% | | Feb 24, 2026 | third party | primary verified |
| 30 | Grok 3 Think | 84.6% | CoT | Feb 19, 2025 | self reported | primary |
| 31 | Gemma 4 | 84.3% | CoT | Apr 3, 2026 | self reported | primary |
| 32 | Qwen 3.5 35B A3B | 84.2% | | Feb 15, 2025 | third party | primary verified |
| 33 | Gemini 2.5 Pro | 84.0% | CoT | Mar 25, 2025 | self reported | primary |
| 34 | Claude Sonnet 4.5 | 83.4% | CoT | Sep 29, 2025 | self reported | primary |
| 35 | o3 | 83.3% | | Apr 16, 2025 | self reported | primary |
| 36 | GPT 5.4 Nano | 82.8% | CoT | Mar 17, 2026 | self reported | primary |
| 37 | Gemini 2.5 Flash (Thinking) | 82.8% | | Dec 17, 2025 | self reported | primary |
| 38 | Deepseek 3.2 | 82.4% | | Dec 1, 2025 | paper | primary |
| 39 | GLM 4.6 | 81.0% | CoT | Sep 30, 2025 | self reported | primary |
| 40 | Opus 4.1 Thinking | 80.9% | CoT | Aug 5, 2025 | self reported | primary |
| 41 | DeepSeek V3.1 Terminus | 80.7% | | Sep 22, 2025 | self reported | primary |
| 42 | GPT OSS 120B | 80.1% | CoT | Aug 5, 2025 | self reported | primary |
| 43 | DeepSeek V3.2 Exp | 79.9% | CoT | Sep 29, 2025 | self reported | primary |
| 44 | Claude Sonnet 3.7 (Thinking) | 78.2% | | Feb 24, 2025 | self reported | primary |
| 45 | o1 | 78.0% | | Apr 16, 2025 | self reported | primary |
| 46 | GPT 5 | 77.8% | | Aug 7, 2025 | self reported | primary |
| 47 | Llama 3.1 Nemotron Ultra | 76.0% | | Apr 8, 2025 | self reported | primary |
| 48 | Claude Sonnet 4 | 75.4% | | May 22, 2025 | self reported | primary |
| 49 | Grok 3 | 75.4% | | Feb 19, 2025 | self reported | primary |
| 50 | Grok 3 | 75.4% | | Feb 19, 2025 | self reported | primary |
| 51 | Kimi K2 Instruct | 75.1% | | Jul 2, 2025 | paper | primary |
| 52 | Nemotron 3 Nano | 75.0% | | Dec 15, 2025 | self reported | primary |
| 53 | Llama 4 Behemoth | 73.7% | | Apr 5, 2025 | self reported | primary |
| 54 | Claude Haiku 4.5 | 73.0% | | Oct 15, 2025 | self reported | primary |
| 55 | Claude Haiku 4.5 | 73.0% | | Oct 15, 2025 | self reported | primary |
| 56 | Gemma 3 | 72.6% | | May 20, 2025 | self reported | primary |
| 57 | DeepSeek-R1 | 71.5% | CoT | Jan 21, 2025 | paper | primary |
| 58 | R1 1776 | 71.5% | | Feb 18, 2025 | self reported | primary |
| 59 | Magistral Medium | 70.8% | CoT | Jun 10, 2025 | self reported | primary |
| 60 | Llama 4 Maverick | 69.8% | | Apr 5, 2025 | self reported | primary |
| 61 | Phi 4 reasoning plus | 69.3% | | Jul 8, 2026 | self reported | primary |
| 62 | GPT 4.1 | 66.3% | | Apr 14, 2025 | self reported | primary |
| 63 | Grok 3 mini | 66.2% | | Feb 19, 2025 | self reported | primary |
| 64 | Qwen3-30B-A3B | 65.8% | CoT | Apr 28, 2025 | self reported | primary |
| 65 | Qwen3 30B A3B | 65.8% | | Apr 28, 2025 | self reported | primary |
| 66 | Claude Haiku 3.5 | 65.0% | 0-shot · CoT | Oct 22, 2024 | self reported | primary |
| 67 | Seed 1.5 | 65.0% | 0-shot · CoT | Jan 22, 2025 | self reported | primary |
| 68 | Gemini 2.5 Flash-Lite | 64.6% | | Sep 26, 2025 | self reported | primary |
| 69 | Claude Sonnet 3.7 | 62.3% | | Feb 24, 2025 | self reported | primary |
| 70 | Gemini 2.0 Flash | 60.1% | 0-shot · CoT · standard | Feb 5, 2025 | self reported | |
| 71 | Nemotron 3 Super | 60.0% | 5-shot · CoT | Apr 3, 2026 | self reported | primary |
| 72 | Claude Sonnet 3.5 | 59.4% | 0-shot · CoT · standard | Jun 20, 2024 | self reported | |
| 73 | DeepSeek V3 | 59.1% | | Dec 26, 2024 | paper | primary |
| 74 | Llama 4 Scout | 57.2% | | Apr 5, 2025 | self reported | primary |
| 75 | GPT-4o | 53.6% | | Apr 16, 2025 | self reported | primary |
| 76 | Command A | 50.8% | | Apr 7, 2025 | paper | primary |
| 77 | Command A | 50.8% | | Apr 7, 2025 | self reported | primary |
| 78 | Llama 3.3 | 50.5% | 0-shot · CoT | Dec 6, 2025 | self reported | primary |
| 79 | GPT-4 Turbo | 50.4% | | Jan 1, 2024 | paper | primary |
| 80 | Claude Opus 3 | 50.4% | | Mar 4, 2024 | self reported | primary |
| 81 | Nova Pro | 46.9% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 82 | Mistral Large 3 | 43.9% | 5-shot | Dec 2, 2025 | self reported | primary |
| 83 | Nova Lite | 42.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 84 | Nova Micro | 40.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 85 | Claude Haiku 3 | 33.3% | 0-shot · CoT · standard | Mar 4, 2024 | self reported | |
| 86 | Llama 3.2 | 32.8% | 0-shot | Oct 25, 2024 | self reported | primary |