GPQA Diamond

Graduate-Level Google-Proof Q&A — Diamond subset

PhD-level multiple-choice questions in biology, physics, and chemistry, written by domain experts so non-experts cannot answer them even with web search. Diamond is the hardest curated subset.

Knowledge · Text · Accuracy (max 100.0%) · Released Nov 2023

Results: 82
Models scored: 79
Top score: 94.4% (GPT 5.4 Pro)
Median: 80.0%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: Best score over time. Score axis from 0 to 100; date axis from Jan 2024 to Jul 2026.]
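The "running best" line described above can be sketched as a cumulative maximum over results sorted by eval date. This is a minimal illustration, not the site's actual plotting code: the `running_best` helper is a hypothetical name, and the rows are a hand-picked subset of the All results table, so intermediate values may differ from the full chart.

```python
from datetime import date
from itertools import accumulate

# Hand-picked subset of rows from the "All results" table:
# (eval date, model, score in percent). Deliberately left unsorted.
results = [
    (date(2025, 2, 19), "Grok 3 Think", 84.6),
    (date(2024, 1, 1), "GPT-4 Turbo", 50.4),
    (date(2024, 10, 22), "Claude Haiku 3.5", 65.0),
    (date(2024, 12, 26), "DeepSeek V3", 59.1),
    (date(2025, 1, 21), "DeepSeek-R1", 71.5),
    (date(2025, 3, 25), "Gemini 2.5 Pro", 84.0),
    (date(2025, 7, 9), "Grok 4 Heavy", 88.4),
    (date(2025, 11, 18), "Gemini 3 Pro", 91.9),
    (date(2025, 12, 11), "GPT 5.2 Pro", 93.2),
    (date(2026, 2, 19), "Gemini 3.1 Pro", 94.3),
    (date(2026, 3, 5), "GPT 5.4 Pro", 94.4),
]


def running_best(rows):
    """Return (date, best score so far) pairs in chronological order."""
    ordered = sorted(rows, key=lambda row: row[0])
    # accumulate with max yields the cumulative maximum of the scores.
    bests = accumulate((score for _, _, score in ordered), max)
    return [(day, best) for (day, _, _), best in zip(ordered, bests)]


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), best)
```

A result that scores below the current best, such as DeepSeek V3 in this subset, still appears as a dot but leaves the line flat.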

All results

Showing one canonical row per model.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.4 Pro | 94.4% | CoT | Mar 5, 2026 | self reported | primary |
| 2 | Gemini 3.1 Pro | 94.3% | CoT | Feb 19, 2026 | self reported | primary |
| 3 | Claude Opus 4.7 | 94.2% | | Apr 16, 2026 | self reported | primary |
| 4 | Gemini 3 Deep Think | 93.8% | CoT | Feb 12, 2026 | self reported | primary |
| 5 | GPT 5.5 | 93.6% | CoT | Apr 23, 2026 | self reported | primary |
| 6 | GPT 5.2 Pro | 93.2% | CoT | Dec 11, 2025 | self reported | primary |
| 7 | GPT 5.4 | 92.8% | CoT | Mar 5, 2026 | self reported | primary |
| 8 | GPT 5.3 Codex | 92.6% | | Mar 5, 2026 | self reported | primary |
| 9 | GPT 5.2 Thinking | 92.4% | CoT | Dec 11, 2025 | self reported | primary |
| 10 | Gemini 3 Pro | 91.9% | CoT | Nov 18, 2025 | self reported | primary |
| 11 | Claude Opus 4.6 | 91.3% | | Feb 5, 2026 | self reported | primary |
| 12 | Kimi K2.6 | 90.5% | CoT | Apr 20, 2026 | self reported | primary |
| 13 | Gemini 3 Flash | 90.4% | CoT | Dec 17, 2025 | self reported | primary |
| 14 | Gemini 3 Flash (Thinking) | 90.4% | | Dec 17, 2025 | self reported | primary |
| 15 | Claude Sonnet 4.6 | 89.9% | | Feb 17, 2026 | self reported | primary |
| 16 | Muse Spark | 89.5% | | Apr 8, 2026 | self reported | primary |
| 17 | Grok 4 Heavy | 88.4% | CoT | Jul 9, 2025 | self reported | primary |
| 18 | GPT 5.1 | 88.1% | | Nov 13, 2025 | self reported | primary |
| 19 | GPT 5.1 Thinking | 88.1% | CoT | Nov 12, 2025 | self reported | primary |
| 20 | GPT 5.4 Mini | 88.0% | CoT | Mar 17, 2026 | self reported | primary |
| 21 | Grok 4 | 87.5% | CoT | Jul 9, 2025 | self reported | primary |
| 22 | Claude Opus 4.5 | 87.0% | | Nov 24, 2025 | self reported | primary |
| 23 | Qwen 3.5 122B A10B | 86.6% | | Apr 24, 2026 | third party | primary, verified |
| 24 | Gemini 2.5 Pro (Thinking) | 86.4% | | Dec 17, 2025 | self reported | primary |
| 25 | GLM-5.1 | 86.2% | CoT | Apr 8, 2026 | self reported | primary |
| 26 | GLM 5 | 86.0% | CoT | Feb 12, 2026 | self reported | primary |
| 27 | GPT 5 (Thinking) | 85.7% | | Aug 7, 2025 | self reported | primary |
| 28 | Qwen 3.5 27B | 85.5% | | Feb 24, 2026 | third party | primary, verified |
| 29 | Grok 3 Think | 84.6% | CoT | Feb 19, 2025 | self reported | primary |
| 30 | Gemma 4 | 84.3% | CoT | Apr 3, 2026 | self reported | primary |
| 31 | Qwen 3.5 35B A3B | 84.2% | | Feb 15, 2025 | third party | primary, verified |
| 32 | Gemini 2.5 Pro | 84.0% | CoT | Mar 25, 2025 | self reported | primary |
| 33 | Claude Sonnet 4.5 | 83.4% | CoT | Sep 29, 2025 | self reported | primary |
| 34 | o3 | 83.3% | | Apr 16, 2025 | self reported | primary |
| 35 | GPT 5.4 Nano | 82.8% | CoT | Mar 17, 2026 | self reported | primary |
| 36 | Gemini 2.5 Flash (Thinking) | 82.8% | | Dec 17, 2025 | self reported | primary |
| 37 | Deepseek 3.2 | 82.4% | | Dec 1, 2025 | paper | primary |
| 38 | GLM 4.6 | 81.0% | CoT | Sep 30, 2025 | self reported | primary |
| 39 | Opus 4.1 Thinking | 80.9% | CoT | Aug 5, 2025 | self reported | primary |
| 40 | DeepSeek V3.1 Terminus | 80.7% | | Sep 22, 2025 | self reported | primary |
| 41 | GPT OSS 120B | 80.1% | CoT | Aug 5, 2025 | self reported | primary |
| 42 | DeepSeek V3.2 Exp | 79.9% | CoT | Sep 29, 2025 | self reported | primary |
| 43 | Claude Sonnet 3.7 (Thinking) | 78.2% | | Feb 24, 2025 | self reported | primary |
| 44 | o1 | 78.0% | | Apr 16, 2025 | self reported | primary |
| 45 | GPT 5 | 77.8% | | Aug 7, 2025 | self reported | primary |
| 46 | Llama 3.1 Nemotron Ultra | 76.0% | | Apr 8, 2025 | self reported | primary |
| 47 | Claude Sonnet 4 | 75.4% | | May 22, 2025 | self reported | primary |
| 48 | Grok 3 | 75.4% | | Feb 19, 2025 | self reported | primary |
| 49 | Grok 3 | 75.4% | | Feb 19, 2025 | self reported | primary |
| 50 | Kimi K2 Instruct | 75.1% | | Jul 2, 2025 | paper | primary |
| 51 | Nemotron 3 Nano | 75.0% | | Dec 15, 2025 | self reported | primary |
| 52 | Llama 4 Behemoth | 73.7% | | Apr 5, 2025 | self reported | primary |
| 53 | Claude Haiku 4.5 | 73.0% | | Oct 15, 2025 | self reported | primary |
| 54 | Claude Haiku 4.5 | 73.0% | | Oct 15, 2025 | self reported | primary |
| 55 | Gemma 3 | 72.6% | | May 20, 2025 | self reported | primary |
| 56 | DeepSeek-R1 | 71.5% | CoT | Jan 21, 2025 | paper | primary |
| 57 | R1 1776 | 71.5% | | Feb 18, 2025 | self reported | primary |
| 58 | Magistral Medium | 70.8% | CoT | Jun 10, 2025 | self reported | primary |
| 59 | Llama 4 Maverick | 69.8% | | Apr 5, 2025 | self reported | primary |
| 60 | Phi 4 reasoning plus | 69.3% | | Jul 8, 2026 | self reported | primary |
| 61 | GPT 4.1 | 66.3% | | Apr 14, 2025 | self reported | primary |
| 62 | Grok 3 mini | 66.2% | | Feb 19, 2025 | self reported | primary |
| 63 | Qwen3-30B-A3B | 65.8% | CoT | Apr 28, 2025 | self reported | primary |
| 64 | Qwen3 30B A3B | 65.8% | | Apr 28, 2025 | self reported | primary |
| 65 | Claude Haiku 3.5 | 65.0% | 0-shot · CoT | Oct 22, 2024 | self reported | primary |
| 66 | Seed 1.5 | 65.0% | 0-shot · CoT | Jan 22, 2025 | self reported | primary |
| 67 | Gemini 2.5 Flash-Lite | 64.6% | | Sep 26, 2025 | self reported | primary |
| 68 | Claude Sonnet 3.7 | 62.3% | | Feb 24, 2025 | self reported | primary |
| 69 | Nemotron 3 Super | 60.0% | 5-shot · CoT | Apr 3, 2026 | self reported | primary |
| 70 | DeepSeek V3 | 59.1% | | Dec 26, 2024 | paper | primary |
| 71 | Llama 4 Scout | 57.2% | | Apr 5, 2025 | self reported | primary |
| 72 | GPT-4o | 53.6% | | Apr 16, 2025 | self reported | primary |
| 73 | Command A | 50.8% | | Apr 7, 2025 | paper | primary |
| 74 | Command A | 50.8% | | Apr 7, 2025 | self reported | primary |
| 75 | Llama 3.3 | 50.5% | 0-shot · CoT | Dec 6, 2025 | self reported | primary |
| 76 | GPT-4 Turbo | 50.4% | | Jan 1, 2024 | paper | primary |
| 77 | Claude Opus 3 | 50.4% | | Mar 4, 2024 | self reported | primary |
| 78 | Nova Pro | 46.9% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 79 | Mistral Large 3 | 43.9% | 5-shot | Dec 2, 2025 | self reported | primary |
| 80 | Nova Lite | 42.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 81 | Nova Micro | 40.0% | 0-shot · CoT | Dec 3, 2024 | self reported | primary |
| 82 | Llama 3.2 | 32.8% | 0-shot | Oct 25, 2024 | self reported | primary |
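
The headline figures (82 results, top score 94.4%, median 80.0%) can be rechecked against the rows listed above. The sketch below assumes the median is taken over all 82 listed rows, including the repeated entries; that assumption is not stated on the page, but it reproduces the reported value. The `scores` list is transcribed by hand from the Score column.

```python
from statistics import median

# Scores (percent) transcribed from the table above, in rank order.
scores = [
    94.4, 94.3, 94.2, 93.8, 93.6, 93.2, 92.8, 92.6, 92.4, 91.9,
    91.3, 90.5, 90.4, 90.4, 89.9, 89.5, 88.4, 88.1, 88.1, 88.0,
    87.5, 87.0, 86.6, 86.4, 86.2, 86.0, 85.7, 85.5, 84.6, 84.3,
    84.2, 84.0, 83.4, 83.3, 82.8, 82.8, 82.4, 81.0, 80.9, 80.7,
    80.1, 79.9, 78.2, 78.0, 77.8, 76.0, 75.4, 75.4, 75.4, 75.1,
    75.0, 73.7, 73.0, 73.0, 72.6, 71.5, 71.5, 70.8, 69.8, 69.3,
    66.3, 66.2, 65.8, 65.8, 65.0, 65.0, 64.6, 62.3, 60.0, 59.1,
    57.2, 53.6, 50.8, 50.8, 50.5, 50.4, 50.4, 46.9, 43.9, 42.0,
    40.0, 32.8,
]

n_results = len(scores)
top_score = max(scores)
# With an even number of rows, the median is the mean of the two
# middle values (ranks 41 and 42: 80.1 and 79.9).
median_score = median(scores)

print(n_results, top_score, round(median_score, 1))
```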