SimpleQA Verified
Short-form factual questions, each with a single, unambiguous answer. Tests world knowledge and, critically, hallucination: a refusal or a hedged answer is not counted as correct.
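The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming the reported score is the fraction of all questions graded correct. The label names and the `accuracy` helper are hypothetical and are not taken from the benchmark's own grader.

```python
from collections import Counter

# Hypothetical grade labels; the benchmark's actual grader output may differ.
CORRECT = "correct"
INCORRECT = "incorrect"
NOT_ATTEMPTED = "not_attempted"  # refusals and hedged answers


def accuracy(grades):
    """Return the fraction of all questions graded correct.

    Refusals and hedges stay in the denominator, so they lower the
    score exactly as a wrong answer would.
    """
    if not grades:
        raise ValueError("grades must not be empty")
    counts = Counter(grades)
    return counts[CORRECT] / len(grades)


# Two correct answers out of four questions.
grades = [CORRECT, INCORRECT, NOT_ATTEMPTED, CORRECT]
print(f"{accuracy(grades):.1%}")  # prints 50.0%
```

Under this assumption a model cannot raise its score by declining to answer, since every question counts toward the total.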
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro (Thinking) | 72.1% | 0-shot · CoT | Nov 18, 2025 | self-reported | primary |
| 2 | Gemini 2.5 Flash-Lite | 10.7% | — | Sep 26, 2025 | self-reported | primary |
