SimpleQA Verified
Short-form factual questions, each with a single, unambiguous answer. Tests world knowledge and, critically, hallucination: a refusal or a hedged answer is not counted as correct.
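The scoring rule above can be sketched as follows. This is a minimal illustration, assuming each answer has already been graded into one of three hypothetical labels (`correct`, `incorrect`, `not_attempted`). The label names and the `accuracy` helper are assumptions for this sketch, not the benchmark's own grader, and the official metric may be computed differently.

```python
from collections import Counter

# Hypothetical grade labels; the benchmark's grader may use different names.
CORRECT = "correct"
INCORRECT = "incorrect"
NOT_ATTEMPTED = "not_attempted"  # refusals and hedged answers


def accuracy(grades):
    """Return the fraction of questions graded correct.

    Refusals and hedges stay in the denominator, so they lower the
    score exactly as a wrong answer would.
    """
    if not grades:
        raise ValueError("no grades to score")
    counts = Counter(grades)
    return counts[CORRECT] / len(grades)


grades = [CORRECT, CORRECT, INCORRECT, NOT_ATTEMPTED]
print(f"{accuracy(grades):.1%}")  # prints 50.0%
```

Under this rule a model cannot raise its score by declining to answer. It can only avoid being counted as having given a wrong answer, if the grader tracks that separately.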
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro (Thinking) | 72.1% | 0-shot · CoT | 18 Nov 2025 | Self-reported | Primary |
| 2 | Gemini 2.5 Flash-Lite | 10.7% | — | 26 Sep 2025 | Self-reported | Primary |
