SimpleQA Verified

SimpleQA Verified (Epoch curation)

Short-form factual questions with single, unambiguous answers. Tests world knowledge and (critically) hallucination — refusing or hedging counts as not-correct.
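The scoring rule above can be sketched as follows. This is a minimal illustration, not the benchmark's actual grader: the grade labels `correct`, `incorrect`, and `not_attempted` and the function name `accuracy` are assumptions made here. The only behavior taken from the description is that a refusal or hedge earns no credit.

```python
from collections import Counter


def accuracy(grades):
    """Return the percentage of questions graded 'correct'.

    Any other grade (a wrong answer, a refusal, a hedge) counts against
    the model. The label strings are hypothetical, chosen for illustration.
    """
    if not grades:
        raise ValueError("need at least one graded question")
    counts = Counter(grades)
    return 100.0 * counts["correct"] / len(grades)


# Two correct answers out of four questions; the refusal is not excluded
# from the denominator, so it lowers the score just like a wrong answer.
example = ["correct", "incorrect", "not_attempted", "correct"]
print(accuracy(example))
```

Under this rule a model cannot raise its score by declining to answer, which is why the benchmark doubles as a hallucination test.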

Knowledge · Text · Accuracy · Max 100.0% · Released Oct 2024
Results: 2
Models scored: 2
Top: 72.1% (Gemini 3 Pro (Thinking))
Median: 41.4%
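As a quick check, the top score and median can be recomputed from the two listed results. This sketch assumes the summary uses one primary score per model, which is consistent with the figures shown; the variable names are illustrative.

```python
from statistics import median

# Primary scores (percent) for the two models listed on this page.
scores = {
    "Gemini 3 Pro (Thinking)": 72.1,
    "Gemini 2.5 Flash-Lite": 10.7,
}

top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# With an even number of entries, the median is the mean of the two
# middle values: (10.7 + 72.1) / 2.
median_score = round(median(scores.values()), 1)

print(top_model, top_score)
print(median_score)
```

With only two results, the median is simply the midpoint between them, so it says little about a typical model.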

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: Best score over time. Y-axis: score from 0 to 100; x-axis: Sep 2025 to Nov 2025.]
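The running-best line can be derived from the dated results with a cumulative maximum. The sketch below uses the two results listed on this page; the function name `frontier` and the tuple layout are assumptions for illustration, not the site's implementation.

```python
from datetime import date
from itertools import accumulate

# (evaluation date, model, score in percent), taken from the results table.
results = [
    (date(2025, 11, 18), "Gemini 3 Pro (Thinking)", 72.1),
    (date(2025, 9, 26), "Gemini 2.5 Flash-Lite", 10.7),
]


def frontier(rows):
    """Return (date, running best score) pairs in chronological order."""
    ordered = sorted(rows, key=lambda row: row[0])
    # accumulate with max yields the best score seen up to each point.
    running_best = accumulate((row[2] for row in ordered), max)
    return [(row[0], best) for row, best in zip(ordered, running_best)]


for when, best in frontier(results):
    print(when.isoformat(), best)
```

A later result that scores below the current best leaves the line flat, so the frontier never decreases.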

All results

Showing one canonical row per model.
# | Model | Score | Conditions | Eval date | Source | Flags
1 | Gemini 3 Pro (Thinking) | 72.1% | 0-shot · CoT | 18 Nov 2025 | Self-reported | Primary
2 | Gemini 2.5 Flash-Lite | 10.7% | | 26 Sep 2025 | Self-reported | Primary