
TruthfulQA

817 questions designed to elicit imitative falsehoods. Measures whether models repeat common misconceptions.

Knowledge · Text · Accuracy · Max 100.0% · Released Sep 2021
Results: 1
Models scored: 1
Top score: 50.3% (Mistral NeMo)
Median: 50.3%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
Not enough data to plot a trend yet.
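
The "running best" line described above can be sketched as a cumulative maximum over results sorted by evaluation date. This is a minimal illustration, not the site's actual plotting code: the function name `running_best` and the tuple layout are assumptions, and the only data point is the single result listed on this page.

```python
from datetime import date
from itertools import accumulate

# (eval date, score in percent). The single row is the one result on this page.
results = [(date(2024, 7, 18), 50.3)]

def running_best(rows):
    """Return (date, best score so far) pairs in chronological order."""
    ordered = sorted(rows)  # sort by eval date, then by score
    best = accumulate((score for _, score in ordered), max)
    return [(d, b) for (d, _), b in zip(ordered, best)]

frontier = running_best(results)
```

With only one result the frontier is a single point, which is consistent with the page reporting that there is not enough data to plot a trend.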

All results

Showing one canonical row per model.

# | Model | Score | Conditions | Eval date | Source | Flags
1 | Mistral NeMo | 50.3% | 0-shot | Jul 18, 2024 | self reported | primary