TruthfulQA
A benchmark of 817 questions designed to elicit imitative falsehoods. It measures whether models repeat common misconceptions.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Mistral NeMo | 50.3% | 0-shot | Jul 18, 2024 | self-reported | primary |
