
IFEval

Instruction-Following Eval

Verifiable instruction-following: ~25 instruction types whose compliance can be checked deterministically (e.g. word counts, formats).
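To illustrate what "checked deterministically" can mean, here is a minimal sketch of three such checks in Python. The function names and the particular instructions (minimum word count, valid JSON, all lowercase) are illustrative assumptions, not the benchmark's official implementation.

```python
import json


def check_min_words(response: str, min_words: int) -> bool:
    """Hypothetical check: response has at least `min_words` whitespace-separated words."""
    return len(response.split()) >= min_words


def check_valid_json(response: str) -> bool:
    """Hypothetical check: the whole response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def check_all_lowercase(response: str) -> bool:
    """Hypothetical check: the response contains no uppercase letters."""
    return response == response.lower()


def follows_all(response: str, checks) -> bool:
    """A response complies only if every attached check passes."""
    return all(check(response) for check in checks)
```

Because each check is a pure function of the response text, the same response always yields the same verdict, with no human or model judge involved.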

Language · Text · accuracy · Max 100.0% · Released Nov 2023
Results: 12
Models scored: 12
Top: Claude Sonnet 3.7 (Thinking), 93.2%
Median: 89.6%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. Vertical axis: score, 0 to 100. Horizontal axis: eval date, Dec 2024 to Feb 2026.]
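The running best traced by the line can be sketched as a cumulative maximum over results sorted by eval date. The snippet below is a minimal illustration using a handful of rows from the results listed on this page; the `running_best` helper and the tuple layout are assumptions made for the example, not the site's own code.

```python
from datetime import date

# (eval date, model, score in %), a subset of the results on this page
results = [
    (date(2025, 4, 7), "Command A", 90.9),
    (date(2024, 12, 3), "Nova Pro", 92.1),
    (date(2025, 2, 24), "Claude Sonnet 3.7 (Thinking)", 93.2),
    (date(2024, 12, 6), "Llama 3.3", 92.1),
    (date(2025, 1, 22), "Seed 1.5", 89.5),
]


def running_best(rows):
    """Return (eval date, best score so far) for each row in chronological order."""
    best = float("-inf")
    frontier = []
    for eval_date, _model, score in sorted(rows, key=lambda r: r[0]):
        best = max(best, score)  # the frontier never decreases
        frontier.append((eval_date, best))
    return frontier


for eval_date, best in running_best(results):
    print(eval_date.isoformat(), best)
```

With these rows the frontier stays at 92.1 from December 2024 until the Feb 24, 2025 result raises it to 93.2, after which later, lower scores leave it unchanged.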

All results

Showing one canonical row per model.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | Claude Sonnet 3.7 (Thinking) | 93.2% | | Feb 24, 2025 | self reported | primary |
| 2 | Nova Pro | 92.1% | 0-shot | Dec 3, 2024 | self reported | primary |
| 3 | Llama 3.3 | 92.1% | | Dec 6, 2024 | self reported | primary |
| 4 | Command A | 90.9% | | Apr 7, 2025 | self reported | primary |
| 5 | Claude Sonnet 3.7 | 90.8% | | Feb 24, 2025 | self reported | primary |
| 6 | Nova Lite | 89.7% | 0-shot | Dec 3, 2024 | self reported | primary |
| 7 | Seed 1.5 | 89.5% | 0-shot · CoT | Jan 22, 2025 | self reported | primary |
| 8 | Nova Micro | 87.2% | 0-shot | Dec 3, 2024 | self reported | primary |
| 9 | GPT 4.1 | 87.0% | | Apr 14, 2025 | self reported | primary |
| 10 | Mistral Small 3 | 82.9% | | Jan 30, 2025 | self reported | primary |
| 11 | Llama 3.2 | 77.4% | | Sep 25, 2025 | self reported | primary |
| 12 | Qwen 3.5 27B | 76.5% | | Feb 24, 2026 | third party | primary, verified |