TAAFT

IFBench

Instruction Following Benchmark

Measures how reliably a model follows complex multi-constraint instructions, a known weak spot for many otherwise strong models.

Language · Text · accuracy · Max 100.0% · Released Jun 2025
Results: 2
Models scored: 2
Top score: 76.1% (Qwen 3.5 122B A10B)
Median: 73.2%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: Best score over time. Y-axis from 0.00 to 100.0; x-axis from Feb 2025 to Apr 2026.]
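The running-best line described here can be computed as a cumulative maximum over results sorted by evaluation date. The following is a minimal sketch, not the site's actual implementation; the function name `running_best` and the tuple layout are assumptions made for illustration. The two data points are the scores and evaluation dates listed on this page.

```python
from datetime import date


def running_best(results):
    """Return (eval_date, best_score_so_far) pairs in chronological order.

    `results` is an iterable of (eval_date, score) tuples. This layout is
    a hypothetical one chosen for the sketch, not the site's data model.
    """
    frontier = []
    best = float("-inf")
    # Sort by date so the maximum accumulates in time order.
    for eval_date, score in sorted(results):
        best = max(best, score)
        frontier.append((eval_date, best))
    return frontier


# The two results listed on this page, deliberately out of order.
results = [
    (date(2026, 4, 24), 76.1),  # Qwen 3.5 122B A10B
    (date(2025, 2, 15), 70.2),  # Qwen 3.5 35B A3B
]

frontier = running_best(results)
for eval_date, best in frontier:
    print(eval_date.isoformat(), best)
```

Because the line tracks a maximum, a later result that scores below the current best leaves the line flat rather than pulling it down.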

All results

Showing one canonical row per model.
# | Model              | Score | Conditions | Eval date    | Source      | Flags
1 | Qwen 3.5 122B A10B | 76.1% |            | Apr 24, 2026 | third party | primary, verified
2 | Qwen 3.5 35B A3B   | 70.2% |            | Feb 15, 2025 | third party | primary, verified