IFEval

#	Model	Score	Conditions	Eval date	Source	Flags
1	Qwen 3.7 Max	94.3%	0-shot · CoT · standard	20 May 2026	Self-reported	—
2	Claude Sonnet 3.7 (Thinking)	93.2%	—	24 Feb 2025	Self-reported	Primary
3	Nova Pro	92.1%	0-shot	03 Dec 2024	Self-reported	Primary
4	Llama 3.3	92.1%	—	06 Dec 2024	Self-reported	Primary
5	Command A	90.9%	—	07 Apr 2025	Self-reported	Primary
6	Claude Sonnet 3.7	90.8%	—	24 Feb 2025	Self-reported	Primary
7	Nova Lite	89.7%	0-shot	03 Dec 2024	Self-reported	Primary
8	Seed 1.5	89.5%	0-shot · CoT	22 Jan 2025	Self-reported	Primary
9	Claude Sonnet 3.5	87.8%	0-shot · standard	22 Oct 2024	Self-reported	—
10	Nova Micro	87.2%	0-shot	03 Dec 2024	Self-reported	Primary
11	GPT 4.1	87.0%	—	14 Apr 2025	Self-reported	Primary
12	Mistral Small 3	82.9%	—	30 Jan 2025	Self-reported	Primary
13	Llama 3.2	77.4%	—	25 Sep 2024	Self-reported	Primary
14	Claude Haiku 3	77.2%	0-shot · standard	22 Oct 2024	Self-reported	—
15	Qwen 3.5 27B	76.5%	—	24 Feb 2026	Third-party	Primary Verified
