
SWE-bench Verified

500 manually validated GitHub issues from popular Python repositories. For each issue, a model must produce a patch that passes the issue's hidden test suite. The benchmark is the current standard for measuring "real software engineering" capability.
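As a rough illustration of how a headline percentage is derived, the sketch below assumes that applying a model's patch yields two sets of test outcomes per issue: tests that must flip from failing to passing, and tests that must keep passing (SWE-bench calls these `FAIL_TO_PASS` and `PASS_TO_PASS`). An issue counts as resolved only if every listed test passes, and the score is the resolved fraction. The `Instance` type and the demo data are hypothetical. The real harness actually executes the tests, while this sketch only takes their outcomes as input.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical per-issue test outcomes after applying a model's patch."""
    instance_id: str
    # Tests that failed before the reference fix and must pass after the patch.
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    # Tests that already passed and must keep passing (no regressions).
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(inst: Instance) -> bool:
    # Resolved only if every target test now passes and nothing regressed.
    return all(inst.fail_to_pass.values()) and all(inst.pass_to_pass.values())


def resolved_rate(instances: list[Instance]) -> float:
    # Percentage of issues resolved (500 issues for SWE-bench Verified).
    if not instances:
        raise ValueError("no instances")
    resolved = sum(is_resolved(i) for i in instances)
    return 100.0 * resolved / len(instances)


if __name__ == "__main__":
    demo = [
        Instance("repo__a-1", {"test_fix": True}, {"test_old": True}),
        Instance("repo__b-2", {"test_fix": True}, {"test_old": False}),  # regression
        Instance("repo__c-3", {"test_fix": False}, {"test_old": True}),  # bug not fixed
        Instance("repo__d-4", {"test_fix": True}, {}),
    ]
    print(f"{resolved_rate(demo):.1f}%")  # 2 of 4 resolved -> 50.0%
```

If a score is computed over all 500 issues, each resolved issue is worth 0.2 percentage points, so 87.6% would correspond to 438 resolved issues.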

Coding · Text · Accuracy (max 100.0%) · Released Aug 2024

Results: 49 · Models scored: 48 · Top: Claude Opus 4.7 (87.6%) · Median: 72.2%
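The median can be checked against the All results table. Assuming it is taken over the rows flagged Primary (an inference from the numbers, not something the page states), the 49 primary scores reproduce the reported figures:

```python
import statistics

# Scores of the 49 rows flagged "Primary" in the All results table, in table order.
# Excluded non-primary rows: Qwen 3.7 Max, Grok Build 0.1, Claude Sonnet 3.5, Claude Haiku 3.
PRIMARY_SCORES = [
    87.6, 80.9, 80.8, 80.6, 80.2, 80.2, 80.0, 79.6, 78.0, 78.0,
    77.8, 77.6, 77.4, 77.2, 76.8, 76.2, 74.9, 74.9, 74.5, 73.3,
    73.3, 73.1, 72.7, 72.4, 72.2, 72.0, 70.8, 69.2, 69.1, 68.1,
    68.0, 67.8, 67.0, 65.8, 63.8, 62.3, 62.3, 60.4, 59.6, 57.7,
    56.8, 55.0, 52.8, 49.2, 48.9, 42.4, 42.0, 40.6, 31.6,
]

print(len(PRIMARY_SCORES))  # 49 results
print(max(PRIMARY_SCORES))  # 87.6, the top score
# With an odd count, the median is the 25th value in sorted order.
print(statistics.median(PRIMARY_SCORES))  # 72.2
```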


Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time; y-axis score from 0 to 100%, x-axis evaluation date from Oct 2024 to Apr 2026]
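The running-best line is a cumulative maximum over results sorted by evaluation date. A minimal sketch, assuming one point per result; the `running_best` name and the sample subset are illustrative:

```python
from datetime import date


def running_best(results: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Return (date, best score so far) for each result, in chronological order."""
    frontier = []
    best = float("-inf")
    for day, score in sorted(results):
        best = max(best, score)  # frontier only rises when a result beats all earlier ones
        frontier.append((day, best))
    return frontier


if __name__ == "__main__":
    # A few primary rows from the All results table: (eval date, score %).
    sample = [
        (date(2025, 2, 24), 62.3),   # Claude Sonnet 3.7
        (date(2024, 10, 22), 40.6),  # Claude Haiku 3.5
        (date(2025, 4, 14), 55.0),   # GPT 4.1
        (date(2025, 4, 16), 69.1),   # o3
        (date(2024, 12, 26), 42.0),  # DeepSeek V3
    ]
    for day, best in running_best(sample):
        print(day.isoformat(), best)
```

In this sample, GPT 4.1 (55.0%) arrives after Claude Sonnet 3.7 (62.3%), so the frontier stays flat at 62.3% until o3 raises it to 69.1%.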

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 87.6% | CoT | 16 Apr 2026 | Self-reported | Primary |
| 2 | Claude Opus 4.5 | 80.9% | | 24 Nov 2025 | Self-reported | Primary |
| 3 | Claude Opus 4.6 | 80.8% | | 05 Feb 2026 | Self-reported | Primary |
| 4 | Gemini 3.1 Pro | 80.6% | CoT | 19 Feb 2026 | Self-reported | Primary |
| 5 | Qwen 3.7 Max | 80.4% | 0-shot · CoT · agentic | 20 May 2026 | Self-reported | |
| 6 | MiniMax M2.5 | 80.2% | 0-shot · CoT · agentic, avg@4 | 12 Feb 2026 | Self-reported | Primary |
| 7 | Kimi K2.6 | 80.2% | CoT | 20 Apr 2026 | Self-reported | Primary |
| 8 | GPT 5.2 Thinking | 80.0% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 9 | Claude Sonnet 4.6 | 79.6% | | 17 Feb 2026 | Self-reported | Primary |
| 10 | Gemini 3 Flash | 78.0% | CoT | 17 Dec 2025 | Self-reported | Primary |
| 11 | Gemini 3 Flash (Thinking) | 78.0% | | 17 Dec 2025 | Self-reported | Primary |
| 12 | GLM 5 | 77.8% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 13 | Mistral Medium 3.5 | 77.6% | | 27 Apr 2026 | Self-reported | Primary |
| 14 | Muse Spark | 77.4% | CoT | 08 Apr 2026 | Self-reported | Primary |
| 15 | Claude Sonnet 4.5 | 77.2% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 16 | Kimi K2.5 | 76.8% | CoT | 27 Jan 2026 | Self-reported | Primary |
| 17 | Gemini 3 Pro | 76.2% | CoT | 18 Nov 2025 | Self-reported | Primary |
| 18 | GPT 5.1 | 74.9% | 0-shot · CoT | 13 Nov 2025 | Self-reported | Primary |
| 19 | GPT 5 (Thinking) | 74.9% | | 07 Aug 2025 | Self-reported | Primary |
| 20 | Opus 4.1 Thinking | 74.5% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 21 | Claude Haiku 4.5 | 73.3% | 0-shot · CoT | 15 Oct 2025 | Self-reported | Primary |
| 22 | Claude Haiku 4.5 | 73.3% | | 15 Oct 2025 | Self-reported | Primary |
| 23 | Deepseek 3.2 | 73.1% | | 01 Dec 2025 | Paper | Primary, Verified |
| 24 | Claude Sonnet 4 | 72.7% | | 22 May 2025 | Self-reported | Primary |
| 25 | Qwen 3.5 27B | 72.4% | | 24 Feb 2026 | Third-party | Primary, Verified |
| 26 | Devstral 2 | 72.2% | 0-shot | 09 Dec 2025 | Self-reported | Primary |
| 27 | Qwen 3.5 122B A10B | 72.0% | | 24 Feb 2026 | Third-party | Primary, Verified |
| 28 | Grok Code Fast 1 | 70.8% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 29 | Grok Build 0.1 | 70.8% | 0-shot · CoT · agentic | 14 May 2026 | Self-reported | |
| 30 | Qwen 3.5 35B A3B | 69.2% | | 24 Feb 2026 | Third-party | Primary, Verified |
| 31 | o3 | 69.1% | | 16 Apr 2025 | Self-reported | Primary |
| 32 | o4 mini | 68.1% | | 16 Apr 2025 | Self-reported | Primary |
| 33 | GLM 4.6 | 68.0% | CoT | 30 Sep 2025 | Self-reported | Primary |
| 34 | DeepSeek V3.2 Exp | 67.8% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 35 | Qwen3 Coder | 67.0% | | 22 Jul 2025 | Self-reported | Primary |
| 36 | Kimi K2 Instruct | 65.8% | | 20 Jul 2025 | Paper | Primary |
| 37 | Gemini 2.5 Pro | 63.8% | CoT | 25 Mar 2025 | Self-reported | Primary |
| 38 | Claude Sonnet 3.7 (Thinking) | 62.3% | | 24 Feb 2025 | Self-reported | Primary |
| 39 | Claude Sonnet 3.7 | 62.3% | | 24 Feb 2025 | Self-reported | Primary |
| 40 | Gemini 2.5 Flash (Thinking) | 60.4% | | 17 Dec 2025 | Self-reported | Primary |
| 41 | Gemini 2.5 Pro (Thinking) | 59.6% | | 17 Dec 2025 | Self-reported | Primary |
| 42 | GPT 5.4 | 57.7% | | 05 Mar 2026 | Self-reported | Primary |
| 43 | GPT 5.3 Codex | 56.8% | | 05 Mar 2026 | Self-reported | Primary |
| 44 | GPT 4.1 | 55.0% | | 14 Apr 2025 | Self-reported | Primary |
| 45 | GPT 5 | 52.8% | | 07 Aug 2025 | Self-reported | Primary |
| 46 | DeepSeek-R1 | 49.2% | CoT | 21 Jan 2025 | Paper | Primary |
| 47 | o1 | 48.9% | | 16 Apr 2025 | Self-reported | Primary |
| 48 | Nova Premier | 42.4% | | 30 Apr 2025 | Self-reported | Primary |
| 49 | DeepSeek V3 | 42.0% | | 26 Dec 2024 | Paper | Primary |
| 50 | Claude Haiku 3.5 | 40.6% | | 22 Oct 2024 | Self-reported | Primary |
| 51 | Claude Sonnet 3.5 | 33.4% | 0-shot · agentic | 22 Oct 2024 | Self-reported | |
| 52 | Gemini 2.5 Flash-Lite | 31.6% | | 26 Sep 2025 | Self-reported | Primary |
| 53 | Claude Haiku 3 | 7.20% | 0-shot · agentic | 22 Oct 2024 | Self-reported | |
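The table appears consistent with the summary counts of 49 results and 48 models: four of the 53 rows lack the Primary flag, and Claude Haiku 4.5 appears twice among the primary rows. A sketch of that filtering, using a hypothetical `Row` type and a handful of rows from the table:

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical minimal view of one table row."""
    model: str
    score: float
    primary: bool


def primary_rows(rows: list[Row]) -> list[Row]:
    # Keep only rows flagged Primary, dropping non-primary alternates.
    return [r for r in rows if r.primary]


def best_per_model(rows: list[Row]) -> dict[str, float]:
    # Collapse primary rows to one entry per model, keeping the highest score.
    best: dict[str, float] = {}
    for r in primary_rows(rows):
        best[r.model] = max(r.score, best.get(r.model, float("-inf")))
    return best


if __name__ == "__main__":
    rows = [
        Row("Claude Opus 4.7", 87.6, True),
        Row("Qwen 3.7 Max", 80.4, False),     # alternate configuration
        Row("Claude Haiku 4.5", 73.3, True),  # listed twice as primary
        Row("Claude Haiku 4.5", 73.3, True),
        Row("Claude Haiku 3", 7.20, False),
    ]
    print(len(primary_rows(rows)))    # 3 primary results
    print(len(best_per_model(rows)))  # 2 distinct models
```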