SWE-bench Verified
500 manually validated GitHub issues from popular Python repositories. For each issue, the model must produce a patch that passes a hidden test suite. Widely treated as the current standard for measuring "real software engineering" capability.
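To make the scoring concrete, here is a minimal sketch of how a percentage like the ones in the table could be derived from per-issue outcomes. The `InstanceResult` record, its fields, and the rule that an issue counts as resolved only when every hidden test passes are illustrative assumptions, not the official evaluation harness or its schema. The only figure taken from the description above is the benchmark size of 500.

```python
from dataclasses import dataclass

TOTAL_INSTANCES = 500  # benchmark size stated in the description


@dataclass
class InstanceResult:
    """Hypothetical record of one issue's outcome (not the official harness schema)."""
    instance_id: str
    tests_passed: int
    tests_total: int

    @property
    def resolved(self) -> bool:
        # Assumption: an issue is resolved only if every hidden test passes.
        return self.tests_total > 0 and self.tests_passed == self.tests_total


def score(results: list[InstanceResult], total: int = TOTAL_INSTANCES) -> float:
    """Percentage of benchmark issues resolved; issues with no result count as failures."""
    resolved = sum(1 for r in results if r.resolved)
    return 100.0 * resolved / total


def resolved_count(score_percent: float, total: int = TOTAL_INSTANCES) -> int:
    """Convert a reported percentage back to an approximate number of resolved issues."""
    return round(score_percent * total / 100)
```

Under these assumptions, `resolved_count(87.6)` gives 438 issues, and a 0.2-point difference between two scores corresponds to a single issue out of 500.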
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 87.6% | CoT | 16 Apr 2026 | Self-reported | Primary |
| 2 | Claude Opus 4.5 | 80.9% | — | 24 Nov 2025 | Self-reported | Primary |
| 3 | Claude Opus 4.6 | 80.8% | — | 05 Feb 2026 | Self-reported | Primary |
| 4 | Gemini 3.1 Pro | 80.6% | CoT | 19 Feb 2026 | Self-reported | Primary |
| 5 | Qwen 3.7 Max | 80.4% | 0-shot · CoT · agentic | 20 May 2026 | Self-reported | |
| 6 | MiniMax M2.5 | 80.2% | 0-shot · CoT · agentic, avg@4 | 12 Feb 2026 | Self-reported | Primary |
| 7 | Kimi K2.6 | 80.2% | CoT | 20 Apr 2026 | Self-reported | Primary |
| 8 | GPT 5.2 Thinking | 80.0% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 9 | Claude Sonnet 4.6 | 79.6% | — | 17 Feb 2026 | Self-reported | Primary |
| 10 | Gemini 3 Flash | 78.0% | CoT | 17 Dec 2025 | Self-reported | Primary |
| 11 | Gemini 3 Flash (Thinking) | 78.0% | — | 17 Dec 2025 | Self-reported | Primary |
| 12 | GLM 5 | 77.8% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 13 | Mistral Medium 3.5 | 77.6% | — | 27 Apr 2026 | Self-reported | Primary |
| 14 | Muse Spark | 77.4% | CoT | 08 Apr 2026 | Self-reported | Primary |
| 15 | Claude Sonnet 4.5 | 77.2% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 16 | Kimi K2.5 | 76.8% | CoT | 27 Jan 2026 | Self-reported | Primary |
| 17 | Gemini 3 Pro | 76.2% | CoT | 18 Nov 2025 | Self-reported | Primary |
| 18 | GPT 5.1 | 74.9% | 0-shot · CoT | 13 Nov 2025 | Self-reported | Primary |
| 19 | GPT 5 (Thinking) | 74.9% | — | 07 Aug 2025 | Self-reported | Primary |
| 20 | Opus 4.1 Thinking | 74.5% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 21 | Claude Haiku 4.5 | 73.3% | 0-shot · CoT | 15 Oct 2025 | Self-reported | Primary |
| 22 | Claude Haiku 4.5 | 73.3% | — | 15 Oct 2025 | Self-reported | Primary |
| 23 | DeepSeek 3.2 | 73.1% | — | 01 Dec 2025 | Paper | Primary Verified |
| 24 | Claude Sonnet 4 | 72.7% | — | 22 May 2025 | Self-reported | Primary |
| 25 | Qwen 3.5 27B | 72.4% | — | 24 Feb 2026 | Third-party | Primary Verified |
| 26 | Devstral 2 | 72.2% | 0-shot | 09 Dec 2025 | Self-reported | Primary |
| 27 | Qwen 3.5 122B A10B | 72.0% | — | 24 Feb 2026 | Third-party | Primary Verified |
| 28 | Grok Code Fast 1 | 70.8% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 29 | Grok Build 0.1 | 70.8% | 0-shot · CoT · agentic | 14 May 2026 | Self-reported | |
| 30 | Qwen 3.5 35B A3B | 69.2% | — | 24 Feb 2026 | Third-party | Primary Verified |
| 31 | o3 | 69.1% | — | 16 Apr 2025 | Self-reported | Primary |
| 32 | o4 mini | 68.1% | — | 16 Apr 2025 | Self-reported | Primary |
| 33 | GLM 4.6 | 68.0% | CoT | 30 Sep 2025 | Self-reported | Primary |
| 34 | DeepSeek V3.2 Exp | 67.8% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 35 | Qwen3 Coder | 67.0% | — | 22 Jul 2025 | Self-reported | Primary |
| 36 | Kimi K2 Instruct | 65.8% | — | 20 Jul 2025 | Paper | Primary |
| 37 | Gemini 2.5 Pro | 63.8% | CoT | 25 Mar 2025 | Self-reported | Primary |
| 38 | Claude Sonnet 3.7 (Thinking) | 62.3% | — | 24 Feb 2025 | Self-reported | Primary |
| 39 | Claude Sonnet 3.7 | 62.3% | — | 24 Feb 2025 | Self-reported | Primary |
| 40 | Gemini 2.5 Flash (Thinking) | 60.4% | — | 17 Dec 2025 | Self-reported | Primary |
| 41 | Gemini 2.5 Pro (Thinking) | 59.6% | — | 17 Dec 2025 | Self-reported | Primary |
| 42 | GPT 5.4 | 57.7% | — | 05 Mar 2026 | Self-reported | Primary |
| 43 | GPT 5.3 Codex | 56.8% | — | 05 Mar 2026 | Self-reported | Primary |
| 44 | GPT 4.1 | 55.0% | — | 14 Apr 2025 | Self-reported | Primary |
| 45 | GPT 5 | 52.8% | — | 07 Aug 2025 | Self-reported | Primary |
| 46 | DeepSeek-R1 | 49.2% | CoT | 21 Jan 2025 | Paper | Primary |
| 47 | o1 | 48.9% | — | 16 Apr 2025 | Self-reported | Primary |
| 48 | Nova Premier | 42.4% | — | 30 Apr 2025 | Self-reported | Primary |
| 49 | DeepSeek V3 | 42.0% | — | 26 Dec 2024 | Paper | Primary |
| 50 | Claude Haiku 3.5 | 40.6% | — | 22 Oct 2024 | Self-reported | Primary |
| 51 | Claude Sonnet 3.5 | 33.4% | 0-shot · agentic | 22 Oct 2024 | Self-reported | |
| 52 | Gemini 2.5 Flash-Lite | 31.6% | — | 26 Sep 2025 | Self-reported | Primary |
| 53 | Claude Haiku 3 | 7.2% | 0-shot · agentic | 22 Oct 2024 | Self-reported | |
