SWE-bench Multimodal
A variant of SWE-bench in which issues include screenshots, diagrams, and other visual context. It tests multimodal software-engineering ability.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-V3.2 | 70.2% | — | Dec 1, 2025 | paper | primary, verified |
