Aider Polyglot Coding Benchmark

225 hard Exercism programming exercises across six languages (C++, Go, Java, JavaScript, Python, Rust). The benchmark measures whole-file edit accuracy under a realistic agentic-coding harness.

Coding · Text · Accuracy · Max 100.0% · Released Dec 2024

Results: 12
Models scored: 12
Top: Claude Opus 4.5 (89.4%)
Median: 71.5%
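The headline figures can be recomputed from the twelve primary scores listed under "All results" on this page. The sketch below is illustrative only: the dictionary name and layout are assumptions, not part of the leaderboard's own tooling. With an even number of entries, the median is the midpoint of the two middle scores (68.9 and 74.0), which is 71.45 and is shown on the page as 71.5%.

```python
from statistics import median

# Scores (in %) copied from the "All results" table on this page.
# The variable name and structure are illustrative assumptions.
scores = {
    "Claude Opus 4.5": 89.4,
    "GPT 5.1": 88.0,
    "GPT 5 (Thinking)": 88.0,
    "o3 (High)": 81.3,
    "Claude Sonnet 4.5": 78.8,
    "Gemini 2.5 Pro": 74.0,
    "o4 mini (high)": 68.9,
    "o1 (High)": 64.4,
    "Qwen3 235B A22B": 61.8,
    "GPT 4.1": 52.0,
    "Gemini 2.5 Flash-Lite": 26.7,
    "GPT 5": 26.7,
}

# Highest-scoring model and its score.
top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# Median of an even-sized list is the mean of the two middle values.
median_score = median(scores.values())

print(f"Models scored: {len(scores)}")
print(f"Top: {top_model} ({top_score}%)")
print(f"Median: {median_score:.2f}%")
```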

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. Y-axis: score, 0 to 100. X-axis: evaluation date, Apr 2024 to Nov 2025.]
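The running best traced by the line can be sketched as a cumulative maximum over the results sorted by evaluation date. This is a minimal illustration, not the site's plotting code: the function name `running_best` and the tuple layout are assumptions, and the scores and dates are copied as listed in the results table on this page.

```python
from datetime import date

# (model, score in %, eval date) as listed in the "All results" table.
results = [
    ("Claude Opus 4.5", 89.4, date(2025, 11, 24)),
    ("GPT 5.1", 88.0, date(2025, 11, 13)),
    ("GPT 5 (Thinking)", 88.0, date(2025, 8, 7)),
    ("o3 (High)", 81.3, date(2024, 4, 16)),
    ("Claude Sonnet 4.5", 78.8, date(2025, 11, 24)),
    ("Gemini 2.5 Pro", 74.0, date(2025, 6, 17)),
    ("o4 mini (high)", 68.9, date(2025, 4, 16)),
    ("o1 (High)", 64.4, date(2025, 4, 16)),
    ("Qwen3 235B A22B", 61.8, date(2025, 4, 28)),
    ("GPT 4.1", 52.0, date(2025, 4, 14)),
    ("Gemini 2.5 Flash-Lite", 26.7, date(2025, 9, 26)),
    ("GPT 5", 26.7, date(2025, 8, 7)),
]


def running_best(rows):
    """Return (date, best score so far) for each date the best score improves."""
    frontier = []
    best = float("-inf")
    # Sort by date; within a date, visit the highest score first so that
    # each date contributes at most one step to the frontier.
    for _, score, day in sorted(rows, key=lambda r: (r[2], -r[1])):
        if score > best:
            best = score
            frontier.append((day, best))
    return frontier


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), best)
```

Only results that strictly beat the previous best add a step, so a later result that ties an earlier score (GPT 5.1 at 88.0%) leaves the line flat.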

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.5 | 89.4% | | 24 Nov 2025 | Self-reported | Primary |
| 2 | GPT 5.1 | 88.0% | 0-shot · CoT | 13 Nov 2025 | Self-reported | Primary |
| 3 | GPT 5 (Thinking) | 88.0% | | 07 Aug 2025 | Self-reported | Primary |
| 4 | o3 (High) | 81.3% | | 16 Apr 2024 | Self-reported | Primary |
| 5 | Claude Sonnet 4.5 | 78.8% | | 24 Nov 2025 | Self-reported | Primary |
| 6 | Gemini 2.5 Pro | 74.0% | | 17 Jun 2025 | Third-party | Primary, Verified |
| 7 | o4 mini (high) | 68.9% | | 16 Apr 2025 | Self-reported | Primary |
| 8 | o1 (High) | 64.4% | | 16 Apr 2025 | Self-reported | Primary |
| 9 | Qwen3 235B A22B | 61.8% | Pass@2 | 28 Apr 2025 | Self-reported | Primary |
| 10 | GPT 4.1 | 52.0% | | 14 Apr 2025 | Self-reported | Primary |
| 11 | Gemini 2.5 Flash-Lite | 26.7% | | 26 Sep 2025 | Self-reported | Primary |
| 12 | GPT 5 | 26.7% | | 07 Aug 2025 | Self-reported | Primary |