Aider Polyglot Coding Benchmark

A set of 225 hard Exercism programming exercises across six languages (C++, Go, Java, JavaScript, Python, and Rust). The benchmark measures whole-file edit accuracy under a realistic agentic-coding harness.
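As a rough illustration of how a percentage accuracy like those in the results table can be derived from per-exercise outcomes, here is a minimal Python sketch. It assumes each exercise is scored pass/fail by its unit tests; the `ExerciseResult` record, the `accuracy` helpers, and the optional second attempt are hypothetical and do not reproduce the benchmark's actual harness. The second-attempt option only loosely mirrors the Pass@2 condition listed for one row in the table.

```python
from dataclasses import dataclass


@dataclass
class ExerciseResult:
    """Hypothetical per-exercise outcome record (illustrative, not the harness's format)."""
    language: str
    passed_first_try: bool
    passed_second_try: bool  # assumed retry after the model sees failing test output


def accuracy(results: list[ExerciseResult], attempts: int = 2) -> float:
    """Percentage of exercises whose unit tests pass within `attempts` tries (1 or 2)."""
    if attempts not in (1, 2):
        raise ValueError("attempts must be 1 or 2")
    if not results:
        return 0.0
    # Each exercise counts once: solved on the first try, or (if allowed) on the second.
    solved = sum(
        r.passed_first_try or (attempts == 2 and r.passed_second_try)
        for r in results
    )
    return 100.0 * solved / len(results)


def accuracy_by_language(results: list[ExerciseResult], attempts: int = 2) -> dict[str, float]:
    """Group results by language and score each group separately."""
    groups: dict[str, list[ExerciseResult]] = {}
    for r in results:
        groups.setdefault(r.language, []).append(r)
    return {lang: accuracy(rs, attempts) for lang, rs in groups.items()}


# Toy data for four hypothetical exercises.
sample = [
    ExerciseResult("python", True, False),
    ExerciseResult("rust", False, True),
    ExerciseResult("go", False, False),
    ExerciseResult("java", True, False),
]
print(accuracy(sample, attempts=1))  # 50.0
print(accuracy(sample, attempts=2))  # 75.0
```

Reporting per-language accuracy alongside the overall figure makes it easier to see whether a model's score is carried by one or two languages.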

Coding · Text · Metric: accuracy (max 100.0%) · Released Dec 2024
12 results · 12 models scored · Top: 89.4% (Claude Opus 4.5) · Median: 71.5%
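The summary figures can be checked against the results table. With 12 scores, the median is the mean of the 6th and 7th values in sorted order, (68.9 + 74.0) / 2 = 71.45, which displays as 71.5% if the page rounds half up to one decimal place (an assumption about the page's rounding). The sketch below uses `Decimal` so the halfway case is rounded exactly rather than through binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

# Scores (%) copied from the "All results" table, as exact decimals.
scores = {
    "Claude Opus 4.5": Decimal("89.4"),
    "GPT 5.1": Decimal("88.0"),
    "GPT 5 (Thinking)": Decimal("88.0"),
    "o3 (High)": Decimal("81.3"),
    "Claude Sonnet 4.5": Decimal("78.8"),
    "Gemini 2.5 Pro": Decimal("74.0"),
    "o4 mini (high)": Decimal("68.9"),
    "o1 (High)": Decimal("64.4"),
    "Qwen3 235B A22B": Decimal("61.8"),
    "GPT 4.1": Decimal("52.0"),
    "Gemini 2.5 Flash-Lite": Decimal("26.7"),
    "GPT 5": Decimal("26.7"),
}

top_model = max(scores, key=scores.get)
mid = median(scores.values())  # even count: mean of the two middle values
displayed = mid.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(len(scores), top_model, scores[top_model])  # 12 Claude Opus 4.5 89.4
print(mid, displayed)  # 71.45 71.5
```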

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. Y-axis: score (0 to 100%); x-axis: evaluation date.]
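The "running best score" line is a running maximum over results sorted by evaluation date: a result joins the frontier only if it beats every earlier score. A minimal sketch follows, using a subset of rows from the results table; treating a tie with the current best as not advancing the frontier is an assumption about how the chart is drawn.

```python
from datetime import date

# A subset of rows from the "All results" table: (model, eval date, score %).
results = [
    ("GPT 4.1", date(2025, 4, 14), 52.0),
    ("Gemini 2.5 Pro", date(2025, 6, 17), 74.0),
    ("GPT 5 (Thinking)", date(2025, 8, 7), 88.0),
    ("GPT 5", date(2025, 8, 7), 26.7),
    ("GPT 5.1", date(2025, 11, 13), 88.0),
    ("Claude Sonnet 4.5", date(2025, 11, 24), 78.8),
    ("Claude Opus 4.5", date(2025, 11, 24), 89.4),
]


def frontier(rows):
    """Return the rows that set a new running-best score, in date order."""
    best = float("-inf")
    records = []
    for model, when, score in sorted(rows, key=lambda r: r[1]):
        if score > best:  # strict: a tie with the current best is not a new record
            best = score
            records.append((model, when, score))
    return records


for model, when, score in frontier(results):
    print(f"{when.isoformat()}  {score:5.1f}%  {model}")
```

On this subset the frontier steps through GPT 4.1, Gemini 2.5 Pro, GPT 5 (Thinking), and Claude Opus 4.5; GPT 5.1 ties the 88.0% record and so does not appear.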

All results

Showing one canonical row per model.
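The page does not say how the canonical row is chosen when a model has several configurations. The sketch below shows one plausible rule, stated purely as an assumption: prefer a row flagged primary, then the higher score. The `Result` type and the `model-a`/`model-b` rows are hypothetical placeholders, not leaderboard data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    config: str
    score: float
    primary: bool


def canonical_rows(results):
    """Keep one row per model, preferring primary rows, then higher scores.

    This selection rule is an assumption, not the leaderboard's documented logic.
    """
    best = {}
    for r in results:
        current = best.get(r.model)
        # Tuples compare element by element: primary flag first, then score.
        if current is None or (r.primary, r.score) > (current.primary, current.score):
            best[r.model] = r
    # Rank the surviving rows by score, highest first.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


rows = [
    Result("model-a", "default", 70.0, True),
    Result("model-a", "high effort", 75.0, False),
    Result("model-b", "default", 60.0, True),
]
for rank, r in enumerate(canonical_rows(rows), start=1):
    print(rank, r.model, r.config, r.score)
```

Under this rule a non-primary configuration never displaces a primary one, even with a higher score; a "show all configurations" view would simply skip the deduplication step.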
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.5 | 89.4% | | Nov 24, 2025 | self-reported | primary |
| 2 | GPT 5.1 | 88.0% | 0-shot · CoT | Nov 13, 2025 | self-reported | primary |
| 3 | GPT 5 (Thinking) | 88.0% | | Aug 7, 2025 | self-reported | primary |
| 4 | o3 (High) | 81.3% | | Apr 16, 2025 | self-reported | primary |
| 5 | Claude Sonnet 4.5 | 78.8% | | Nov 24, 2025 | self-reported | primary |
| 6 | Gemini 2.5 Pro | 74.0% | | Jun 17, 2025 | third-party | primary, verified |
| 7 | o4 mini (high) | 68.9% | | Apr 16, 2025 | self-reported | primary |
| 8 | o1 (High) | 64.4% | | Apr 16, 2025 | self-reported | primary |
| 9 | Qwen3 235B A22B | 61.8% | Pass@2 | Apr 28, 2025 | self-reported | primary |
| 10 | GPT 4.1 | 52.0% | | Apr 14, 2025 | self-reported | primary |
| 11 | Gemini 2.5 Flash-Lite | 26.7% | | Sep 26, 2025 | self-reported | primary |
| 12 | GPT 5 | 26.7% | | Aug 7, 2025 | self-reported | primary |