Aider Polyglot

Aider Polyglot Coding Benchmark

225 hard Exercism programming exercises across 6 languages (C++, Go, Java, JavaScript, Python, Rust). Measures whole-file edit accuracy under a realistic agentic-coding harness.

Coding · Text · Accuracy · Max 100.0% · Released Dec 2024


Results

Models scored: 12
Top: 89.4% (Claude Opus 4.5)
Median: 71.5%
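As a quick check, the headline figures can be recomputed from the twelve primary scores listed in the results table on this page. The following is a minimal sketch in Python; the variable names are illustrative and not part of the leaderboard itself.

```python
from statistics import median

# Primary scores (%) copied from the results table, one entry per model.
scores = {
    "Claude Opus 4.5": 89.4,
    "GPT 5.1": 88.0,
    "GPT 5 (Thinking)": 88.0,
    "o3 (High)": 81.3,
    "Claude Sonnet 4.5": 78.8,
    "Gemini 2.5 Pro": 74.0,
    "o4 mini (high)": 68.9,
    "o1 (High)": 64.4,
    "Qwen3 235B A22B": 61.8,
    "GPT 4.1": 52.0,
    "Gemini 2.5 Flash-Lite": 26.7,
    "GPT 5": 26.7,
}

# Highest-scoring model and its score.
top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# With an even number of entries, the median is the mean of the two middle values.
median_score = median(scores.values())

print(f"Models scored: {len(scores)}")
print(f"Top: {top_model} ({top_score}%)")
print(f"Median: {median_score:.2f}%")
```

With twelve entries the two middle scores are 74.0 and 68.9, so the median is their mean, 71.45, which the page displays to one decimal place as 71.5%.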

Best results

Top primary scores; one row per model.

[Bar chart: the ten highest primary scores, ranging from 89.4% down to 52.0%.]

Frontier over time

Each dot is one model result; the line traces the running best score.
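The running best score described above is a cumulative maximum over results ordered by evaluation date. The sketch below illustrates the idea on a subset of rows taken from the results table; it is an assumption about how such a line could be computed, not the site's actual plotting code.

```python
from datetime import date
from itertools import accumulate

# A subset of rows from the results table: (eval date, model, score %).
results = [
    (date(2025, 11, 24), "Claude Opus 4.5", 89.4),
    (date(2025, 8, 7), "GPT 5 (Thinking)", 88.0),
    (date(2025, 6, 17), "Gemini 2.5 Pro", 74.0),
    (date(2025, 4, 16), "o4 mini (high)", 68.9),
    (date(2025, 4, 28), "Qwen3 235B A22B", 61.8),
    (date(2025, 4, 14), "GPT 4.1", 52.0),
    (date(2025, 9, 26), "Gemini 2.5 Flash-Lite", 26.7),
]

# Order the dots chronologically, then carry the maximum score forward.
results.sort(key=lambda row: row[0])
frontier = list(accumulate((score for _, _, score in results), max))

for (day, model, score), best in zip(results, frontier):
    print(f"{day.isoformat()}  {model:<22} {score:5.1f}  best so far: {best:5.1f}")
```

A result that scores below the current best (for example, Qwen3 235B A22B at 61.8% after o4 mini at 68.9%) still appears as a dot but leaves the frontier line flat.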

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Claude Opus 4.5	89.4%	—	24 Nov 2025	Self-reported	Primary
2	GPT 5.1	88.0%	0-shot · CoT	13 Nov 2025	Self-reported	Primary
3	GPT 5 (Thinking)	88.0%	—	07 Aug 2025	Self-reported	Primary
4	o3 (High)	81.3%	—	16 Apr 2025	Self-reported	Primary
5	Claude Sonnet 4.5	78.8%	—	24 Nov 2025	Self-reported	Primary
6	Gemini 2.5 Pro	74.0%	—	17 Jun 2025	Third-party	Primary Verified
7	o4 mini (high)	68.9%	—	16 Apr 2025	Self-reported	Primary
8	o1 (High)	64.4%	—	16 Apr 2025	Self-reported	Primary
9	Qwen3 235B A22B	61.8%	Pass@2	28 Apr 2025	Self-reported	Primary
10	GPT 4.1	52.0%	—	14 Apr 2025	Self-reported	Primary
11	Gemini 2.5 Flash-Lite	26.7%	—	26 Sep 2025	Self-reported	Primary
12	GPT 5	26.7%	—	07 Aug 2025	Self-reported	Primary
