HumanEval+ (EvalPlus)

HumanEval+ extends HumanEval with a substantially expanded test suite (roughly 80x more test cases) to catch solutions that pass the original tests but are actually wrong.
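To illustrate why extra test cases matter, here is a minimal sketch. The task, the function `largest_buggy`, and both test lists are hypothetical and are not taken from HumanEval or HumanEval+; they only show how a wrong solution can pass a sparse test set and fail a denser one.

```python
# Hypothetical task: return the largest element of a non-empty list.
def largest_buggy(xs):
    best = 0  # Bug: starting from 0 is wrong when every element is negative.
    for x in xs:
        if x > best:
            best = x
    return best


# A sparse, hypothetical "base" test set that misses the edge case.
base_tests = [([1, 2, 3], 3), ([5, 1], 5)]

# A hypothetical expanded test set that adds edge cases.
extra_tests = base_tests + [([-3, -1, -2], -1), ([0], 0)]


def passes(fn, tests):
    """Return True only if fn gives the expected output on every test."""
    return all(fn(list(args)) == expected for args, expected in tests)


if __name__ == "__main__":
    print("base tests pass:", passes(largest_buggy, base_tests))
    print("expanded tests pass:", passes(largest_buggy, extra_tests))
```

The buggy function is counted as correct under the base tests and as incorrect under the expanded ones, which is the kind of wrong-but-passing case the larger suite is meant to expose.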

Tags: Coding, Text. Metric: Pass@k. Max score: 100.0%. Released: May 2023. Status: Saturated, possibly contaminated.
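Pass@k is commonly computed with the unbiased estimator introduced alongside the original HumanEval benchmark: generate n samples per problem, count the c samples that pass all tests, and estimate the chance that at least one of k randomly drawn samples is correct. The sketch below assumes that estimator; this page does not state which k or which sampling setup the listed scores use, and the function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed all tests,
    k: number of samples drawn. Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # With 3 passing samples out of 10, pass@1 equals the pass rate c / n.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score is then the mean of this per-problem value over all problems.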
Results: 2
Models scored: 2
Top score: 92.3% (Phi 4 reasoning plus)
Median score: 78.5%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y-axis: score, 0 to 100. X-axis: May 2025 to Jul 2025.]

All results

Showing one canonical row per model.

#  Model                 Score  Conditions  Eval date    Source         Flags
1  Phi 4 reasoning plus  92.3%  -           08 Jul 2025  Self-reported  Primary
2  WizardCoder           64.6%  -           27 May 2025  Paper          Primary