GeneBench-Pro

GeneBench-Pro: Evaluating Multistage Statistical Reasoning in Genomics, Quantitative Biology, and Translational Biomedicine

GeneBench-Pro is a 129-problem benchmark that tests whether AI agents can perform realistic, multi-stage scientific analyses in genomics, quantitative biology, and translational biomedicine. Each problem provides a messy, synthetically generated dataset with known ground truth and requires the agent to navigate ambiguous judgment calls, choose the correct analysis path, and arrive at a graded numerical answer. As of release, the best model (GPT-5.6 Sol) scores 28.7% (31.5% in Pro mode).

Agentic | Text | Accuracy (max 100.0%) | Released Jun 2026
Results: 1
Models scored: 1
Top score: 28.7%, GPT 5.6 Sol (max)
Median: 28.7%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
Not enough data to plot a trend yet.
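
The running-best line described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual plotting code: the `running_best` helper and the tuple layout are assumptions made for this example, and the only data row is the single result listed on this page.

```python
from datetime import date
from itertools import accumulate

# (eval date, model, score in percent).
# The single row below is the one result reported on this page.
results = [
    (date(2026, 6, 30), "GPT 5.6 Sol (max)", 28.7),
]


def running_best(rows):
    """Return (eval_date, best_score_so_far) pairs in chronological order.

    Hypothetical helper: sorts results by evaluation date, then takes a
    cumulative maximum over the scores, which is what a "frontier" line traces.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    scores = [score for _, _, score in ordered]
    best_so_far = accumulate(scores, max)  # cumulative maximum
    return [(row[0], best) for row, best in zip(ordered, best_so_far)]


frontier = running_best(results)
print(frontier)
```

With a single result the frontier is just that one point, which is consistent with the page reporting that there is not enough data to plot a trend. As more results arrive, the line would only move upward when a new result beats the best score so far.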

All results

Showing one canonical row per model.

#  Model              Score  Conditions  Eval date    Source         Flags
1  GPT 5.6 Sol (max)  28.7%  agentic     30 Jun 2026  Self-reported  Primary