GeneBench-Pro

GeneBench-Pro: Evaluating Multistage Statistical Reasoning in Genomics, Quantitative Biology, and Translational Biomedicine

A 129-problem benchmark testing whether AI agents can perform realistic, multi-stage scientific analyses in genomics, quantitative biology, and translational biomedicine. Each problem provides a messy, synthetically generated dataset with known ground truth and requires the agent to navigate ambiguous judgment calls, choose the correct analysis path, and arrive at a graded numerical answer. As of release, the best model, GPT-5.6 Sol, scores 28.7% (31.5% in Pro mode).

Agentic · Text · Accuracy · Max 100.0% · Released Jun 2026
Results: 11
Models scored: 11
Top: GPT-5.6 Sol (max), 28.7%
Median: 8.90%

Frontier over time

Chart placeholder: each dot is one model result, and the line traces the running best score. All data points share one date (30 Jun 2026), so there is no trend to plot.

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT-5.6 Sol (Pro) | 31.5% | agentic | 30 Jun 2026 | Self-reported | |
| 2 | GPT-5.6 Sol (max) | 28.7% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 3 | GPT-5.6 Terra (Pro) | 28.5% | agentic | 30 Jun 2026 | Self-reported | |
| 4 | GPT-5.6 Luna (Pro) | 23.6% | agentic | 30 Jun 2026 | Self-reported | |
| 5 | GPT-5.6 Terra (max) | 23.3% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 6 | GPT-5.5 (Pro) | 20.5% | agentic | 30 Jun 2026 | Self-reported | |
| 7 | GPT-5.6 Luna (max) | 16.5% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 8 | GPT-5.4 (Pro) | 16.3% | agentic | 30 Jun 2026 | Self-reported | |
| 9 | Claude Opus 4.8 | 16.0% | | 30 Jun 2026 | Self-reported | Primary |
| 10 | GPT-5.5 (xhigh) | 12.0% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 11 | GPT-5.4 (xhigh) | 8.90% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 12 | GPT-5.2 (Pro) | 8.50% | agentic | 30 Jun 2026 | Self-reported | |
| 13 | Gemini 3.5 Flash | 8.10% | | 30 Jun 2026 | Self-reported | Primary |
| 14 | GPT-5.2 (xhigh) | 4.90% | agentic | 30 Jun 2026 | Self-reported | Primary |
| 15 | GLM 5.2 | 4.60% | | 30 Jun 2026 | Self-reported | Primary |
| 16 | Gemini 3.1 Pro | 3.10% | | 30 Jun 2026 | Self-reported | Primary |
| 17 | DeepSeek V4 Pro | 2.40% | | 30 Jun 2026 | Self-reported | Primary |
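
The page's summary figures (11 results, top score 28.7%, median 8.90%) appear to be computed over the rows flagged Primary rather than over all 17 listed configurations. The Python sketch below checks that reading against the scores in the table. The `RESULTS` list and the `summarize` helper are illustrative names introduced here, not part of the benchmark or the site, and the primary-only interpretation is an assumption.

```python
from statistics import median

# (model, score in percent, flagged Primary?) copied from the results table.
RESULTS = [
    ("GPT-5.6 Sol (Pro)", 31.5, False),
    ("GPT-5.6 Sol (max)", 28.7, True),
    ("GPT-5.6 Terra (Pro)", 28.5, False),
    ("GPT-5.6 Luna (Pro)", 23.6, False),
    ("GPT-5.6 Terra (max)", 23.3, True),
    ("GPT-5.5 (Pro)", 20.5, False),
    ("GPT-5.6 Luna (max)", 16.5, True),
    ("GPT-5.4 (Pro)", 16.3, False),
    ("Claude Opus 4.8", 16.0, True),
    ("GPT-5.5 (xhigh)", 12.0, True),
    ("GPT-5.4 (xhigh)", 8.9, True),
    ("GPT-5.2 (Pro)", 8.5, False),
    ("Gemini 3.5 Flash", 8.1, True),
    ("GPT-5.2 (xhigh)", 4.9, True),
    ("GLM 5.2", 4.6, True),
    ("Gemini 3.1 Pro", 3.1, True),
    ("DeepSeek V4 Pro", 2.4, True),
]


def summarize(results):
    """Summarize the leaderboard, assuming headline stats use Primary rows only."""
    primary = [(name, score) for name, score, is_primary in results if is_primary]
    top_name, top_score = max(primary, key=lambda row: row[1])
    return {
        "n_total": len(results),      # every listed configuration
        "n_primary": len(primary),    # one primary row per model
        "top_model": top_name,
        "top_score": top_score,
        "median": median([score for _, score in primary]),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

Under this assumption the sketch finds 11 primary rows out of 17 configurations, which matches the "11 results" and "11 models scored" counters. The Pro-mode rows, including the 31.5% entry ranked first in the table, are treated as non-primary alternates and excluded from the top and median figures.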