RULER 128k
Synthetic long-context evaluation suite measuring needle-in-a-haystack retrieval, multi-key retrieval, and tracing across 128k-token contexts.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Nemotron 3 Super | 88.3% | 0-shot | 03 Apr 2026 | Self-reported | Primary |
