MathVista

MathVista (testmini)

Mathematical reasoning over visual contexts: figures, charts, diagrams, geometric drawings.

Category: Multimodal · Metric: Accuracy · Max: 100.0% · Released: Oct 2023
Results: 9
Models scored: 9
Top: 87.4% (Qwen 3.6 27B)
Median: 72.0%

Best results

Top primary scores; one row per model.
2 o3 86.8%
3 o4 mini 84.3%
5 GPT 4.1 72.0%
6 o1 71.8%
8 GPT-4o 63.8%

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: Best score over time. Y-axis: score, 0 to 100. X-axis: Oct 2024 to Apr 2025.]
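The "running best" line described above can be sketched as a cumulative maximum over results ordered by evaluation date. The snippet below is only an illustration, not the site's actual code: it assumes the chart uses the dated Primary rows from the results table (the Qwen 3.6 27B row has no evaluation date and is left out), and the function name `running_best` is hypothetical.

```python
from datetime import date

# (model, score in %, eval date), copied from the dated Primary rows of the
# results table. Assumption: the chart is built from these rows only.
results = [
    ("o3", 86.8, date(2025, 4, 16)),
    ("o4 mini", 84.3, date(2025, 4, 16)),
    ("Llama 4 Maverick", 73.7, date(2025, 4, 5)),
    ("GPT 4.1", 72.0, date(2025, 4, 14)),
    ("o1", 71.8, date(2025, 4, 16)),
    ("Llama 4 Scout", 70.7, date(2025, 4, 5)),
    ("GPT-4o", 63.8, date(2025, 4, 16)),
    ("Pixtral 12B", 58.3, date(2024, 10, 10)),
]


def running_best(rows):
    """Return [(date, best score seen up to and including that date)]."""
    # Collapse same-day results to the best score reported on that day.
    best_by_date = {}
    for _, score, day in rows:
        best_by_date[day] = max(score, best_by_date.get(day, score))

    # Walk the dates in order, carrying the maximum forward.
    frontier = []
    best = float("-inf")
    for day in sorted(best_by_date):
        best = max(best, best_by_date[day])
        frontier.append((day, best))
    return frontier


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), best)
```

Under these assumptions the frontier only moves when a new result beats every earlier one, so a date such as 14 Apr 2025 (GPT 4.1 at 72.0%) leaves the line flat at the earlier 73.7%.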

All results

Showing all configurations, including non-primary alternates.
# | Model | Score | Conditions | Eval date | Source | Flags
1 | Qwen 3.6 27B | 87.4% | 0-shot · standard | - | Self-reported | Primary
2 | o3 | 86.8% | - | 16 Apr 2025 | Self-reported | Primary
3 | o4 mini | 84.3% | - | 16 Apr 2025 | Self-reported | Primary
4 | Llama 4 Maverick | 73.7% | - | 05 Apr 2025 | Self-reported | Primary
5 | GPT 4.1 | 72.0% | - | 14 Apr 2025 | Self-reported | Primary
6 | o1 | 71.8% | - | 16 Apr 2025 | Self-reported | Primary
7 | Llama 4 Scout | 70.7% | - | 05 Apr 2025 | Self-reported | Primary
8 | Claude Sonnet 3.5 | 67.7% | 0-shot · standard | 20 Jun 2024 | Self-reported | -
9 | Gemini 1.5 Pro | 63.9% | 0-shot · standard | 01 May 2024 | Self-reported | -
10 | GPT-4o | 63.8% | - | 16 Apr 2025 | Self-reported | Primary
11 | Pixtral 12B | 58.3% | CoT | 10 Oct 2024 | Self-reported | Primary
12 | Gemini Ultra | 53.0% | 0-shot · standard | 06 Dec 2023 | Self-reported | -
13 | Claude Haiku 3 | 46.4% | 0-shot · standard | 04 Mar 2024 | Self-reported | -
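As a quick consistency check, the summary figures (9 models scored, top 87.4%, median 72.0%) can be reproduced from the rows flagged Primary. This is a minimal sketch under the assumption that the summary is computed over Primary rows only; the page does not state this, and the variable names are illustrative.

```python
from statistics import median

# Scores (in %) for the rows flagged Primary in the results table.
# Assumption: the page's summary statistics cover exactly these rows.
primary = {
    "Qwen 3.6 27B": 87.4,
    "o3": 86.8,
    "o4 mini": 84.3,
    "Llama 4 Maverick": 73.7,
    "GPT 4.1": 72.0,
    "o1": 71.8,
    "Llama 4 Scout": 70.7,
    "GPT-4o": 63.8,
    "Pixtral 12B": 58.3,
}

# Model with the highest primary score.
top_model = max(primary, key=primary.get)

summary = {
    "models": len(primary),
    "top": (top_model, primary[top_model]),
    "median": median(primary.values()),
}
print(summary)
```

With nine values the median is simply the fifth-highest score, which is why it coincides with a single model's result (GPT 4.1) rather than an average of two.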