MathVista

MathVista (testmini)

Mathematical reasoning over visual contexts: figures, charts, diagrams, geometric drawings.

Multimodal · Accuracy · Max 100.0% · Released Oct 2023

Results

Models scored

Top: 87.4% (Qwen 3.6 27B)

Median: 72.0%

Best results

Top primary scores, one row per model: 87.4%, 86.8%, 84.3%, 73.7%, 72.0%, 71.8%, 70.7%, 63.8%, 58.3%.

Each dot is one model result; the line traces the running best score.
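The running-best line can be reproduced as a cumulative maximum over results ordered by evaluation date. The sketch below is illustrative only: the tuples are copied from the results table on this page, the row without an eval date (Qwen 3.6 27B) is left out, and including non-primary alternates is an assumption, since the page does not say which rows the chart plots. The names `DATED` and `running_best` are hypothetical.

```python
from datetime import date
from itertools import accumulate

# (eval date, model, score in percent), copied from the results table.
# Assumption: rows without an eval date cannot be placed on the timeline.
DATED = [
    (date(2023, 12, 6), "Gemini Ultra", 53.0),
    (date(2024, 3, 4), "Claude Haiku 3", 46.4),
    (date(2024, 5, 1), "Gemini 1.5 Pro", 63.9),
    (date(2024, 6, 20), "Claude Sonnet 3.5", 67.7),
    (date(2024, 10, 10), "Pixtral 12B", 58.3),
    (date(2025, 4, 5), "Llama 4 Maverick", 73.7),
    (date(2025, 4, 5), "Llama 4 Scout", 70.7),
    (date(2025, 4, 14), "GPT 4.1", 72.0),
    (date(2025, 4, 16), "o3", 86.8),
    (date(2025, 4, 16), "o4 mini", 84.3),
    (date(2025, 4, 16), "o1", 71.8),
    (date(2025, 4, 16), "GPT-4o", 63.8),
]


def running_best(results):
    """Return (date, best score so far) for each result in date order."""
    ordered = sorted(results, key=lambda r: r[0])  # stable sort by date
    scores = [score for _, _, score in ordered]
    bests = accumulate(scores, max)  # cumulative maximum
    return [(d, best) for (d, _, _), best in zip(ordered, bests)]


if __name__ == "__main__":
    for d, best in running_best(DATED):
        print(d.isoformat(), f"{best:.1f}%")
```

Each dot in the chart corresponds to one input tuple; the line corresponds to the second element of each returned pair.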

Showing all configurations including non-primary alternates.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Qwen 3.6 27B	87.4%	0-shot · standard	—	Self-reported	Primary
2	o3	86.8%	—	16 Apr 2025	Self-reported	Primary
3	o4 mini	84.3%	—	16 Apr 2025	Self-reported	Primary
4	Llama 4 Maverick	73.7%	—	05 Apr 2025	Self-reported	Primary
5	GPT 4.1	72.0%	—	14 Apr 2025	Self-reported	Primary
6	o1	71.8%	—	16 Apr 2025	Self-reported	Primary
7	Llama 4 Scout	70.7%	—	05 Apr 2025	Self-reported	Primary
8	Claude Sonnet 3.5	67.7%	0-shot · standard	20 Jun 2024	Self-reported
9	Gemini 1.5 Pro	63.9%	0-shot · standard	01 May 2024	Self-reported
10	GPT-4o	63.8%	—	16 Apr 2025	Self-reported	Primary
11	Pixtral 12B	58.3%	CoT	10 Oct 2024	Self-reported	Primary
12	Gemini Ultra	53.0%	0-shot · standard	06 Dec 2023	Self-reported
13	Claude Haiku 3	46.4%	0-shot · standard	04 Mar 2024	Self-reported
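The headline figures can be checked against the table with a short calculation. The sketch below assumes the summary counts only rows flagged Primary; that assumption is consistent with the reported 72.0% median, but the page does not state it. The names `ROWS` and `summarize` are hypothetical.

```python
from statistics import median

# (model, score in percent, flagged Primary), copied from the table above.
ROWS = [
    ("Qwen 3.6 27B", 87.4, True),
    ("o3", 86.8, True),
    ("o4 mini", 84.3, True),
    ("Llama 4 Maverick", 73.7, True),
    ("GPT 4.1", 72.0, True),
    ("o1", 71.8, True),
    ("Llama 4 Scout", 70.7, True),
    ("Claude Sonnet 3.5", 67.7, False),
    ("Gemini 1.5 Pro", 63.9, False),
    ("GPT-4o", 63.8, True),
    ("Pixtral 12B", 58.3, True),
    ("Gemini Ultra", 53.0, False),
    ("Claude Haiku 3", 46.4, False),
]


def summarize(rows, primary_only=True):
    """Count, top model, top score, and median over the selected rows."""
    kept = [(m, s) for m, s, primary in rows if primary or not primary_only]
    top_model, top_score = max(kept, key=lambda r: r[1])
    return {
        "count": len(kept),
        "top_model": top_model,
        "top_score": top_score,
        "median": median([s for _, s in kept]),
    }


if __name__ == "__main__":
    print(summarize(ROWS))                      # primary rows only
    print(summarize(ROWS, primary_only=False))  # all configurations
```

Restricting to Primary rows gives nine models with a median of 72.0%, matching the summary; including the four non-primary alternates shifts the median to 70.7%.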