MGSM

Multilingual Grade School Math

GSM8K translated into 10 typologically diverse languages. Tests cross-lingual mathematical reasoning.

Category: Language
Modality: Text
Metric: accuracy (max 100.0%)
Released: Oct 2022
Results: 5
Models scored: 5
Top score: 91.1% (Llama 3.3)
Median score: 90.5%
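The summary figures above can be reproduced from the five primary scores in the "All results" table. The short Python sketch below assumes the site takes the maximum and a plain median over one canonical score per model. The page does not state its method, and `summarize` is a hypothetical helper written for illustration.

```python
from statistics import median

# Primary scores (%) copied from the "All results" table on this page.
scores = {
    "Llama 3.3": 91.1,
    "Claude Opus 3": 90.7,
    "GPT-4o": 90.5,
    "Nemotron 3 Super": 87.5,
    "Llama 3.2": 58.2,
}

def summarize(results: dict[str, float]) -> dict:
    """Hypothetical helper: count, top model/score, and median of a results table."""
    top_model = max(results, key=results.get)  # model with the highest score
    return {
        "results": len(results),
        "top_model": top_model,
        "top_score": results[top_model],
        # Assumption: a plain median over one score per model.
        "median": median(results.values()),
    }

if __name__ == "__main__":
    print(summarize(scores))
```

With five scores, the median is the third value in sorted order (58.2, 87.5, 90.5, 90.7, 91.1). That value is 90.5, which matches the figure on the page.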

Best results

Top primary scores; one row per model.
[Chart: primary score by rank (rank 1: 91.1%, rank 3: 90.5%, rank 5: 58.2%).]

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. Y-axis: score, 0 to 100%; x-axis: evaluation date, Oct 2024 to Apr 2026.]
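The running-best line amounts to a running maximum over results sorted by date. The Python sketch below assumes the chart orders results by the eval date listed in the "All results" table. The page does not say which date it plots, and `running_best` is a hypothetical name chosen for illustration.

```python
from datetime import date

# (eval date, model, primary score %) rows copied from the "All results" table.
results = [
    (date(2024, 12, 6), "Llama 3.3", 91.1),
    (date(2024, 10, 22), "Claude Opus 3", 90.7),
    (date(2025, 4, 16), "GPT-4o", 90.5),
    (date(2026, 4, 3), "Nemotron 3 Super", 87.5),
    (date(2024, 10, 25), "Llama 3.2", 58.2),
]

def running_best(rows):
    """Hypothetical helper: sort rows by date, return (date, best score so far)."""
    frontier = []
    best = float("-inf")
    for when, _model, score in sorted(rows):  # tuples sort by date first
        best = max(best, score)  # the frontier never goes down
        frontier.append((when, best))
    return frontier

if __name__ == "__main__":
    for when, best in running_best(results):
        print(when.isoformat(), best)
```

Every result appears as a dot, but the line only rises when a new result beats the best score so far. For example, Llama 3.3 lifts it from 90.7% to 91.1% in Dec 2024, and the line stays at 91.1% afterwards.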

All results

Showing one canonical row per model.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Llama 3.3 | 91.1% | 0-shot | Dec 6, 2024 | Self-reported | primary |
| 2 | Claude Opus 3 | 90.7% | 0-shot | Oct 22, 2024 | Self-reported | primary |
| 3 | GPT-4o | 90.5% | | Apr 16, 2025 | Self-reported | primary |
| 4 | Nemotron 3 Super | 87.5% | 8-shot | Apr 3, 2026 | Self-reported | primary |
| 5 | Llama 3.2 | 58.2% | 0-shot · CoT | Oct 25, 2024 | Self-reported | primary |