TAAFT

BFCL v3

Berkeley Function-Calling Leaderboard v3

Evaluates function/tool-calling correctness across single, parallel, multi-turn and irrelevance-detection scenarios.
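As a rough illustration of what "correctness" could mean across these four scenario types, here is a minimal sketch. It is not the leaderboard's actual checker, which this page does not describe. The names `Call`, `calls_match`, and `turns_match` are hypothetical, and exact order-insensitive matching is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Call:
    """One function call: a name plus keyword arguments (hypothetical type)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def calls_match(predicted: list[Call], expected: list[Call]) -> bool:
    """Compare the calls of one turn, ignoring order.

    - single:      expected holds one call
    - parallel:    expected holds several calls, accepted in any order
    - irrelevance: expected is empty, so any predicted call is an error
    """
    remaining = list(expected)
    for call in predicted:
        if call in remaining:
            remaining.remove(call)  # consume one matching expected call
        else:
            return False  # extra or wrong call
    return not remaining  # every expected call must have been produced


def turns_match(predicted: list[list[Call]], expected: list[list[Call]]) -> bool:
    """Multi-turn: every turn must match, and the turn counts must agree."""
    return len(predicted) == len(expected) and all(
        calls_match(p, e) for p, e in zip(predicted, expected)
    )
```

Under these assumptions, a model that answers an irrelevant prompt with any call at all fails that case, and an accuracy figure would be the share of test cases for which the check returns `True`.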

Agentic Text accuracy Max 100.0% Released Sep 2024
Results: 2
Models scored: 2
Top: 70.8% (Qwen3 235B A22B)
Median: 70.0%
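
The 70.0% median is consistent with the two scores listed under All results (70.8% and 69.1%). With an even number of values the median is the mean of the middle two, here 69.95, which shows as 70.0 at one decimal place. The sketch below reproduces that arithmetic. Rounding half up is an assumption about how the page formats the number.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

# The two primary scores listed on this page, kept as exact decimals.
scores = [Decimal("70.8"), Decimal("69.1")]

raw_median = median(scores)  # mean of the two middle values
# Assumed display rule: one decimal place, rounding half up.
shown = raw_median.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(raw_median, shown)  # prints: 69.95 70.0
```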

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
All data points share one date — no trend to plot.

All results

Showing one canonical row per model.
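| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | Qwen3 235B A22B | 70.8% | | Apr 28, 2025 | self reported | primary |
| 2 | Qwen3 30B A3B | 69.1% | | Apr 28, 2025 | self reported | primary |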