xiziqiao committed ccf75ad · verified · 1 parent: 3e884b6

Create leaderboard.csv

Files changed (1)
  1. leaderboard.csv +10 -0
leaderboard.csv ADDED
@@ -0,0 +1,10 @@
+Model,Score,Completeness,Grounding,Success Rate,Recovery Rate,Flexibility,Order,Info Diversity,Format,Tradeoff,Tool Calls,# Turns,Progress Tracking,Goal Decomposition
+gemini-3-pro-preview,5.87,4.75,2.58,88.8%,89.0%,68.8%,53.8%,97.8%,53.9%,13.3%,47.86,3.20,7.60,8.66
+claude-opus-4.5,5.42,4.70,2.93,92.7%,83.7%,60.8%,65.4%,73.3%,51.0%,33.7%,45.16,4.01,6.41,7.72
+deepseek-v3.2,4.97,4.00,2.18,87.5%,90.6%,72.4%,73.1%,70.0%,39.5%,17.9%,21.73,4.92,6.46,8.04
+glm-4.6v,4.86,4.01,1.18,84.8%,71.5%,57.3%,75.6%,52.2%,34.2%,11.5%,18.03,3.27,7.20,8.50
+grok-4,4.78,3.80,1.95,87.8%,89.0%,63.6%,64.1%,92.2%,68.3%,35.5%,27.37,2.55,6.02,8.28
+gpt-oss-120b,4.66,3.42,1.28,86.3%,72.7%,59.7%,87.2%,38.9%,35.8%,13.3%,14.40,3.14,6.53,8.10
+gpt-5.2,4.43,3.42,3.80,85.5%,79.3%,55.4%,71.6%,37.2%,12.4%,12.5%,29.20,2.30,5.62,7.73
+qwen3-235b-a22b,3.53,2.56,1.17,87.9%,88.1%,66.1%,80.8%,43.3%,31.3%,8.6%,11.15,4.41,6.93,8.51
+gpt-4o-mini,3.07,1.13,0.85,87.5%,50.6%,39.7%,85.9%,46.7%,3.3%,0.0%,51.71,6.45,6.00,7.71
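Reading this file programmatically takes a little parsing: seven columns (Success Rate through Tradeoff) store percentages as strings such as `88.8%`, while the remaining score columns are plain decimals. The following is a minimal Python sketch using only the standard `csv` module. It assumes the header row above and that every percentage cell ends in `%`. The function name `parse_leaderboard` and the `SAMPLE` excerpt (three rows copied from the file, deliberately out of order) are hypothetical illustrations, not part of this repository.

```python
import csv
import io


def parse_leaderboard(text: str) -> list[dict]:
    """Parse leaderboard CSV text into dicts with numeric values.

    Assumption: any cell ending in "%" is a percentage and is converted
    to a fraction (e.g. "88.8%" -> ~0.888); every other column except
    "Model" is parsed as a plain float.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"Model": record["Model"]}
        for key, value in record.items():
            if key == "Model":
                continue
            if value.endswith("%"):
                row[key] = float(value.rstrip("%")) / 100.0
            else:
                row[key] = float(value)
        rows.append(row)
    # Order by overall Score, highest first.
    rows.sort(key=lambda r: r["Score"], reverse=True)
    return rows


# Hypothetical excerpt: header plus three rows from leaderboard.csv.
SAMPLE = """Model,Score,Completeness,Grounding,Success Rate,Recovery Rate,Flexibility,Order,Info Diversity,Format,Tradeoff,Tool Calls,# Turns,Progress Tracking,Goal Decomposition
gpt-4o-mini,3.07,1.13,0.85,87.5%,50.6%,39.7%,85.9%,46.7%,3.3%,0.0%,51.71,6.45,6.00,7.71
gemini-3-pro-preview,5.87,4.75,2.58,88.8%,89.0%,68.8%,53.8%,97.8%,53.9%,13.3%,47.86,3.20,7.60,8.66
claude-opus-4.5,5.42,4.70,2.93,92.7%,83.7%,60.8%,65.4%,73.3%,51.0%,33.7%,45.16,4.01,6.41,7.72
"""

rows = parse_leaderboard(SAMPLE)
print([r["Model"] for r in rows])
# prints ['gemini-3-pro-preview', 'claude-opus-4.5', 'gpt-4o-mini']
```

Storing percentages as fractions keeps every metric numeric and comparable. Keep in mind that column scales differ: Score and the tracking columns are decimals, while Tool Calls and # Turns are counts.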