Create leaderboard.csv
leaderboard.csv (ADDED, +10 -0)
Model,Score,Completeness,Grounding,Success Rate,Recovery Rate,Flexibility,Order,Info Diversity,Format,Tradeoff,Tool Calls,# Turns,Progress Tracking,Goal Decomposition
gemini-3-pro-preview,5.87,4.75,2.58,88.8%,89.0%,68.8%,53.8%,97.8%,53.9%,13.3%,47.86,3.20,7.60,8.66
claude-opus-4.5,5.42,4.70,2.93,92.7%,83.7%,60.8%,65.4%,73.3%,51.0%,33.7%,45.16,4.01,6.41,7.72
deepseek-v3.2,4.97,4.00,2.18,87.5%,90.6%,72.4%,73.1%,70.0%,39.5%,17.9%,21.73,4.92,6.46,8.04
glm-4.6v,4.86,4.01,1.18,84.8%,71.5%,57.3%,75.6%,52.2%,34.2%,11.5%,18.03,3.27,7.20,8.50
grok-4,4.78,3.80,1.95,87.8%,89.0%,63.6%,64.1%,92.2%,68.3%,35.5%,27.37,2.55,6.02,8.28
gpt-oss-120b,4.66,3.42,1.28,86.3%,72.7%,59.7%,87.2%,38.9%,35.8%,13.3%,14.40,3.14,6.53,8.10
gpt-5.2,4.43,3.42,3.80,85.5%,79.3%,55.4%,71.6%,37.2%,12.4%,12.5%,29.20,2.30,5.62,7.73
qwen3-235b-a22b,3.53,2.56,1.17,87.9%,88.1%,66.1%,80.8%,43.3%,31.3%,8.6%,11.15,4.41,6.93,8.51
gpt-4o-mini,3.07,1.13,0.85,87.5%,50.6%,39.7%,85.9%,46.7%,3.3%,0.0%,51.71,6.45,6.00,7.71
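The file mixes plain numbers with percentage strings (e.g. "88.8%"), so it needs a little parsing before columns can be compared. Below is a minimal sketch using only Python's standard-library `csv` module. The helper names `parse_value`, `load_leaderboard`, and `rank_by` are hypothetical. Two further points are assumptions, not anything the file states: converting percentage cells to fractions, and sorting "higher is better" by default. The file does not say which direction is better for columns such as Tool Calls or # Turns. The embedded sample copies three rows from the table above.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

Value = Union[str, float]


def parse_value(raw: str) -> Value:
    """Convert one metric cell: "88.8%" -> 0.888 (assumed fraction), "3.20" -> 3.2."""
    text = raw.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


def load_leaderboard(stream: TextIO) -> List[Dict[str, Value]]:
    """Read rows keyed by header name; the Model column is kept as a string."""
    rows = []
    for row in csv.DictReader(stream):
        parsed: Dict[str, Value] = {"Model": row["Model"].strip()}
        for key, cell in row.items():
            if key != "Model":
                parsed[key] = parse_value(cell)
        rows.append(parsed)
    return rows


def rank_by(rows: List[Dict[str, Value]], column: str, descending: bool = True) -> List[str]:
    """Return model names ordered by a numeric column (descending by assumption)."""
    ordered = sorted(rows, key=lambda r: r[column], reverse=descending)
    return [str(r["Model"]) for r in ordered]


# Three rows copied from leaderboard.csv, used here as an in-memory sample.
SAMPLE = (
    "Model,Score,Completeness,Grounding,Success Rate,Recovery Rate,Flexibility,"
    "Order,Info Diversity,Format,Tradeoff,Tool Calls,# Turns,Progress Tracking,"
    "Goal Decomposition\n"
    "gemini-3-pro-preview,5.87,4.75,2.58,88.8%,89.0%,68.8%,53.8%,97.8%,53.9%,"
    "13.3%,47.86,3.20,7.60,8.66\n"
    "claude-opus-4.5,5.42,4.70,2.93,92.7%,83.7%,60.8%,65.4%,73.3%,51.0%,"
    "33.7%,45.16,4.01,6.41,7.72\n"
    "gpt-5.2,4.43,3.42,3.80,85.5%,79.3%,55.4%,71.6%,37.2%,12.4%,"
    "12.5%,29.20,2.30,5.62,7.73\n"
)

rows = load_leaderboard(io.StringIO(SAMPLE))
print(rank_by(rows, "Grounding"))  # ['gpt-5.2', 'claude-opus-4.5', 'gemini-3-pro-preview']
```

To use the real file, pass `open("leaderboard.csv", newline="")` to `load_leaderboard` instead of the in-memory sample. The Model column is excluded from numeric parsing so that a model name is never misread as a number.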