model,overall,objective_en,objective_zh,objective_avg,subjective_en,subjective_zh,subjective_avg,expert_avg,citation_score,source_quality,rmsce
OpenClaw(+GPT-5.5),59.09,50.00,67.50,58.75,59.60,59.25,59.42,74.28,-,-,46.23
DeerFlow(+GPT-5.5),58.76,52.50,60.00,56.25,64.85,57.67,61.26,76.57,-,-,47.83
Gemini-deep-research,58.23,50.00,52.50,51.25,64.77,65.69,65.23,71.79,56.94,12.86,50.77
OpenClaw(+DeepSeek-V4-Pro),54.54,37.50,57.50,47.50,65.79,57.36,61.58,76.97,-,-,56.57
DeerFlow(+DeepSeek-V4-Pro),51.57,27.50,55.00,41.25,65.71,58.08,61.89,77.36,-,-,64.52
OpenAI-o3-deep-research,51.24,37.50,32.50,35.00,71.84,63.12,67.48,72.60,74.47,15.49,61.11
MiroThinker,48.63,52.50,45.00,48.75,53.15,43.88,48.52,60.64,-,-,55.57
Kimi-deep-research,46.16,35.00,35.00,35.00,60.19,54.44,57.31,71.64,-,-,62.94
GPT-5.5,43.75,27.50,27.50,27.50,62.69,57.33,60.01,75.02,-,-,52.32
Jina-deepsearch,42.73,37.50,35.00,36.25,47.51,50.89,49.20,52.52,46.36,27.54,65.67
Claude-opus-4-7,42.38,25.00,20.00,22.50,63.71,60.83,62.27,77.83,-,-,39.24
Doubao-deep-research,40.76,37.50,20.00,28.75,52.93,52.61,52.77,65.97,-,-,71.30
Perplexity-deep-research,39.26,22.50,22.50,22.50,63.17,48.85,56.01,70.02,-,-,55.65
Kimi-k2.5,38.48,17.50,10.00,13.75,64.81,61.60,63.20,79.01,-,-,77.62
Gemini-3.1-pro-preview,38.20,22.50,12.50,17.50,59.53,58.25,58.89,73.62,-,-,75.60
DeepSeek-V4-Pro,31.17,5.00,15.00,10.00,49.09,55.59,52.34,65.42,-,-,76.12
Grok-3-deepsearch,30.46,10.00,5.00,7.50,56.43,50.40,53.41,66.77,-,-,84.70
Qwen-deep-research,29.97,2.50,17.50,10.00,51.59,48.25,49.92,62.40,-,-,83.37
Tongyi-deepresearch-30b-a3b,24.09,2.50,5.00,3.75,46.69,42.16,44.42,55.53,-,-,87.92