Add ResearchClawBench evaluation result

#22
by black-yt - opened

Hi zai-org team,

We would like to add the ResearchClawBench overall evaluation result for GLM-5.2.

ResearchClawBench: https://huggingface.co/datasets/InternScience/ResearchClawBench
ResearchHarness: https://github.com/InternScience/ResearchHarness
Leaderboard: https://internscience.github.io/ResearchClawBench-Home/

ResearchClawBench is an end-to-end scientific research benchmark for evaluating AI agents and LLMs on tasks that require reading data and related work, writing code, producing figures, and generating publication-style reports.

This result was produced with ResearchHarness, with tool use, code execution, and a file-system workspace enabled. The reported value is the mean score (out of 100) over the completed ResearchClawBench tasks.
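For readers unfamiliar with "mean over completed tasks", here is a minimal sketch of that aggregation. It assumes that a task which did not complete is simply excluded from the average rather than scored as zero. The function name `mean_over_completed` and the use of `None` to mark an incomplete task are hypothetical illustrations, not ResearchHarness's actual API, and the toy scores are made up.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_over_completed(scores: Sequence[Optional[float]]) -> float:
    """Average only tasks that produced a score (hypothetical helper).

    Assumption: None marks a task that did not complete; such tasks
    are dropped from both the numerator and the denominator.
    """
    completed = [s for s in scores if s is not None]
    if not completed:
        raise ValueError("no completed tasks to average")
    return mean(completed)


# Illustrative toy scores out of 100, NOT real ResearchClawBench results.
toy_scores = [30.0, 10.0, None, 20.0]
print(f"completed: {sum(s is not None for s in toy_scores)}/{len(toy_scores)}")
print(mean_over_completed(toy_scores))  # prints 20.0
```

With this convention, the toy run reports 20.0 over 3/4 completed tasks. If the incomplete task were instead scored as 0, the average over all four tasks would be 15.0. That is why the completed-task count is listed alongside the score.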

Result details:

  • Model: GLM-5.2
  • Score: 20.709230769230768
  • Completed tasks: 39/40
  • Run date: 2026-06-22 to 2026-06-23
  • Dataset task id: overall

Thank you!
