CoCoOne committed on
Commit e71bc01 · verified · 1 Parent(s): 26e1bd6

Add ResearchClawBench evaluation result

Adds ResearchClawBench overall evaluation metadata.

ResearchClawBench is a benchmark run with tools enabled, code execution, and a file-system workspace. The reported value is the mean score, out of 100, over the completed ResearchHarness runs; per-task details are available from the ResearchClawBench leaderboard.
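The aggregation described above can be sketched as follows. This is a minimal illustration, assuming each run yields a score in the 0-100 range and that incomplete runs are recorded as `None`; the function name `overall_score` and the sample scores are hypothetical and are not the actual ResearchHarness outputs.

```python
from statistics import mean


def overall_score(run_scores, total_tasks):
    """Average the scores of completed runs only.

    `None` marks a run that did not complete (an assumption of this sketch).
    Returns the rounded mean and a coverage string for the notes field.
    """
    completed = [s for s in run_scores if s is not None]
    if not completed:
        raise ValueError("no completed runs to average")
    coverage = f"completed {len(completed)}/{total_tasks} tasks"
    return round(mean(completed), 2), coverage


# Hypothetical per-task scores, NOT the real benchmark results.
example_scores = [12.5, 30.0, None, 20.5]
score, coverage = overall_score(example_scores, total_tasks=len(example_scores))
print(score, coverage)
```

Averaging only over completed runs means the coverage count matters when comparing entries, which is presumably why the notes field records it alongside the score.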

.eval_results/researchclawbench.yaml ADDED
@@ -0,0 +1,9 @@
+- dataset:
+    id: InternScience/ResearchClawBench
+    task_id: overall
+  value: 18.19
+  date: "2026-04-14"
+  notes: "ResearchHarness: https://huggingface.co/spaces/InternScience/ResearchHarness; ResearchClawBench: https://huggingface.co/datasets/InternScience/ResearchClawBench; tools enabled; code execution; file-system workspace; completed 40/40 tasks"
+  source:
+    url: https://huggingface.co/zai-org/GLM-5.1
+    name: Model Card
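A reviewer might sanity-check an entry like this before merging. The sketch below mirrors the added entry as a plain Python dict (so it needs no YAML parser) and applies a few checks. The checks and the helper name `check_entry` are assumptions made for illustration, not a validation procedure defined by the Hub or by ResearchClawBench.

```python
import re
from datetime import date

# The entry from this commit, transcribed as a dict for illustration.
entry = {
    "dataset": {"id": "InternScience/ResearchClawBench", "task_id": "overall"},
    "value": 18.19,
    "date": "2026-04-14",
    "notes": (
        "ResearchHarness: https://huggingface.co/spaces/InternScience/ResearchHarness; "
        "ResearchClawBench: https://huggingface.co/datasets/InternScience/ResearchClawBench; "
        "tools enabled; code execution; file-system workspace; completed 40/40 tasks"
    ),
    "source": {"url": "https://huggingface.co/zai-org/GLM-5.1", "name": "Model Card"},
}


def check_entry(e):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # The score is described as a mean out of 100.
    if not 0 <= e["value"] <= 100:
        problems.append("value outside 0-100")
    # The date should parse as an ISO calendar date.
    try:
        date.fromisoformat(e["date"])
    except ValueError:
        problems.append("date is not ISO formatted")
    # The notes should state how many tasks completed.
    match = re.search(r"completed (\d+)/(\d+) tasks", e["notes"])
    if match is None:
        problems.append("no task-completion count in notes")
    elif int(match.group(1)) > int(match.group(2)):
        problems.append("completed count exceeds total")
    return problems


print(check_entry(entry))
```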