zRzRzRzRzRzRzR committed on
Commit fbfe45b · 1 Parent(s): c35b39f
Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -44,7 +44,7 @@ Reinforcement learning aims to bridge the gap between competence and excellence
  | SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
  | Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 (verified) | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
  | Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 (verified) | 32.8 | 46.4 | - | 57.9 | - | - |
- | CyberGym | 48.3 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
+ | CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
  | Agentic | | | | | | | |
  | BrowseComp | 62.0 | 52.0 | 51.4 | 52 / 60.6 | 37.0 | 37.8 | - |
  | BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 57.8 | 59.2 | 65.8 |