zRzRzRzRzRzRzR
committed on
Commit · fbfe45b
1 Parent(s): c35b39f
init3
README.md
CHANGED
@@ -44,7 +44,7 @@ Reinforcement learning aims to bridge the gap between competence and excellence
 | SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
 | Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 (verified) | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
 | Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 (verified) | 32.8 | 46.4 | - | 57.9 | - | - |
-| CyberGym |
+| CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
 | Agentic | | | | | | | |
 | BrowseComp | 62.0 | 52.0 | 51.4 | 52 / 60.6 | 37.0 | 37.8 | - |
 | BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 57.8 | 59.2 | 65.8 |