Update README.md
README.md
CHANGED
@@ -120,6 +120,7 @@ foundation for next-generation language model agents to reason and tackle real-w
 | ***General Assistant***| MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |
 
 \* conducted on the text-only HLE subset.
+
 Our models are evaluated with temperature=1.0, top_p=0.95.
 
 ### SWE-bench methodology