Update README.md
README.md
CHANGED
@@ -144,15 +144,6 @@ Configuration: temperature **0.2** (greedy-ish decoding).
 | Level 4 | 90.62% |
 | Level 5 | 86.57% |
 
----
-
-### 5) Coming next
-
-Remaining benchmarks still on the queue:
-
-* **WildBench** (open-world / wild-task robustness)
-* **SWE-Bench** (software engineering & repo-level tasks)
-
 ---
 ```
 