Update README.md
README.md
CHANGED
@@ -93,13 +93,13 @@ The model shows strong agentic behavior: it recovers from errors (read-before-wr
 | **AIME 2025** (pass@5) | 90 | | | | 91.7 | 91.6 | | |
 | **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | | | 73 |
 | **GPQA Diamond** (pass@3) | **86.4** | | | | | | | |
-| **Terminal-Bench 2.0** | **
+| **Terminal-Bench 2.0** | **23.6** | 14.6 | | | | | 33.4 | 27 |
 
 </div>
 
-- **GPQA Diamond pass@1: 83.8** (166/198). +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4** (171/198).
-- **AIME 2025 pass@5: 90** (27/30).
-- **Terminal-Bench 2.0:
+- **GPQA Diamond pass@1: 83.8%** (166/198). +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4** (171/198).
+- **AIME 2025 pass@5: 90%** (27/30).
+- **Terminal-Bench 2.0: 23.6%** (21/89). +8.99 points over the Qwen3.5-9B base model (14.6%, 13/89).
 
 ---
 
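The percentages and deltas quoted in the updated bullets can be cross-checked against the raw counts. The following is a minimal sketch and not part of the repository: the helper names `pct` and `delta_points` are hypothetical, and the counts are copied from the README lines in this diff. It suggests the "+8.99" figure comes from the unrounded counts (8/89) rather than from subtracting the rounded scores (23.6 - 14.6 = 9.0).

```python
def pct(correct: int, total: int) -> float:
    """Score as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def delta_points(correct_a: int, correct_b: int, total: int) -> float:
    """Difference between two scores on the same task set, in percentage points."""
    return round(100 * (correct_a - correct_b) / total, 2)


# (correct, total) pairs as quoted in the README bullets (assumed to be exact counts).
counts = {
    "GPQA Diamond pass@1": (166, 198),
    "GPQA Diamond pass@3": (171, 198),
    "AIME 2025 pass@5": (27, 30),
    "Terminal-Bench 2.0": (21, 89),
    "Terminal-Bench 2.0 (base model)": (13, 89),
}

for name, (correct, total) in counts.items():
    print(f"{name}: {pct(correct, total)}% ({correct}/{total})")

# Terminal-Bench 2.0 improvement over the base model, from raw counts.
print("Terminal-Bench 2.0 delta:", delta_points(21, 13, 89), "points")
```

Running it reproduces the values in the table and bullets, which is a cheap consistency check to repeat whenever the counts change.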