qingy2024 committed on
Commit 0ca20e6 · verified · 1 Parent(s): 3c02ff8

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -93,13 +93,13 @@ The model shows strong agentic behavior: it recovers from errors (read-before-wr
  | **AIME 2025** (pass@5) | 90 | | | | 91.7 | 91.6 | | |
  | **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | | | 73 |
  | **GPQA Diamond** (pass@3) | **86.4** | | | | | | | |
- | **Terminal-Bench 2.0** | **28.1** | 20 | | | | | 33.4 | 27 |
+ | **Terminal-Bench 2.0** | **23.6** | 14.6 | | | | | 33.4 | 27 |
 
  </div>
 
- - **GPQA Diamond pass@1: 83.8** (166/198). +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4** (171/198).
- - **AIME 2025 pass@5: 90** (27/30).
- - **Terminal-Bench 2.0: 28.1** (25/89). +8.1 points over the Qwen3.5-9B base model (20).
+ - **GPQA Diamond pass@1: 83.8%** (166/198). +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4** (171/198).
+ - **AIME 2025 pass@5: 90%** (27/30).
+ - **Terminal-Bench 2.0: 23.6%** (21/89). +8.99 points over the Qwen3.5-9B base model (14.6%, 13/89).
 
  ---
 
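
The percentages in the updated bullets can be cross-checked against the raw counts they quote. The snippet below is a minimal sketch of that arithmetic, not part of the commit; the helper name `pct` and the `COUNTS` dictionary are hypothetical, and the only inputs are the (correct, total) pairs that appear in the diff.

```python
# Sanity check for the scores quoted in the README bullets.
# `pct` and `COUNTS` are illustrative names, not part of the repository.

def pct(correct: int, total: int) -> float:
    """Return the score as a percentage of tasks solved."""
    return 100 * correct / total

# (correct, total) pairs copied from the bullets in the diff.
COUNTS = {
    "GPQA Diamond pass@1": (166, 198),
    "GPQA Diamond pass@3": (171, 198),
    "AIME 2025 pass@5": (27, 30),
    "Terminal-Bench 2.0 (fine-tuned)": (21, 89),
    "Terminal-Bench 2.0 (base)": (13, 89),
}

for name, (correct, total) in COUNTS.items():
    print(f"{name}: {pct(correct, total):.1f}%")

# Gain over the base model on Terminal-Bench 2.0, in percentage points.
tb_delta = pct(21, 89) - pct(13, 89)
print(f"Terminal-Bench 2.0 gain: {tb_delta:.2f} points")
```

Computing the gain from the raw counts rather than from the rounded table entries (23.6 and 14.6) is what yields 8.99 instead of 9.0.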