Update README.md
README.md
CHANGED
@@ -51,10 +51,18 @@ We conduct evaluation on both mathematical and coding benchmarks. Due to the hig
 
 <img src="assets/math_benchmark_table.png" width="100%"/>
 
-<img src="assets/code_benchmark_table.png" width="100%"/>
-
 </div>
 
+| Method | LCB v5 (2024.08.01–2025.02.01) | | LCB v6 (2025.02.01–2025.05.01) | | Avg. |
+|---------------------|-------------------------------|------------------------|-------------------------------|------------------------|------|
+| | avg@8 | pass@8 | avg@16 | pass@16 | |
+| DeepSeek-R1-1.5B | 16.7 | 29.0 | 17.2 | 34.4 | 17.0 |
+| DAPO | 26.0 | 40.5 | 27.6 | 43.5 | 26.8 |
+| DeepCoder-1.5B | 23.3 | 39.1 | 22.6 | 42.0 | 23.0 |
+| Nemotron-1.5B | 26.1 | 35.5 | 29.5 | 42.8 | 27.8 |
+| **Archer-Code-1.5B**| **29.4** | **43.7** | **30.2** | **45.8** | **29.8** |
+
+
 ## Technical Report
 
 [Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR](https://arxiv.org/abs/2507.15778)