Update README.md
README.md
CHANGED
@@ -6,14 +6,14 @@ base_model:
 - Qwen/Qwen3-8B-Base
 ---
 
-# ✨ Klear-Reasoner
+# ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
 
 
 ## 📌 Overview
 
 <div align="center">
-<img src="
+<img src="assets/main_result.png" width="100%"/>
 
 <sub>Benchmark accuracy of Klear-Reasoner-8B on AIME 2024/2025 (avg@64), LiveCodeBench V5 (2024/08/01-2025/02/01, avg@8), and v6 (2025/02/01-2025/05/01, avg@8).</sub>
 </div>