Commit 7df4401 by Suu (verified) · 1 parent: 2de7cbf

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -6,14 +6,14 @@ base_model:
   - Qwen/Qwen3-8B-Base
   ---
 
- # ✨ Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
+ # ✨ Klear-Reasoner-8B
   We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
 
 
   ## 📌 Overview
 
   <div align="center">
- <img src="https://github.com/suu990901/KlearReasoner/blob/main/docker/main_result.png" width="100%"/>
+ <img src="assets/main_result.png" width="100%"/>
 
   <sub>Benchmark accuracy of Klear-Reasoner-8B on AIME 2024/2025 (avg@64), LiveCodeBench V5 (2024/08/01-2025/02/01, avg@8), and v6 (2025/02/01-2025/05/01, avg@8).</sub>
   </div>