Commit 4b85a24 · committed by Suu · verified · 1 parent: 626667e

Update README.md

  1. README.md +3 -0
README.md CHANGED
@@ -11,6 +11,9 @@ metrics:
 - accuracy
 ---
 
+### Evaluation
+**Evaluation is coming soon, stay tuned.**
+
 # ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
 