Update README.md
README.md
CHANGED
@@ -11,18 +11,6 @@ metrics:
 - accuracy
 ---
 
-# 📣 Latest News
-**[September 26, 2025]** We further explored GPPO in depth and proposed **CE-GPPO**, focusing on the impact of ppo-clip tokens on entropy. The paper is available on [arXiv](https://arxiv.org/pdf/2509.20712) and [HuggingFace Daily](https://huggingface.co/papers/2509.20712).
-
-**[August 12, 2025]** We released the checkpoint for [KlearReasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B), along with the training data.
-
-**[August 11, 2025]** 🔬 KlearReasoner-8B conducted preliminary exploration of GPPO.
-
-**[August 11, 2025]** We released KlearReasoner-8B, achieving SOTA performance among small-scale 7/8B models.
-
-**[August 11, 2025]** 📢 KlearReasoner is available on [arXiv](https://arxiv.org/pdf/2508.07629) and [HuggingFace Daily](https://huggingface.co/papers/2508.07629).
-
-
 
 # ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.