Kwai-Klear/Klear-Reasoner-8B

Safetensors

English

qwen3


Suu committed on Sep 27, 2025

Commit 46de3f8 (verified) · 1 parent: d30a73e

Update README.md


Files changed (1)

README.md +46 -6


@@ -11,6 +11,17 @@ metrics:
 - accuracy
 ---
 # ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
@@ -133,13 +144,42 @@ python judge_math.py <path_to_inference_results>
 ## 🤝 Citation
 If you find this work helpful, please cite our paper:
 ```bibtex
-@misc{su2025klearreasoneradvancingreasoningcapability,
-      title={Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization},
-      author={Zhenpeng Su and Leiyu Pan and Xue Bai and Dening Liu and Guanting Dong and Jiaming Huang and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou},
       year={2025},
-      eprint={2508.07629},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
-      url={https://arxiv.org/abs/2508.07629},
 }
-```

 - accuracy
 ---
+## 📣 Latest News
+**[September 26, 2025]** 🔍 We further explored GPPO in depth and proposed **CE-GPPO**, focusing on the impact of ppo-clip tokens on entropy. 📄 The paper is available on [arXiv](https://arxiv.org/pdf/2509.20712) and [HuggingFace Daily](https://huggingface.co/papers/2509.20712).
+**[August 12, 2025]** 🚀 We released the checkpoint for [KlearReasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B), along with the training data.
+**[August 11, 2025]** 🔬 KlearReasoner-8B conducted preliminary exploration of GPPO.
+**[August 11, 2025]** 🏆 We released KlearReasoner-8B, achieving SOTA performance among small-scale 7/8B models.
+**[August 11, 2025]** 📢 KlearReasoner is available on [arXiv](https://arxiv.org/pdf/2508.07629) and [HuggingFace Daily](https://huggingface.co/papers/2508.07629).
 # ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
 ## 🤝 Citation
 If you find this work helpful, please cite our paper:
 ```bibtex
+@misc{su2025cegppocontrollingentropygradientpreserving,
+      title={CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning},
+      author={Zhenpeng Su and Leiyu Pan and Minxuan Lv and Yuntao Li and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou},
       year={2025},
+      eprint={2509.20712},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2509.20712},
+}
+```
+```bibtex
+@article{DBLP:journals/corr/abs-2508-07629,
+  author       = {Zhenpeng Su and
+                  Leiyu Pan and
+                  Xue Bai and
+                  Dening Liu and
+                  Guanting Dong and
+                  Jiaming Huang and
+                  Wenping Hu and
+                  Fuzheng Zhang and
+                  Kun Gai and
+                  Guorui Zhou},
+  title        = {Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving
+                  Clipping Policy Optimization},
+  journal      = {CoRR},
+  volume       = {abs/2508.07629},
+  year         = {2025},
+  url          = {https://doi.org/10.48550/arXiv.2508.07629},
+  doi          = {10.48550/ARXIV.2508.07629},
+  eprinttype    = {arXiv},
+  eprint       = {2508.07629},
+  timestamp    = {Sat, 13 Sep 2025 14:46:27 +0200},
+  biburl       = {https://dblp.org/rec/journals/corr/abs-2508-07629.bib},
+  bibsource    = {dblp computer science bibliography, https://dblp.org}
 }
+```