Safetensors
English
qwen3
Suu commited on
Commit
e010c81
Β·
verified Β·
1 Parent(s): 46de3f8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -1
README.md CHANGED
@@ -11,7 +11,7 @@ metrics:
11
  - accuracy
12
  ---
13
 
14
- ## πŸ“£ Latest News
15
  **[September 26, 2025]** πŸ” We further explored GPPO in depth and proposed **CE-GPPO**, focusing on the impact of ppo-clip tokens on entropy. πŸ“„ The paper is available on [arXiv](https://arxiv.org/pdf/2509.20712) and [HuggingFace Daily](https://huggingface.co/papers/2509.20712).
16
 
17
  **[August 12, 2025]** πŸš€ We released the checkpoint for [KlearReasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B), along with the training data.
@@ -23,6 +23,7 @@ metrics:
23
  **[August 11, 2025]** πŸ“’ KlearReasoner is available on [arXiv](https://arxiv.org/pdf/2508.07629) and [HuggingFace Daily](https://huggingface.co/papers/2508.07629).
24
 
25
 
 
26
  # ✨ Klear-Reasoner-8B
27
  We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
28
 
 
11
  - accuracy
12
  ---
13
 
14
+ # πŸ“£ Latest News
15
  **[September 26, 2025]** πŸ” We further explored GPPO in depth and proposed **CE-GPPO**, focusing on the impact of ppo-clip tokens on entropy. πŸ“„ The paper is available on [arXiv](https://arxiv.org/pdf/2509.20712) and [HuggingFace Daily](https://huggingface.co/papers/2509.20712).
16
 
17
  **[August 12, 2025]** πŸš€ We released the checkpoint for [KlearReasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B), along with the training data.
 
23
  **[August 11, 2025]** πŸ“’ KlearReasoner is available on [arXiv](https://arxiv.org/pdf/2508.07629) and [HuggingFace Daily](https://huggingface.co/papers/2508.07629).
24
 
25
 
26
+
27
  # ✨ Klear-Reasoner-8B
28
  We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
29