Kwaipilot/HiPO-8B

@@ -15,17 +15,17 @@ base_model:
 <a href="https://huggingface.co/Kwaipilot/HIPO-8B" target="_blank">
   <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
 </a>
-<a href="https://arxiv.org/abs/2507.08297" target="_blank">
-  <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.08297-b31b1b.svg?style=for-the-badge"/>
 </a>
 <br>
-<a href="https://arxiv.org/abs/2507.08297"></a>
 </div>
-This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/2507.08297), where we first introduced the **AutoThink paradigm** for controllable reasoning. While KAT-V1 outlined the overall framework of **SFT + RL** for adaptive reasoning, this paper provides the **detailed algorithmic design** of that training recipe.
 ***
@@ -114,8 +114,8 @@ print("content:\n", content)
   title={HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs},
   author={Ken Deng, Zizheng Zhan, Wen Xiang, Wenqiang Zhu and others},
   year={2025},
-  institution={arXiv preprint arXiv:2507.08297},
-  number={arXiv:2507.08297},
-  url={https://arxiv.org/abs/2507.08297}
 }
 ```

 <a href="https://huggingface.co/Kwaipilot/HIPO-8B" target="_blank">
   <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
 </a>
+<a href="https://arxiv.org/abs/2509.23967" target="_blank">
+  <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2509.23967-b31b1b.svg?style=for-the-badge"/>
 </a>
 <br>
+<a href="https://arxiv.org/abs/2509.23967"></a>
 </div>
+This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/2509.23967), where we first introduced the **AutoThink paradigm** for controllable reasoning. While KAT-V1 outlined the overall framework of **SFT + RL** for adaptive reasoning, this paper provides the **detailed algorithmic design** of that training recipe.
 ***
   title={HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs},
   author={Ken Deng, Zizheng Zhan, Wen Xiang, Wenqiang Zhu and others},
   year={2025},
+  institution={arXiv preprint arXiv:2509.23967},
+  number={arXiv:2509.23967},
+  url={https://arxiv.org/abs/2509.23967}
 }
 ```