Update README.md
README.md
CHANGED
@@ -15,17 +15,17 @@ base_model:
 <a href="https://huggingface.co/Kwaipilot/HIPO-8B" target="_blank">
 <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
 </a>
-<a href="https://arxiv.org/abs/
-<img alt="arXiv" src="https://img.shields.io/badge/arXiv-
+<a href="https://arxiv.org/abs/2509.23967" target="_blank">
+<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2509.23967-b31b1b.svg?style=for-the-badge"/>
 </a>

 <br>

-<a href="https://arxiv.org/abs/
+<a href="https://arxiv.org/abs/2509.23967"></a>

 </div>

-This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/
+This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/2509.23967), where we first introduced the **AutoThink paradigm** for controllable reasoning. While KAT-V1 outlined the overall framework of **SFT + RL** for adaptive reasoning, this paper provides the **detailed algorithmic design** of that training recipe.

 ***

@@ -114,8 +114,8 @@ print("content:\n", content)
 title={HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs},
 author={Ken Deng, Zizheng Zhan, Wen Xiang, Wenqiang Zhu and others},
 year={2025},
-institution={arXiv preprint arXiv:
-number={arXiv:
-url={https://arxiv.org/abs/
+institution={arXiv preprint arXiv:2509.23967},
+number={arXiv:2509.23967},
+url={https://arxiv.org/abs/2509.23967}
 }
 ```