arieldeng committed on
Commit 5ca791f · verified · 1 Parent(s): 2902167

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -15,17 +15,17 @@ base_model:
  <a href="https://huggingface.co/Kwaipilot/HIPO-8B" target="_blank">
  <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
  </a>
- <a href="https://arxiv.org/abs/2507.08297" target="_blank">
- <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.08297-b31b1b.svg?style=for-the-badge"/>
+ <a href="https://arxiv.org/abs/2509.23967" target="_blank">
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2509.23967-b31b1b.svg?style=for-the-badge"/>
  </a>
 
  <br>
 
- <a href="https://arxiv.org/abs/2507.08297"></a>
+ <a href="https://arxiv.org/abs/2509.23967"></a>
 
  </div>
 
- This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/2507.08297), where we first introduced the **AutoThink paradigm** for controllable reasoning. While KAT-V1 outlined the overall framework of **SFT + RL** for adaptive reasoning, this paper provides the **detailed algorithmic design** of that training recipe.
+ This work is a companion to our earlier report [**KAT-V1: Kwai-AutoThink Technical Report**](https://arxiv.org/abs/2509.23967), where we first introduced the **AutoThink paradigm** for controllable reasoning. While KAT-V1 outlined the overall framework of **SFT + RL** for adaptive reasoning, this paper provides the **detailed algorithmic design** of that training recipe.
 
  ***
 
@@ -114,8 +114,8 @@ print("content:\n", content)
  title={HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs},
  author={Ken Deng, Zizheng Zhan, Wen Xiang, Wenqiang Zhu and others},
  year={2025},
- institution={arXiv preprint arXiv:2507.08297},
- number={arXiv:2507.08297},
- url={https://arxiv.org/abs/2507.08297}
+ institution={arXiv preprint arXiv:2509.23967},
+ number={arXiv:2509.23967},
+ url={https://arxiv.org/abs/2509.23967}
  }
  ```