Update paper link and model card title to Kwai Keye-VL 1.5 Technical Report

#7
Opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -8,13 +8,13 @@ tags:
  - multimodal
  ---

- # Kwai Keye-VL
+ # Kwai Keye-VL 1.5

  <div align="center">
  <img src="asset/keye_logo_2.png" width="100%" alt="Kwai Keye-VL Logo">
  </div>

- <font size=3><div align='center' > [[🍎 Home Page](https://kwai-keye.github.io/)] [[πŸ“– Technical Report](https://huggingface.co/papers/2507.01949)] [[πŸ“Š Models](https://huggingface.co/Kwai-Keye)] [[πŸš€ Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] [[πŸ’» Code](https://github.com/Kwai-Keye/Keye)] </div></font>
+ <font size=3><div align='center' > [[🍎 Home Page](https://kwai-keye.github.io/)] [[πŸ“– Technical Report](https://huggingface.co/papers/2509.01563)] [[πŸ“Š Models](https://huggingface.co/Kwai-Keye)] [[πŸš€ Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] [[πŸ’» Code](https://github.com/Kwai-Keye/Keye)] </div></font>

  ## Abstract

@@ -461,7 +461,7 @@ The post-training phase of Kwai Keye is meticulously designed into two phases wi
  - Training Strategy: Uses a mix-mode GRPO algorithm for reinforcement learning, where reward signals evaluate both the correctness of results and the consistency of the process and results, ensuring synchronized optimization of reasoning processes and final outcomes.
  - **Step II.2: Iterative Alignment**
  - Objective: Address common issues like repetitive crashes and poor logic in model-generated content, and enable spontaneous reasoning mode selection to enhance final performance and stability.
- - Data Composition: Constructs preference data through Rejection Fine-Tuning (RFT), combining rule-based scoring (judging repetition, instruction following, etc.) and model scoring (cognitive scores provided by large models) to rank various model responses, building a high-quality preference dataset.
+ - Data Composition: Constructs preference data through Rejection Fine-Tuning (RFT), combining rule-based scoring (judging repetition, instruction following, etc.) and model scoring (cognitive scores provided by large models) and ranking various model responses, building a high-quality preference dataset.
  - Training Strategy: Multi-round iterative optimization with the constructed "good/bad" preference data pairs through the MPO algorithm. This aims to correct model generation flaws and ultimately enable it to intelligently and adaptively choose whether to activate deep reasoning modes based on problem complexity.

  </details>
@@ -479,14 +479,14 @@ The post-training phase of Kwai Keye is meticulously designed into two phases wi
  If you find our work helpful for your research, please consider citing our work.

  ```bibtex
- @misc{kwaikeyeteam2025kwaikeyevltechnicalreport,
- title={Kwai Keye-VL Technical Report},
+ @misc{kwaikeyeteam2025kwaikeyevl15technicalreport,
+ title={Kwai Keye-VL 1.5 Technical Report},
  author={Kwai Keye Team},
  year={2025},
- eprint={2507.01949},
+ eprint={2509.01563},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
- url={https://arxiv.org/abs/2507.01949},
+ url={https://arxiv.org/abs/2509.01563},
  }
  ```
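The Data Composition bullet in the diff describes ranking candidate responses by combining rule-based scores (e.g. repetition checks) with model-provided cognitive scores, then pairing the best and worst responses into "good/bad" preference data. A minimal sketch of that idea, where the function names, score blending weights, and repetition heuristic are all illustrative assumptions, not the report's actual pipeline:

```python
# Illustrative sketch only: combine rule-based and model-based scores to
# rank candidate responses and form a chosen/rejected preference pair.
# All names, weights, and heuristics here are assumptions for illustration.

def rule_score(response: str) -> float:
    """Toy rule-based score: penalize word repetition (assumed heuristic)."""
    words = response.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)  # 1.0 means no repeated words

def combined_score(response: str, cognitive_score: float,
                   w_rule: float = 0.5, w_model: float = 0.5) -> float:
    """Blend the rule-based score with a model-provided 'cognitive' score.
    The 50/50 weighting is an assumption, not from the report."""
    return w_rule * rule_score(response) + w_model * cognitive_score

def build_preference_pair(candidates: list[dict]) -> dict:
    """Rank candidates by combined score; the best becomes 'chosen' and
    the worst 'rejected', yielding one good/bad preference pair."""
    ranked = sorted(
        candidates,
        key=lambda c: combined_score(c["text"], c["cognitive"]),
        reverse=True,
    )
    return {"chosen": ranked[0]["text"], "rejected": ranked[-1]["text"]}

candidates = [
    {"text": "The answer is 42 because the question asks for 6 times 7.",
     "cognitive": 0.9},
    {"text": "answer answer answer answer answer",  # degenerate repetition
     "cognitive": 0.2},
]
pair = build_preference_pair(candidates)
```

In a real pipeline such pairs would then feed multi-round MPO-style preference optimization, as the Training Strategy bullet describes; this sketch only covers the ranking step.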