nielsr (HF Staff) committed
Commit f9283b9 · verified · 1 Parent(s): 749da5d

Update paper link and model card title to Kwai Keye-VL 1.5 Technical Report


This PR updates the model card to reflect the [Kwai Keye-VL 1.5 Technical Report](https://huggingface.co/papers/2509.01563). This includes updating the main title, the technical report link in the header, and the citation information to align with the specified paper.

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -8,13 +8,13 @@ tags:
  - multimodal
  ---

- # Kwai Keye-VL
+ # Kwai Keye-VL 1.5

  <div align="center">
  <img src="asset/keye_logo_2.png" width="100%" alt="Kwai Keye-VL Logo">
  </div>

- <font size=3><div align='center' > [[🍎 Home Page](https://kwai-keye.github.io/)] [[πŸ“– Technical Report](https://huggingface.co/papers/2507.01949)] [[πŸ“Š Models](https://huggingface.co/Kwai-Keye)] [[πŸš€ Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] [[πŸ’» Code](https://github.com/Kwai-Keye/Keye)] </div></font>
+ <font size=3><div align='center' > [[🍎 Home Page](https://kwai-keye.github.io/)] [[πŸ“– Technical Report](https://huggingface.co/papers/2509.01563)] [[πŸ“Š Models](https://huggingface.co/Kwai-Keye)] [[πŸš€ Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] [[πŸ’» Code](https://github.com/Kwai-Keye/Keye)] </div></font>

  ## Abstract

@@ -461,7 +461,7 @@ The post-training phase of Kwai Keye is meticulously designed into two phases wi
  - Training Strategy: Uses a mix-mode GRPO algorithm for reinforcement learning, where reward signals evaluate both the correctness of results and the consistency of the process and results, ensuring synchronized optimization of reasoning processes and final outcomes.
  - **Step II.2: Iterative Alignment**
  - Objective: Address common issues like repetitive crashes and poor logic in model-generated content, and enable spontaneous reasoning mode selection to enhance final performance and stability.
- - Data Composition: Constructs preference data through Rejection Fine-Tuning (RFT), combining rule-based scoring (judging repetition, instruction following, etc.) and model scoring (cognitive scores provided by large models) to rank various model responses, building a high-quality preference dataset.
+ - Data Composition: Constructs preference data through Rejection Fine-Tuning (RFT), combining rule-based scoring (judging repetition, instruction following, etc.) and model scoring (cognitive scores provided by large models) and ranking various model responses, building a high-quality preference dataset.
  - Training Strategy: Multi-round iterative optimization with the constructed "good/bad" preference data pairs through the MPO algorithm. This aims to correct model generation flaws and ultimately enable it to intelligently and adaptively choose whether to activate deep reasoning modes based on problem complexity.

  </details>
@@ -479,14 +479,14 @@ The post-training phase of Kwai Keye is meticulously designed into two phases wi
  If you find our work helpful for your research, please consider citing our work.

  ```bibtex
- @misc{kwaikeyeteam2025kwaikeyevltechnicalreport,
- title={Kwai Keye-VL Technical Report},
+ @misc{kwaikeyeteam2025kwaikeyevl15technicalreport,
+ title={Kwai Keye-VL 1.5 Technical Report},
  author={Kwai Keye Team},
  year={2025},
- eprint={2507.01949},
+ eprint={2509.01563},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
- url={https://arxiv.org/abs/2507.01949},
+ url={https://arxiv.org/abs/2509.01563},
  }
  ```
492