Update README.md
README.md
CHANGED
@@ -33,22 +33,30 @@ license: apache-2.0
 ·
 <a href="https://scholar.google.com/citations?user=sJkqsqkAAAAJ"><strong>Yuhang Cao</strong></a>
 ·
+<a href="https://codegoat24.github.io/"><strong>Yibin Wang</strong></a>
+·
+<a href="https://github.com/YujieOuO/"><strong>Yujie Zhou</strong></a>
+·
+<a href="https://bujiazi.github.io/"><strong>Jiazi Bu</strong></a>
+·
 <a href="https://scholar.google.com/citations?user=P4yNnSkAAAAJ&hl=zh-TW"><strong>Jianze Liang</strong></a>
 ·
 <a href="https://github.com/shikiw"><strong>Qidong Huang</strong></a>
 ·
 <a href="https://myownskyw7.github.io/"><strong>Jiaqi Wang</strong></a>
 ·
+<a href="https://scholar.google.com/citations?user=5bInRDEAAAAJ&hl=zh-CN"><strong>Feng Wu</strong></a>
+·
 <a href="http://dahua.site/"><strong>Dahua Lin</strong></a>
 </p>
 
-📖<a href="https://arxiv.org/abs/
+📖<a href="https://arxiv.org/abs/2606.09393">CapRL++ Paper</a> | 📖<a href="https://arxiv.org/abs/2509.22647">CapRL Paper</a> | 🏠<a href="https://github.com/InternLM/CapRL">Github</a> | 🤗<a href="https://huggingface.co/collections/long-xing1/caprl-68d64ac32ded31596c36e189">CapRL Collection</a>
 
 
 #### CapRL Series Model & Dataset
 | Series | Models & Resources |
 | :--- | :--- |
-| **CapRL 3.0 Series (CapRL++)** | [🤗 CapRL-Video-4B](https://huggingface.co/internlm/CapRL-Video-4B) \|[📊 CapRL-Video-178K Dataset](https://huggingface.co/datasets/internlm/CapRL-Video-178K)
+| **CapRL 3.0 Series (CapRL++)** | [🤗 CapRL-Video-4B](https://huggingface.co/internlm/CapRL-Video-4B) \| [📊 CapRL-Video-178K Dataset](https://huggingface.co/datasets/internlm/CapRL-Video-178K) |
 | **CapRL 2.0 Series** | [🤗 CapRL-Qwen3VL-2B](https://huggingface.co/internlm/CapRL-Qwen3VL-2B) \| [🤗 CapRL-Qwen3VL-4B](https://huggingface.co/internlm/CapRL-Qwen3VL-4B) \| [📦 CapRL-Qwen3VL-2B-GGUF](https://huggingface.co/internlm/CapRL-Qwen3VL-2B-GGUF) \| [📦 CapRL-Qwen3VL-4B-GGUF](https://huggingface.co/internlm/CapRL-Qwen3VL-4B-GGUF) \| [🌈CapRL-Qwen3VL-4B Space](https://huggingface.co/spaces/yuhangzang/CapRL-Qwen3VL-4B)
 | **CapRL 1.0 Series** | [🤗 CapRL-Qwen2.5VL-3B](https://huggingface.co/internlm/CapRL-3B) \| [🤗 CapRL-InternVL3.5-8B](https://huggingface.co/yuhangzang/CapRL-InternVL3.5-8B) \| [📊 CapRL-2M Dataset](https://huggingface.co/datasets/internlm/CapRL-2M) \| [📦 CapRL-3B-GGUF](https://huggingface.co/mradermacher/CapRL-3B-GGUF) \| [📦 CapRL-3B-i1-GGUF](https://huggingface.co/mradermacher/CapRL-3B-i1-GGUF) \| [🌈CapRL-Qwen2.5VL-3B Space](https://huggingface.co/spaces/yuhangzang/caprl)
 
@@ -72,10 +80,11 @@ Now you can try out CapRL with your own images🎨! ➡
 
 ## 📢 News
 We are working on even stronger base models and upgrading our training recipe — stay tuned!
+- 🔥 [06/08/2026] **CapRL++** paper is available on arXiv: [CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning](https://arxiv.org/abs/2606.09393).
 - 🔥 [05/25/2026] We have released the training and evaluation code for CapRL++. See more in `CapRL++` folder.
 - 🔥 [05/22/2026] We have released the **[CapRL-Video-QA-20K](https://huggingface.co/datasets/internlm/CapRL-Video-QA-20K)** dataset for CapRL++ training and
 the **[CapRL-Video-178K](https://huggingface.co/datasets/internlm/CapRL-Video-178K)** dataset (recaptioned by **[CapRL-Video-4B](https://huggingface.co/internlm/CapRL-Video-4B)** from LLaVA-Video-178K)!
-- 🔥 [05/22/2026]
+- 🔥 [05/22/2026] We have released the **[CapRL-Video-4B](https://huggingface.co/internlm/CapRL-Video-4B)** model (trained on Qwen3-VL-4B) designed for video captioning! Demo is [here](https://internlm.github.io/CapRL/demo/).
 - 🔥 [04/16/2026] We have released the **[CapRL-QA-75K](https://huggingface.co/datasets/internlm/CapRL-QA-75K)** training dataset!
 - 🔥 [2/9/2026] We release the CapRL training code.
 - 🔥 [1/27/2026] CapRL is accepted by ICLR2026! We are working on cleaning training code, and will release everything as soon as possible!
@@ -97,6 +106,9 @@ The original **CapRL** framework focuses on dense image captioning. It optimizes
 
 
 
+</p>
+
+
 ## 💡 Highlights
 - 🔥 **Unified dense caption RL for images and videos**: CapRL++ applies the same QA-utility reward philosophy to both image and video captioning, avoiding dependence on a single reference caption.
 - 🔥 **Verifiable reward design**: CapRL++ combines visual utility reward, timestamp-format reward, and length-aware penalty to optimize accuracy, temporal structure, and information efficiency.
@@ -376,6 +388,30 @@ You can specify `--reward-model-path` as the path to **CapRL-Eval-3B** in `Eval_
 **Usage and License Notices**: The data and code are intended and licensed for research use only.
 License: Attribution-NonCommercial 4.0 International It should abide by the policy of OpenAI: https://openai.com/policies/terms-of-use
 
+## Citation
+
+If you find CapRL++ useful for your research, please consider citing:
+
+```bibtex
+@article{yang2026caprlplusplus,
+title={CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning},
+author={Yang, Penghui and Xing, Long and Dong, Xiaoyi and Zang, Yuhang and Cao, Yuhang and Wang, Yibin and Zhou, Yujie and Bu, Jiazi and Liang, Jianze and Huang, Qidong and Wang, Jiaqi and Wu, Feng and Lin, Dahua},
+journal={arXiv preprint arXiv:2606.09393},
+year={2026}
+}
+```
+
+For the original CapRL paper:
+
+```bibtex
+@article{xing2025caprl,
+title={CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning},
+author={Xing, Long and Dong, Xiaoyi and Zang, Yuhang and Cao, Yuhang and Liang, Jianze and Huang, Qidong and Wang, Jiaqi and Wu, Feng and Lin, Dahua},
+journal={arXiv preprint arXiv:2509.22647},
+year={2025}
+}
+```
+
 ## ❤️ Acknowledgments
 - [Open-LLaVA-NeXT](https://github.com/xiaoachen98/Open-LLaVA-NeXT): Thanks for the impressive open-source dataset.
 - [VLMEvalKit](https://github.com/open-compass/VLMEvalKit): the amazing open-sourced suit for evaluating various LMMs!
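The Highlights bullets in this README describe the CapRL++ reward only at a high level: a visual utility reward (scored through question answering instead of matching a single reference caption), a timestamp-format reward, and a length-aware penalty. The Python sketch below shows one possible way to combine such terms for a single caption. It is an illustration under stated assumptions, not the CapRL++ implementation. The `[mm:ss-mm:ss]` timestamp format, the word-count penalty shape, the weights `w_fmt` and `w_len`, and the names `QAItem`, `keyword_judge`, and `combined_reward` are all hypothetical. It also assumes that, in the spirit of the QA-utility reward, a judge answers multiple-choice questions from the caption alone. A keyword-matching stand-in replaces the real judge model so the example runs without any model weights.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical video caption format: one event per line, e.g.
# "[00:05-00:09] A dog crosses the street."
TIMESTAMP_RE = re.compile(r"^\[(\d{2}):(\d{2})-(\d{2}):(\d{2})\]\s+\S")


@dataclass(frozen=True)
class QAItem:
    question: str
    options: Sequence[str]
    answer: str  # letter of the correct option, e.g. "B"


# A judge maps (caption, question) to the option letter it picks.
AnswerFn = Callable[[str, QAItem], str]


def utility_reward(caption: str, qa_items: Sequence[QAItem], answer_fn: AnswerFn) -> float:
    """Fraction of questions the judge answers correctly using only the caption."""
    if not qa_items:
        return 0.0
    correct = sum(answer_fn(caption, qa) == qa.answer for qa in qa_items)
    return correct / len(qa_items)


def timestamp_format_reward(caption: str) -> float:
    """1.0 if every non-empty line starts with a valid [mm:ss-mm:ss] span, else 0.0."""
    lines = [ln.strip() for ln in caption.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    for ln in lines:
        m = TIMESTAMP_RE.match(ln)
        if m is None:
            return 0.0
        start = int(m.group(1)) * 60 + int(m.group(2))
        end = int(m.group(3)) * 60 + int(m.group(4))
        if end < start:  # reject spans that run backwards in time
            return 0.0
    return 1.0


def length_penalty(caption: str, soft_limit: int = 400, hard_limit: int = 800) -> float:
    """0.0 up to soft_limit words, rising linearly to 1.0 at hard_limit words."""
    n_words = len(caption.split())
    if n_words <= soft_limit:
        return 0.0
    return min(1.0, (n_words - soft_limit) / (hard_limit - soft_limit))


def combined_reward(caption: str, qa_items: Sequence[QAItem], answer_fn: AnswerFn,
                    is_video: bool, w_fmt: float = 0.2, w_len: float = 0.5) -> float:
    """Utility reward, plus a format bonus for videos only, minus a length penalty."""
    reward = utility_reward(caption, qa_items, answer_fn)
    if is_video:
        reward += w_fmt * timestamp_format_reward(caption)
    return reward - w_len * length_penalty(caption)


def keyword_judge(caption: str, qa: QAItem) -> str:
    """Toy stand-in for a judge model: picks the first option mentioned in the caption."""
    text = caption.lower()
    for letter, option in zip("ABCDEFGH", qa.options):
        if option.lower() in text:
            return letter
    return "A"  # fixed fallback guess when no option is mentioned


if __name__ == "__main__":
    caption = ("[00:00-00:05] A red car stops at a light.\n"
               "[00:05-00:09] A dog crosses the street.")
    qa = [QAItem("What color is the car?", ["blue", "red"], "B"),
          QAItem("Which animal crosses?", ["cat", "dog"], "B")]
    print(combined_reward(caption, qa, keyword_judge, is_video=True))  # prints 1.2
```

The format term is applied only when `is_video` is true, because timestamps only carry temporal structure for video captions. The source does not say whether CapRL++ adds, multiplies, or thresholds its reward terms, so the additive weighting here is just one plausible choice.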