Update README.md
README.md CHANGED
@@ -16,6 +16,9 @@ tags:
 
 🤗<a href="https://huggingface.co/collections/long-xing1/caprl-68d64ac32ded31596c36e189">CapRL Collection</a> | 🤗<a href="https://huggingface.co/papers/2509.22647">Daily Paper</a>
 
+Based on the same recipe as CapRL-3B, we used InternVL3.5-8B as the policy model and obtained **CapRL-InternVL3.5-8B** through CapRL. **Its performance significantly surpasses that of Qwen2.5-VL-72B**.
+
+We are working on even stronger base models and upgrading our training recipe — stay tuned!
 
 
 ## Introduction
@@ -34,10 +37,10 @@ By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model
 filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
 
 <p align="center">
-<img src="./assets/teaser.png"
+<img src="./assets/teaser.png" width="750"/>
 </p>
 <p align="center">
-<img src="./assets/performance_update.png"
+<img src="./assets/performance_update.png" width="750"/>
 </p>
 
 ## Key Features