nielsr (HF Staff) committed
Commit 74581c1 Β· verified Β· 1 parent: bd38dcf

Improve model card: Add pipeline tag, library name, code link, and update image paths


This PR enhances the model card for `CapRL-Eval-3B` by:
- Adding `pipeline_tag: image-text-to-text` to improve discoverability on the Hub.
- Specifying `library_name: transformers` as the model is compatible with the πŸ€— Transformers library, enabling an automated "How to use" widget.
- Consolidating the paper links and explicitly adding a link to the GitHub repository for easy access to the code.
- Updating relative image paths to absolute Hugging Face Hub paths for improved rendering robustness.

Please review and merge this PR if everything looks good.
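The image-path update in the last bullet is a mechanical rewrite from relative paths to absolute `resolve/main` URLs. A minimal Python sketch of that rule, assuming the repo id `internlm/CapRL-Eval-3B` (the `to_hub_url` helper name is illustrative, not part of the repository):

```python
def to_hub_url(asset_path: str, repo_id: str = "internlm/CapRL-Eval-3B") -> str:
    """Rewrite a relative model-card asset path to an absolute Hub URL."""
    # "./assets/teaser.png" and "assets/teaser.png" should map to the same URL.
    clean = asset_path.removeprefix("./")
    return f"https://huggingface.co/{repo_id}/resolve/main/{clean}"

print(to_hub_url("./assets/teaser.png"))
# https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/teaser.png
```

Absolute URLs render reliably even when the README is displayed outside the repository context (e.g. on mirrors or embeds), which is why the PR prefers them over relative paths.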

Files changed (1)
  1. README.md +14 -9
README.md CHANGED
```diff
@@ -1,8 +1,13 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
-πŸ“–<a href="https://arxiv.org/abs/2509.22647">Paper</a> |πŸ€—<a href="https://huggingface.co/internlm/CapRL-3B">CapRL-3B Model</a> |
-πŸ€—<a href="https://huggingface.co/datasets/internlm/CapRL-2M">CapRL-2M Dataset</a> |πŸ€—<a href="https://huggingface.co/collections/long-xing1/caprl-68d64ac32ded31596c36e189">CapRL Collection</a> | πŸ€—<a href="https://huggingface.co/papers/2509.22647">Daily Paper</a>
+
+# CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
+
+πŸ“–<a href="https://huggingface.co/papers/2509.22647">Paper</a> | πŸ’»<a href="https://github.com/InternLM/CapRL">Code</a> | πŸ€—<a href="https://huggingface.co/internlm/CapRL-3B">CapRL-3B Model</a> |
+πŸ€—<a href="https://huggingface.co/datasets/internlm/CapRL-2M">CapRL-2M Dataset</a> |πŸ€—<a href="https://huggingface.co/collections/long-xing1/caprl-68d64ac32ded31596c36e189">CapRL Collection</a>
 
 
 **CapRL-Eval-3B** is the model used for answering questions based on captions, and it is a finetuned version of Qwen2.5-VL-3B. When dealing with tasks such as ChartQA (not multiple-choice questions), it provides more stable output formatting.
@@ -25,10 +30,10 @@ By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model
 filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
 
 <p align="center">
-  <img src="./assets/teaser.png" alt="Main Results on GPT2" width="750"/>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/teaser.png" alt="Main Results on GPT2" width="750"/>
 </p>
 <p align="center">
-  <img src="./assets/performance.png" alt="Main Results on GPT2" width="750"/>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/performance.png" alt="Main Results on GPT2" width="750"/>
 </p>
 
 ## Key Features
@@ -105,16 +110,16 @@ print("Chat response:", chat_response)
 
 ## Cases
 <p align="center">
-  <img src="./assets/comparison.png" alt="Main Results on GPT2" width="750"/>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/comparison.png" alt="Main Results on GPT2" width="750"/>
 </p>
 
 <p align="center">
-  <img src="./assets/info_caprl.png" alt="Main Results on GPT2" width="750"/>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/info_caprl.png" alt="Main Results on GPT2" width="750"/>
 </p>
 
 <p align="center">
-  <img src="./assets/info_caprl2.png" alt="Main Results on GPT2" width="750"/>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/info_caprl2.png" alt="Main Results on GPT2" width="750"/>
 </p>
 <p align="center">
-  <img src="./assets/natural_caprl.png" alt="Main Results on GPT2" width="750"/>
-</p>
+  <img src="https://huggingface.co/internlm/CapRL-Eval-3B/resolve/main/assets/natural_caprl.png" alt="Main Results on GPT2" width="750"/>
+</p>
```