yuhangzang committed on
Commit cdaad41 · verified · 1 Parent(s): e25de6e

Update README.md

  1. README.md +9 -1
README.md CHANGED
@@ -23,4 +23,12 @@ stage uses LVLMs to generate rich and accurate captions. Subsequently, the secon
  caption quality by using a vision-free LLM to perform the QA task. We also created a specific QA
  curation pipeline to ensure the quality of the questions and answers used for the second stage.
 
- By employing the CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
+ By employing the CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully
+ filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
+
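The second-stage evaluation described above can be illustrated with a small sketch. The README does not give the reward formula, so this assumes the reward is simply the fraction of curated multiple-choice questions that an answering model gets right when it sees only the caption text. `MultipleChoiceQuestion`, `caption_reward`, and `keyword_answerer` are hypothetical names introduced for illustration, and the keyword matcher is a toy stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MultipleChoiceQuestion:
    """One curated question about an image (hypothetical structure)."""
    question: str
    options: Sequence[str]
    answer: str  # label of the correct option, e.g. "B"


# Hypothetical interface: (caption, question) -> chosen option label.
# A real answerer would wrap an LLM that never sees the image itself.
Answerer = Callable[[str, MultipleChoiceQuestion], str]


def caption_reward(caption: str,
                   questions: Sequence[MultipleChoiceQuestion],
                   answerer: Answerer) -> float:
    """Assumed reward: share of questions answered correctly from the caption alone."""
    if not questions:
        return 0.0
    correct = sum(
        answerer(caption, q).strip().upper() == q.answer.strip().upper()
        for q in questions
    )
    return correct / len(questions)


def keyword_answerer(caption: str, q: MultipleChoiceQuestion) -> str:
    """Toy stand-in for the LLM: choose the first option mentioned in the caption."""
    text = caption.lower()
    for label, option in zip("ABCD", q.options):
        if option.lower() in text:
            return label
    return ""  # abstain when the caption does not mention any option


questions = [
    MultipleChoiceQuestion("What color is the 2021 bar?", ["red", "blue"], "B"),
    MultipleChoiceQuestion("How many bars are shown?", ["three", "five"], "A"),
]
detailed = "A chart with three bars; the 2021 bar is blue."
vague = "A chart."
print(caption_reward(detailed, questions, keyword_answerer))
print(caption_reward(vague, questions, keyword_answerer))
```

Under this assumption, a caption that omits the details the questions ask about scores lower than one that states them, which is the signal a QA-based reward is meant to provide.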
+ ## Key Features
+ * **Remarkable visual understanding for charts, infographics, and documents**: CapRL-3B achieves perception accuracy and visual information coverage comparable to Qwen2.5-VL-72B.
+ * **Well-organized output**: The outputs of CapRL-3B are relatively well-structured, making them clear and easy to understand.
+ * **Detailed description for natural images**: The outputs of CapRL-3B comprehensively cover the valid visual information while containing fewer hallucinations.
+
+ ## Cases