Add metadata and improve model card for OmniCaptioner
#1 opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,4 +1,8 @@
-
+---
+pipeline_tag: image-to-text
+library_name: transformers
+license: mit # Please verify and correct if needed
+---
 
 <div align="center">
 <h1> OmniCaptioner: One Captioner to Rule Them All </h1>
@@ -7,10 +11,11 @@
 <div align="center">
 
 <p align="center">
-<a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a> | <a href="https://github.com/Alpha-Innovator/OmniCaptioner">Github</a> | <a href="https://huggingface.co/papers/2504.07089">Paper</a>
+🏠 <a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a> | 🤗 <a href="https://github.com/Alpha-Innovator/OmniCaptioner">Github</a> | 📄 <a href="https://huggingface.co/papers/2504.07089">Paper</a>
 </p>
 </div>
 
+OmniCaptioner is a versatile visual captioning framework for generating detailed textual descriptions of various visual domains, including natural images, visual text (posters, UIs, textbooks), and structured visuals (documents, tables, charts). By converting low-level pixel information into semantically rich textual representations, this framework bridges the gap between visual and textual modalities.
 
 ## 💻 Finetuning Code
 ### 1. Create a conda environment and install PyTorch
@@ -77,7 +82,4 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Om
 
 If you find the provided code or models useful for your research, consider citing them as:
 ```
-
-```
-
-
+```
````
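The substance of this PR is the YAML front-matter block added at the top of README.md, which the Hub reads to populate the model card's pipeline tag, library, and license. A minimal sketch of how such a block can be pulled out of a README — using naive stdlib-only parsing for illustration (the Hub itself uses a real YAML parser):

```python
# Sketch: extract the front-matter metadata this PR adds to README.md.
# The parsing below is deliberately naive (flat "key: value" lines only).

readme = """---
pipeline_tag: image-to-text
library_name: transformers
license: mit
---

<div align="center">
<h1> OmniCaptioner: One Captioner to Rule Them All </h1>
"""

# The metadata sits between the first two `---` delimiter lines.
_, front_matter, body = readme.split("---\n", 2)

# Build a dict from the flat key: value lines.
meta = dict(
    line.split(": ", 1)
    for line in front_matter.splitlines()
    if ": " in line
)

print(meta["pipeline_tag"])  # image-to-text
print(meta["library_name"])  # transformers
```

The `# Please verify and correct if needed` comment on the `license` line in the diff above is the PR author's note that `mit` is a guess pending confirmation by the repo owners.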