Add metadata and improve model card for OmniCaptioner
#1 opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,4 +1,8 @@
-
+---
+pipeline_tag: image-to-text
+library_name: transformers
+license: mit # Please verify and correct if needed
+---
 
 <div align="center">
 <h1> OmniCaptioner: One Captioner to Rule Them All </h1>
@@ -7,10 +11,11 @@
 <div align="center">
 
 <p align="center">
-<a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a> | <a href="https://github.com/Alpha-Innovator/OmniCaptioner">Github</a> | <a href="https://huggingface.co/papers/2504.07089">Paper</a>
+🏠 <a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a> | 🤗 <a href="https://github.com/Alpha-Innovator/OmniCaptioner">Github</a> | 📄 <a href="https://huggingface.co/papers/2504.07089">Paper</a>
 </p>
 </div>
 
+OmniCaptioner is a versatile visual captioning framework for generating detailed textual descriptions of various visual domains, including natural images, visual text (posters, UIs, textbooks), and structured visuals (documents, tables, charts). By converting low-level pixel information into semantically rich textual representations, this framework bridges the gap between visual and textual modalities.
 
 ## 💻 Finetuning Code
 ### 1. Create a conda environment and install PyTorch
@@ -77,7 +82,4 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Om
 
 If you find the provided code or models useful for your research, consider citing them as:
 ```
-
-```
-
-
+```
````
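The substance of this PR is the YAML front-matter block added at the top of README.md, which the Hub reads to populate the model card's pipeline tag, library, and license. A minimal sketch of how such a block can be pulled out of a README — using naive stdlib-only parsing for illustration (the Hub itself uses a real YAML parser):

```python
# Sketch: extract the front-matter metadata this PR adds to README.md.
# The parsing below is deliberately naive (flat "key: value" lines only).

readme = """---
pipeline_tag: image-to-text
library_name: transformers
license: mit
---

<div align="center">
<h1> OmniCaptioner: One Captioner to Rule Them All </h1>
"""

# The metadata sits between the first two `---` delimiter lines.
_, front_matter, body = readme.split("---\n", 2)

# Build a dict from the flat key: value lines.
meta = dict(
    line.split(": ", 1)
    for line in front_matter.splitlines()
    if ": " in line
)

print(meta["pipeline_tag"])  # image-to-text
print(meta["library_name"])  # transformers
```

The `# Please verify and correct if needed` comment on the `license` line in the diff above is the PR author's note that `mit` is a guess pending confirmation by the repo owners.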