---
license: apache-2.0
---

<div align="center">
<h1> OmniCaptioner: One Captioner to Rule Them All </h1>
</div>
<div align="center">

<p align="center">
💜 <a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;🤗 <a href="https://huggingface.co/U4R/OmniCaptioner">Hugging Face</a>&nbsp;&nbsp;|&nbsp;&nbsp;📑 <a href="https://arxiv.org/abs">Paper</a>
</p>
</div>

## 📰 News

## 📊 Quantitative Performance
![Quantitative Results](assets/quantitative.png)

## 💻 Finetuning Code
### 1. Create a conda environment and install PyTorch
```bash
conda create -n OmniCap python=3.9
conda activate OmniCap
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Install flash-attn
```bash
pip install flash-attn --no-build-isolation
```
### 4. Prepare data
List the paths to your data files in `./data/caption_data.yaml`.
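The exact schema of this file is defined by the training scripts; as a purely hypothetical sketch (the field names `datasets`, `json_path`, and `sampling_rate` are illustrative assumptions, not the documented format), an entry might look like:

```yaml
# Hypothetical example -- consult the data-loading code for the
# actual schema expected in caption_data.yaml.
datasets:
  - json_path: /path/to/your_caption_annotations.json  # annotation file
    sampling_rate: 1.0                                 # fraction of the set to sample
```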

### 5. Start finetuning
```bash
bash scripts/finetune_caption_slurm.sh
```
## 🚀 Inference Code

You can caption an image in AIGC style using the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
    --model_path your_model_path \
    --image_path your_image_path \
    --image_type aigc
```

You can caption an image in OCR style using the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
    --model_path your_model_path \
    --image_path your_image_path \
    --image_type ocr
```
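To caption a whole directory rather than a single image, the script above can be wrapped in a small driver. A minimal sketch, assuming the flags documented above; the `build_commands` helper itself is hypothetical and not part of this repository:

```python
import subprocess
from pathlib import Path

def build_commands(model_path: str, image_dir: str, image_type: str = "aigc"):
    """Build one inference command per .png image in image_dir (sketch only)."""
    cmds = []
    for img in sorted(Path(image_dir).glob("*.png")):
        cmds.append([
            "python", "src/inference_single_image.py",
            "--model_path", model_path,
            "--image_path", str(img),
            "--image_type", image_type,
        ])
    return cmds

# Each command can then be executed with subprocess.run(cmd, check=True).
```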
## 🚀 Evaluation Code with LLM

```bash
cd VLMEvalkit
conda create -n VLMEvalkit python=3.9
conda activate VLMEvalkit
pip install -e .

CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-3B --verbose > output_omnicap_qwen2-5-3B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-7B --verbose > output_omnicap_qwen2-5-7B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-32B --verbose > output_omnicap_qwen2-5-32B_MMMU_DEV_VAL.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-7B --verbose > output_omnicap_deepseek_distill_7B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-32B --verbose > output_omnicap_deepseek_distill_32B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-70B --verbose > output_omnicap_deepseek_distill_70B_MMMU_DEV_VAL.log 2>&1 &
```

## Citation

If you find the provided code or models useful for your research, please consider citing:
```

```