PaDT-MLLM committed on
Commit 48cb96b · verified · 1 Parent(s): e1d584f

Update README.md

Files changed (1)
  1. README.md +58 -2
README.md CHANGED
@@ -85,7 +85,7 @@ from PaDT import PaDTForConditionalGeneration, VisonTextProcessingClass, parseVR
85
 
86
 
87
  TEST_IMG_PATH="./eval/imgs/000000368335.jpg"
88
- MODEL_PATH="PaDT-MLLM/PaDT_Pro_3B"
89
 
90
  # load model
91
  model = PaDTForConditionalGeneration.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map={"": 0})
@@ -97,7 +97,7 @@ processor = VisonTextProcessingClass(processor, model.config.vision_config.spati
97
  processor.prepare(model.model.embed_tokens.weight.shape[0])
98
 
99
  # question prompt
100
- PROMPT = "Please describe this image."
101
 
102
  # construct conversation
103
  message = [
@@ -187,6 +187,62 @@ Here are some randomly selected test examples showcasing PaDT’s excellent perf
187
  <img src="./assets/TAM.webp" width="900"/>
188
  </div>
189
 
190
  ## License Agreement
191
 
192
  PaDT is licensed under Apache 2.0.
 
85
 
86
 
87
  TEST_IMG_PATH="./eval/imgs/000000368335.jpg"
88
+ MODEL_PATH="PaDT-MLLM/PaDT_REC_7B"
89
 
90
  # load model
91
  model = PaDTForConditionalGeneration.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map={"": 0})
 
97
  processor.prepare(model.model.embed_tokens.weight.shape[0])
98
 
99
  # question prompt
100
+ PROMPT = """Please carefully check the image and detect the object this sentence describes: "The car is on the left side of the horse"."""
101
 
102
  # construct conversation
103
  message = [
 
187
  <img src="./assets/TAM.webp" width="900"/>
188
  </div>
189
 
190
+ ## Training Instruction
191
+
192
+ Download Datasets:
193
+
194
+ - [COCO](https://cocodataset.org/#home)
195
+
196
+ - RefCOCO/+/g
197
+ ```bash
198
+ wget https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip
199
+ wget https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip
200
+ wget https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip
201
+ ```
202
+ Unpack these datasets and place them under the following directory:
203
+
204
+ ```
205
+ PaDT/
206
+ ├── dataset/
207
+ │   ├── coco/
208
+ │   │   ├── annotations/
209
+ │   │   ├── train2014/
210
+ │   │   ├── train2017/
211
+ │   │   ├── val2014/
212
+ │   │   └── val2017/
213
+ │   └── RefCOCO/
214
+ │       ├── refcoco/
215
+ │       ├── refcoco+/
216
+ │       └── refcocog/
217
+ ```
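Before preprocessing, it can help to confirm that the unpacked data matches the tree above. The following is a minimal sketch, not part of the PaDT repository: the helper name `missing_dataset_dirs` is hypothetical, and the directory list is copied from the layout shown in this README.

```python
from pathlib import Path

# Expected sub-directories, copied from the tree above (relative to the PaDT/ root).
EXPECTED_DIRS = [
    "dataset/coco/annotations",
    "dataset/coco/train2014",
    "dataset/coco/train2017",
    "dataset/coco/val2014",
    "dataset/coco/val2017",
    "dataset/RefCOCO/refcoco",
    "dataset/RefCOCO/refcoco+",
    "dataset/RefCOCO/refcocog",
]


def missing_dataset_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks that the folders
    exist, not that their contents are complete.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs(".")
    if missing:
        print("Missing dataset directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

Run it from the `PaDT/` root. An empty result only means the folders are present, so the preprocessing scripts may still report problems with individual files.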
218
+
219
+ Preprocess the datasets:
220
+ - 1. Preprocess with our scripts (first update the dataset path configuration in the preprocessing scripts).
221
+ ```bash
222
+ cd src/preprocess
223
+ python process_coco.py
224
+ python process_refcoco.py
225
+ ```
226
+ - 2. Alternatively, use the preprocessed, training-ready datasets that we have released on Hugging Face.
227
+ | Dataset | Dataset Path | Task Type |
228
+ | - | - | - |
229
+ | COCO | [PaDT-MLLM/COCO](https://huggingface.co/datasets/PaDT-MLLM/COCO) | Open Vocabulary Detection |
230
+ | RefCOCO | [PaDT-MLLM/RefCOCO](https://huggingface.co/datasets/PaDT-MLLM/RefCOCO) | Referring Expression Comprehension/Segmentation |
231
+ | RIC | [PaDT-MLLM/ReferringImageCaptioning](https://huggingface.co/datasets/PaDT-MLLM/ReferringImageCaptioning) | Referring Image Captioning |
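One possible way to fetch the preprocessed datasets in the table is `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: the target folder `./dataset/preprocessed` and the helper names `download_args` and `download` are hypothetical, and the location the training scripts actually expect should be taken from the scripts in `run_scripts`.

```python
from pathlib import Path

# Dataset repositories listed in the table above.
PREPROCESSED_REPOS = {
    "COCO": "PaDT-MLLM/COCO",
    "RefCOCO": "PaDT-MLLM/RefCOCO",
    "RIC": "PaDT-MLLM/ReferringImageCaptioning",
}


def download_args(name, target_root="./dataset/preprocessed"):
    """Build keyword arguments for huggingface_hub.snapshot_download.

    target_root is an assumed location, not a path required by PaDT.
    """
    repo_id = PREPROCESSED_REPOS[name]
    local_dir = Path(target_root) / repo_id.split("/")[-1]
    return {"repo_id": repo_id, "repo_type": "dataset", "local_dir": str(local_dir)}


def download(name, target_root="./dataset/preprocessed"):
    """Download one preprocessed dataset repository and return its local path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_args(name, target_root))


if __name__ == "__main__":
    # Print the planned targets only; call download(name) to actually fetch.
    for name in PREPROCESSED_REPOS:
        print(name, "->", download_args(name)["local_dir"])
```

Calling `download("RefCOCO")` would then fetch that repository into the assumed folder; it requires `pip install huggingface_hub` and network access.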
232
+
233
+ The training scripts in `run_scripts` are ready to execute.
234
+
235
+ For example, to train the PaDT-Pro 3B model on a single node with 8×96 GB GPUs:
236
+
237
+ ```bash
238
+ bash ./run_scripts/padt_pro_3b_sft.sh
239
+ ```
240
+
241
+ ## Evaluation
242
+
243
+ We provide a simple inference example in `eval/test_demo.py`. More evaluation scripts will be added soon.
244
+
245
+
246
  ## License Agreement
247
 
248
  PaDT is licensed under Apache 2.0.