finetuning script

by mindadeepam - opened Apr 12, 2023

Discussion

mindadeepam

Apr 12, 2023

Please add a fine-tuning script for DocVQA.

rteresiOB

Apr 20, 2023

Seconding this request - does anyone have a notebook?

ybelkada

Apr 20, 2023

Hi there!
There is a fine-tuning notebook here that uses the base model: https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb Let us know if you run into any issues.

rteresiOB

Apr 20, 2023

edited Apr 20, 2023

This looks really useful! I'll try playing around with it and maybe share a notebook for this specific VQA model if I get it up and running. It seems to be almost exactly the same for this model, except that in __getitem__ the processor would also take the question as its text argument:

encoding = self.processor(images=item["image"], text=item["question"], return_tensors="pt", add_special_tokens=True, max_patches=MAX_PATCHES)
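
For context, here is a minimal sketch of how the whole dataset wrapper might look with that one change, loosely following the dataset pattern used for captioning. Only the "image" and "question" keys and the processor call come from this thread; the class name DocVQADataset, the "answer" field, and the MAX_PATCHES value are assumptions made for illustration. It is written as a plain map-style class (just __len__ and __getitem__) so it carries no imports of its own, and it assumes the processor returns a dict of tensors with a leading batch dimension of 1.

```python
# Assumed value for illustration; the thread does not say what MAX_PATCHES is.
MAX_PATCHES = 1024


class DocVQADataset:
    """Hypothetical map-style dataset wrapper for a VQA-style processor.

    It only defines __len__ and __getitem__, which is the interface a
    PyTorch DataLoader expects from a map-style dataset.
    """

    def __init__(self, dataset, processor, max_patches=MAX_PATCHES):
        self.dataset = dataset        # indexable collection of dict-like rows
        self.processor = processor    # e.g. a Pix2Struct processor (assumption)
        self.max_patches = max_patches

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        # Same call as in the thread: the question is passed as `text`
        # alongside the image.
        encoding = self.processor(
            images=item["image"],
            text=item["question"],
            return_tensors="pt",
            add_special_tokens=True,
            max_patches=self.max_patches,
        )
        # Drop the leading batch dimension so samples can be stacked later.
        encoding = {k: v.squeeze(0) for k, v in encoding.items()}
        # Keep the raw answer string as the training target.
        # The "answer" key is an assumed column name.
        encoding["answer"] = item["answer"]
        return encoding
```

An instance could then be passed to a DataLoader together with a collate function that tokenizes the answers into labels; that step is not shown here and would depend on the dataset and checkpoint used.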

mindadeepam

Apr 20, 2023

Hey @ybelkada and @rteresiOB, thanks a lot. I'll try it out.

ybelkada

Apr 22, 2023

@rteresiOB, yes, that seems to be it! Can't wait to see what you will build on top of that!
@mindadeepam, let us know if you run into any issues!

mindadeepam changed discussion status to closed Apr 28, 2023
