Fine-tuning script

#1
by mindadeepam - opened

Please add a fine-tuning script for DocVQA.

Seconding this request. Does anyone have a notebook?

Hi there!
There is a fine-tuning notebook that uses the base model here: https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb. Let us know if you run into any issues!

This looks really useful! I'll play around with it and may share a notebook for this specific VQA model if I get it running. It seems almost identical for this model, except that in __getitem__ the processor also needs to receive the question through the text parameter:

encoding = self.processor(images=item["image"], text=item["question"], return_tensors="pt", add_special_tokens=True, max_patches=MAX_PATCHES)
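For reference, here is a minimal sketch of how that change might slot into the dataset class from the captioning notebook. It is not an official script. The class name DocVQADataset, the "answer" key, and the MAX_PATCHES value of 1024 are all assumptions; adapt them to your dataset and to whatever the notebook uses. As far as we understand, with the VQA checkpoints the processor renders the question onto the image as a header rather than tokenizing it, so the answer is kept as the decoder target.

```python
# Assumption: patch budget; match the value used in the captioning notebook.
MAX_PATCHES = 1024


class DocVQADataset:
    """Map-style dataset (works with torch.utils.data.DataLoader) for a
    Pix2Struct DocVQA checkpoint.

    Hypothetical layout: each item is a dict with "image", "question" and
    "answer" keys. The "answer" key is an assumption about your data.
    """

    def __init__(self, dataset, processor, max_patches=MAX_PATCHES):
        self.dataset = dataset
        self.processor = processor
        self.max_patches = max_patches

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        # Same call as the captioning notebook, plus the question as `text`.
        encoding = self.processor(
            images=item["image"],
            text=item["question"],
            return_tensors="pt",
            add_special_tokens=True,
            max_patches=self.max_patches,
        )
        # Remove the leading batch dimension of size 1 from each tensor.
        encoding = {k: v.squeeze(0) for k, v in encoding.items()}
        # Target string for the decoder; tokenize it later in the collator.
        encoding["text"] = item["answer"]
        return encoding
```

Usage would mirror the notebook, e.g. DocVQADataset(train_split, processor) passed to a DataLoader with the notebook's collator. Note that the collator tokenizes the target text, so with a VQA processor you may need to call processor.tokenizer directly for that step.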

Hey @ybelkada and @rteresiOB, thanks a lot! I'll try it out.

@rtersiOB, yes, that seems to be it! Can't wait to see what you build on top of it!
@mindadeepam, let us know if you run into any issues!

mindadeepam changed discussion status to closed
