finetuning script
Please add a finetuning script for docvqa.
Seconding this request - does anyone have a notebook?
Hi there!
There is a fine-tuning notebook here that uses the base model: https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb Let us know if you run into any issues.
This looks really useful! I'll try playing around with it and maybe share a notebook for this specific VQA model if I get it up and running. It seems to work almost exactly the same way with this model, except that in __getitem__ the processor also takes the question as its text argument:
encoding = self.processor(images=item["image"], text=item['question'], return_tensors="pt", add_special_tokens=True, max_patches=MAX_PATCHES)
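For context, here is a rough sketch of how that line might sit inside a full dataset class, adapted from the captioning notebook's pattern. The class name `DocVQADataset`, the `"answer"` field, and the `MAX_PATCHES` value are assumptions for illustration; only the processor call itself comes from the line above.

```python
MAX_PATCHES = 1024  # assumption: placeholder value, adjust to your memory budget


class DocVQADataset:
    """Map-style dataset sketch: exposes __len__ and __getitem__ for a DataLoader.

    `dataset` is assumed to be a sequence of dicts with "image", "question"
    and "answer" keys (hypothetical field names); `processor` is whatever
    processor object you loaded for the VQA checkpoint.
    """

    def __init__(self, dataset, processor, max_patches=MAX_PATCHES):
        self.dataset = dataset
        self.processor = processor
        self.max_patches = max_patches

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        # Same call as in the captioning notebook, plus the question as `text`.
        encoding = self.processor(
            images=item["image"],
            text=item["question"],
            return_tensors="pt",
            add_special_tokens=True,
            max_patches=self.max_patches,
        )
        # The processor returns batched outputs; drop the leading batch dimension.
        encoding = {k: v.squeeze(0) for k, v in encoding.items()}
        # Keep the raw answer string so a collate function can tokenize it as labels.
        encoding["answer"] = item["answer"]
        return encoding
```

You would presumably pass in a processor loaded with something like Pix2StructProcessor.from_pretrained on the DocVQA checkpoint, and then tokenize the answers in your collate function the way the notebook does for captions. I haven't verified this end to end, so treat it as a starting point.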
Hey @ybelkada and @rteresiOB Thanks a lot. I'll try it out.
@rteresiOB, yes, that seems to be it! Can't wait to see what you will build on top of that!
@mindadeepam, let us know if you run into any issues!