Instructions to use nlpconnect/vit-gpt2-image-captioning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nlpconnect/vit-gpt2-image-captioning with Transformers:
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
model = AutoModelForImageTextToText.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
```
- Notebooks
- Google Colab
- Kaggle
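For a concrete caption-generation example, this checkpoint is a `VisionEncoderDecoderModel` (ViT encoder, GPT-2 decoder), so it can also be driven directly with `ViTImageProcessor` and `AutoTokenizer`. A minimal sketch, assuming you replace the blank placeholder image with your own; `max_length` and `num_beams` are illustrative values, not tuned settings:

```python
# Minimal sketch: caption a single image with the VisionEncoderDecoder checkpoint.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Placeholder image; in practice use Image.open("your_image.jpg").convert("RGB")
image = Image.new("RGB", (224, 224), color="white")

pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```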
Fine-tune
How can I fine-tune this model on a custom dataset?
Yes, it's definitely possible to fine-tune on (image, text) pairs.
Basically, each item of the dataset should be a pair of (pixel_values, labels), where the labels are the input_ids of the target sequence.
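That pair structure can be sketched as a small PyTorch dataset. This is a hedged illustration, not code from this thread: the class name `CaptionDataset`, the `max_length` of 64, and the in-memory list of `(PIL image, caption)` pairs are all assumptions, and masking padding positions with `-100` (so they are ignored by the cross-entropy loss) is a common convention rather than a requirement:

```python
# Sketch (assumed names): one dataset item = {"pixel_values", "labels"}.
import torch
from torch.utils.data import Dataset
from PIL import Image
from transformers import ViTImageProcessor, AutoTokenizer

class CaptionDataset(Dataset):
    def __init__(self, pairs, processor, tokenizer, max_length=64):
        self.pairs = pairs            # list of (PIL image, caption string)
        self.processor = processor
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        image, caption = self.pairs[idx]
        pixel_values = self.processor(
            images=image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.tokenizer(
            caption, padding="max_length", truncation=True,
            max_length=self.max_length, return_tensors="pt").input_ids.squeeze(0)
        # Mask padding positions so the loss ignores them
        labels[labels == self.tokenizer.pad_token_id] = -100
        return {"pixel_values": pixel_values, "labels": labels}

processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

pairs = [(Image.new("RGB", (224, 224), color="white"), "a white square")]
item = CaptionDataset(pairs, processor, tokenizer)[0]
```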
@GehadAbokamar
You can refer to the following link:
https://sachinruk.github.io/blog/pytorch/huggingface/2021/12/28/vit-to-gpt2-encoder-decoder-model.html
Thank you for helping^^
I tried to fine-tune but faced several problems. I believe I need to specify the proper naming and preprocessing for the dataset, but I don't know how:
Please try from this: https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/
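If the write-ups are unavailable, the core training step itself is small. A hedged sketch of a single manual optimization step, using a dummy batch of random tensors in place of real preprocessed data (batch size, sequence length, and learning rate here are placeholders, and the defensive config checks are assumptions, since this checkpoint's config already sets these ids):

```python
# Sketch: one manual training step on a dummy batch (random stand-in data).
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Encoder-decoder training needs these; set them defensively if missing.
if model.config.decoder_start_token_id is None:
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
if model.config.pad_token_id is None:
    model.config.pad_token_id = model.config.decoder.eos_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

pixel_values = torch.randn(2, 3, 224, 224)  # stand-in for a real image batch
labels = torch.randint(0, model.config.decoder.vocab_size, (2, 16))

model.train()
outputs = model(pixel_values=pixel_values, labels=labels)  # loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice you would wrap this loop with your real `DataLoader`, or hand the same `{"pixel_values", "labels"}` items to `Seq2SeqTrainer`, which is the approach the blog post above takes.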
Hi @ankur310794, I'm looking to fine-tune this model on a custom dataset, but the two links you provided are no longer valid. Are there any other resources to assist with fine-tuning this model in PyTorch?