Instructions to use nlpconnect/vit-gpt2-image-captioning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nlpconnect/vit-gpt2-image-captioning with Transformers:
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
model = AutoModelForImageTextToText.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
```
- Notebooks
- Google Colab
- Kaggle
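For a concrete caption-generation example, this checkpoint is a `VisionEncoderDecoderModel` (ViT encoder, GPT-2 decoder), so it can also be driven directly with `ViTImageProcessor` and `AutoTokenizer`. A minimal sketch, assuming you replace the blank placeholder image with your own; `max_length` and `num_beams` are illustrative values, not tuned settings:

```python
# Minimal sketch: caption a single image with the VisionEncoderDecoder checkpoint.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Placeholder image; in practice use Image.open("your_image.jpg").convert("RGB")
image = Image.new("RGB", (224, 224), color="white")

pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```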
Fine-tune
How can I fine-tune this model on a custom dataset?
Yes, it's definitely possible to fine-tune on (image, text) pairs.
Basically, each item of the dataset should be a pair of (pixel_values, labels), where the labels are the input_ids of the target sequence.
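That pair structure can be sketched as a small PyTorch dataset. This is a hedged illustration, not code from this thread: the class name `CaptionDataset`, the `max_length` of 64, and the in-memory list of `(PIL image, caption)` pairs are all assumptions, and masking padding positions with `-100` (so they are ignored by the cross-entropy loss) is a common convention rather than a requirement:

```python
# Sketch (assumed names): one dataset item = {"pixel_values", "labels"}.
import torch
from torch.utils.data import Dataset
from PIL import Image
from transformers import ViTImageProcessor, AutoTokenizer

class CaptionDataset(Dataset):
    def __init__(self, pairs, processor, tokenizer, max_length=64):
        self.pairs = pairs            # list of (PIL image, caption string)
        self.processor = processor
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        image, caption = self.pairs[idx]
        pixel_values = self.processor(
            images=image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.tokenizer(
            caption, padding="max_length", truncation=True,
            max_length=self.max_length, return_tensors="pt").input_ids.squeeze(0)
        # Mask padding positions so the loss ignores them
        labels[labels == self.tokenizer.pad_token_id] = -100
        return {"pixel_values": pixel_values, "labels": labels}

processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

pairs = [(Image.new("RGB", (224, 224), color="white"), "a white square")]
item = CaptionDataset(pairs, processor, tokenizer)[0]
```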
@GehadAbokamar
You can refer to the following link:
https://sachinruk.github.io/blog/pytorch/huggingface/2021/12/28/vit-to-gpt2-encoder-decoder-model.html
Thank you for helping^^
I tried to fine-tune but faced several problems. I believe I need to specify the proper naming and preprocessing for the dataset, but I don't know how:
Please try from this: https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/
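If the write-ups are unavailable, the core training step itself is small. A hedged sketch of a single manual optimization step, using a dummy batch of random tensors in place of real preprocessed data (batch size, sequence length, and learning rate here are placeholders, and the defensive config checks are assumptions, since this checkpoint's config already sets these ids):

```python
# Sketch: one manual training step on a dummy batch (random stand-in data).
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Encoder-decoder training needs these; set them defensively if missing.
if model.config.decoder_start_token_id is None:
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
if model.config.pad_token_id is None:
    model.config.pad_token_id = model.config.decoder.eos_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

pixel_values = torch.randn(2, 3, 224, 224)  # stand-in for a real image batch
labels = torch.randint(0, model.config.decoder.vocab_size, (2, 16))

model.train()
outputs = model(pixel_values=pixel_values, labels=labels)  # loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice you would wrap this loop with your real `DataLoader`, or hand the same `{"pixel_values", "labels"}` items to `Seq2SeqTrainer`, which is the approach the blog post above takes.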
Hi @ankur310794, I'm looking to fine-tune this model on a custom dataset, but the two links you provided are no longer valid. Are there any other resources to assist with fine-tuning this model in PyTorch?