Instructions to use Salesforce/blip-image-captioning-large with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Salesforce/blip-image-captioning-large with Transformers:
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = AutoModelForImageTextToText.from_pretrained("Salesforce/blip-image-captioning-large")
```
- Notebooks
- Google Colab
- Kaggle
Is fine-tuning supported for this model? If so, can anyone give me some pointers?
I found this post: https://discuss.huggingface.co/t/finetune-blip-on-customer-dataset-20893/28446
but it is about Salesforce/blip-vqa-base, not this model (Salesforce/blip-image-captioning-large).
Hi @husjerry,
Fine-tuning for VQA should be done the same way as fine-tuning for image captioning; I think the only difference is in how you prompt the model (but I am not sure).
You can look at the instructions shared in that thread, or refer to the original VQA fine-tuning script:
https://github.com/salesforce/BLIP/blob/main/train_vqa.py
From there, you can either try to use the HF model in that script, or train with their model and then convert the checkpoint to the HF format using the conversion script here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
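For the captioning model specifically, a training step can be sketched in plain PyTorch with the HF `BlipForConditionalGeneration` class. This is a minimal sketch, not the official recipe: it assumes the common approach of passing the caption token IDs as `labels` (the text decoder shifts them internally for next-token prediction) and masking padding with `-100` so the loss ignores it. The helper names `make_labels`, `train_step` and `fine_tune`, and the default learning rate, are hypothetical choices for illustration.

```python
import torch
from transformers import BlipForConditionalGeneration


def make_labels(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Hypothetical helper: copy the caption IDs and replace padding with -100,
    the index that the cross-entropy loss ignores."""
    return input_ids.masked_fill(input_ids == pad_token_id, -100)


def train_step(model: BlipForConditionalGeneration, batch: dict,
               optimizer: torch.optim.Optimizer) -> float:
    """Hypothetical helper: one optimisation step on a batch holding
    `pixel_values`, `input_ids` and `attention_mask` tensors."""
    model.train()
    labels = make_labels(batch["input_ids"], model.config.text_config.pad_token_id)
    outputs = model(
        pixel_values=batch["pixel_values"],
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=labels,  # the model computes the captioning loss from these
    )
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


def fine_tune(model: BlipForConditionalGeneration, batches, epochs: int = 1,
              lr: float = 5e-5) -> list:
    """Hypothetical helper: loop over a re-iterable of batches (e.g. a list or a
    DataLoader) for `epochs` passes and return the per-step losses.
    The default lr of 5e-5 is an assumption, not a tuned value."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        for batch in batches:
            losses.append(train_step(model, batch, optimizer))
    return losses
```

With the real checkpoint you would load the model with `BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")` and build each batch with `BlipProcessor.from_pretrained(...)`, called as `processor(images=images, text=captions, padding=True, return_tensors="pt")`, which returns `pixel_values`, `input_ids` and `attention_mask`. Wrapping that in a `DataLoader` and adding evaluation or checkpointing is left out to keep the sketch short.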