An image captioning model fine-tuned from BLIP-base that phrases its captions the way Yoda speaks, for example:
"Sitting in a car, a man is"
Try the web app here: https://yodacaptioner.up.railway.app/
Model Details
Model Description
An image-to-text model fine-tuned from BLIP-base using the transformers package.
- Developed by: vkao8264
- Model type: Image-to-text
- Language(s) (NLP): English
- License: bsd-3-clause
- Finetuned from model: Salesforce/blip-image-captioning-base
Uses
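import torch
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# The processor comes from the base model; the fine-tuned weights come from this repository
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("vkao8264/blip-yoda-captioning").to(device)

filepath = "path-to-your-image"
raw_image = Image.open(filepath).convert("RGB")

# Preprocess the image and move the tensors to the same device as the model
inputs = processor(raw_image, return_tensors="pt").to(device)
output_tokens = model.generate(**inputs)
caption = processor.decode(output_tokens[0], skip_special_tokens=True)
print(caption)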
Training Details
Training Data
The model was fine-tuned on 30,000 image-caption pairs from the COCO Captions dataset, specifically the captions_train2014 annotations.
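How the 30,000 pairs were selected is not stated here. As a rough sketch only, the snippet below shows one way to draw a reproducible random subset of (file name, caption) pairs from a COCO-style captions dictionary. The function name `sample_caption_pairs`, the fixed seed, and the random sampling strategy are all assumptions made for illustration, not the author's method. The toy dictionary mirrors the "images" and "annotations" lists found in the COCO captions annotation files.

```python
import random


def sample_caption_pairs(coco, n_pairs, seed=0):
    """Return up to n_pairs (file_name, caption) tuples from a COCO captions dict.

    `coco` is the parsed annotation JSON: an "images" list with "id" and
    "file_name", and an "annotations" list with "image_id" and "caption".
    """
    # Map image ids to file names so each caption can be paired with its image.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    pairs = [
        (file_names[ann["image_id"]], ann["caption"].strip())
        for ann in coco["annotations"]
        if ann["image_id"] in file_names
    ]
    # Assumption: a seeded shuffle, so the same subset is drawn on every run.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    return pairs[:n_pairs]


# Toy stand-in for the real annotation file (hypothetical content).
toy_coco = {
    "images": [
        {"id": 1, "file_name": "img_1.jpg"},
        {"id": 2, "file_name": "img_2.jpg"},
    ],
    "annotations": [
        {"image_id": 1, "caption": "A man is sitting in a car. "},
        {"image_id": 2, "caption": "A dog lies on a couch."},
        {"image_id": 1, "caption": "A car with a man inside."},
    ],
}

subset = sample_caption_pairs(toy_coco, n_pairs=2)
print(subset)
```

With the real data, `toy_coco` would be replaced by the result of `json.load` on the downloaded annotation file and `n_pairs` set to 30000.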
Before training, the captions were rewritten as Yoda-style captions using Phi-3 with few-shot prompting.
The scripts can be found at https://github.com/vincent8264/yoda_captioning
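The author's actual conversion scripts live in the repository linked above. Purely as an illustration of the few-shot idea, a prompt for the rewriting step might be assembled along these lines. The example pairs, the instruction wording, and the `build_yoda_prompt` helper are hypothetical; only the "Sitting in a car, a man is" output style comes from this model card.

```python
# Hypothetical few-shot pairs (plain caption, Yoda-style caption).
# These are illustrative; the examples used for the real dataset are not listed here.
FEW_SHOT_EXAMPLES = [
    ("A man is sitting in a car.", "Sitting in a car, a man is."),
    ("Two dogs play on the beach.", "On the beach, two dogs play."),
]


def build_yoda_prompt(caption, examples=FEW_SHOT_EXAMPLES):
    """Build a few-shot prompt asking a language model to restyle one caption."""
    lines = [
        "Rewrite each caption the way Yoda would say it. Keep the meaning unchanged.",
        "",
    ]
    # Each worked example shows the model the expected input/output pattern.
    for plain, yoda in examples:
        lines.append(f"Caption: {plain}")
        lines.append(f"Yoda: {yoda}")
        lines.append("")
    # The final entry is left open for the model to complete.
    lines.append(f"Caption: {caption.strip()}")
    lines.append("Yoda:")
    return "\n".join(lines)


print(build_yoda_prompt("A cat sleeps on a sofa."))
```

The resulting string would then be sent to the language model, and its completion stored as the new training caption for that image.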