BLIP Flickr8k Caption Model
BLIP image captioning model fine-tuned on the Flickr8k dataset.
Base model
Salesforce/blip-image-captioning-base
Usage
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import torch

# Load the fine-tuned checkpoint and its processor from the Hugging Face Hub.
model_id = "muhammedaydiiinnn/blip-flickr8k-caption"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Read the image and convert it to RGB before preprocessing.
image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate without tracking gradients, then decode the token IDs to text.
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output[0], skip_special_tokens=True)
print(caption)
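The single-image snippet above can be extended to several images at once. The following is a minimal sketch, not part of the original model card: the helper name `caption_images` and its `batch_size` and `device` parameters are hypothetical, chosen only for illustration. It assumes the `processor` and `model` objects loaded above, and that the model has already been moved to the same device as the inputs.

```python
from typing import List, Sequence


def caption_images(
    images: Sequence,
    processor,
    model,
    batch_size: int = 8,
    max_new_tokens: int = 30,
    device: str = "cpu",
) -> List[str]:
    """Caption a sequence of RGB PIL images in fixed-size batches.

    Hypothetical helper: `processor` and `model` are expected to be the
    BlipProcessor and BlipForConditionalGeneration objects loaded above.
    The caller is responsible for calling model.to(device) beforehand.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    captions: List[str] = []
    for start in range(0, len(images), batch_size):
        batch = list(images[start:start + batch_size])
        # The processor accepts a list of images and stacks them into one batch.
        inputs = processor(images=batch, return_tensors="pt").to(device)
        # Wrap this call in torch.inference_mode() as in the snippet above if desired.
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # batch_decode turns each row of token IDs into one caption string.
        decoded = processor.batch_decode(output, skip_special_tokens=True)
        captions.extend(text.strip() for text in decoded)
    return captions
```

One possible call is `caption_images([Image.open(p).convert("RGB") for p in paths], processor, model)`, where `paths` is a list of image file names. Batching mainly helps on a GPU; on CPU a small `batch_size` is usually sufficient.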