# BLIP Flickr8k Caption Model

A BLIP image-captioning model fine-tuned on the Flickr8k dataset.

## Base model

`Salesforce/blip-image-captioning-base`

## Usage

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import torch

model_id = "muhammedaydiiinnn/blip-flickr8k-caption"

# Load the processor (image preprocessing + tokenizer) and the model
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# BLIP expects RGB input
image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=30)

caption = processor.decode(output[0], skip_special_tokens=True)
print(caption)
```
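BLIP also supports conditional captioning, where the model continues a text prompt you supply alongside the image. The sketch below assumes this fine-tuned checkpoint behaves like the base BLIP model in this respect; the prompt string and `num_beams` value are illustrative choices, not part of this repository.

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import torch

model_id = "muhammedaydiiinnn/blip-flickr8k-caption"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("test.jpg").convert("RGB")

# Conditional captioning: pass a text prompt; the model completes it.
inputs = processor(images=image, text="a photo of", return_tensors="pt")

with torch.inference_mode():
    # Beam search often yields slightly more fluent captions than greedy decoding
    output = model.generate(**inputs, max_new_tokens=30, num_beams=3)

caption = processor.decode(output[0], skip_special_tokens=True)
print(caption)
```

Unconditional captioning (omitting `text=`) remains the default use case shown above; prompting is mainly useful for steering the caption toward a particular phrasing.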
## Model details

- Model size: ~0.2B parameters
- Tensor type: F32 (safetensors)