metadata
title: BLIP Captioner
emoji: 🖼
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
tags:
- image-captioning
- vision-language
- blip
- multimodal
- salesforce
short_description: Generate captions for images with BLIP
BLIP Image Captioner
Generate natural-language descriptions for any image using Salesforce's BLIP (Bootstrapping Language-Image Pre-training) model.
Features
- Single caption mode — standard captioning with tunable beam width
- Conditional captioning — optional prompt prefix (e.g., "a painting of")
- Variety comparison — generate 3 captions with different beam widths to see how output changes
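The three modes above can be sketched with the `transformers` BLIP classes. This is a minimal illustration, not the Space's actual `app.py`: the helper names `load_blip`, `caption`, and `caption_variety`, the beam widths `(1, 3, 5)`, and the `max_new_tokens=30` limit are all assumptions chosen for the example.

```python
from typing import Dict, Optional, Sequence

MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID):
    """Load the processor and model (downloads weights on first call)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return processor, model


def caption(image, processor, model, prompt: Optional[str] = None,
            num_beams: int = 3, max_new_tokens: int = 30) -> str:
    """Caption one image; a non-empty prompt switches to conditional mode."""
    kwargs = {"images": image, "return_tensors": "pt"}
    if prompt and prompt.strip():
        # Conditional captioning: the prompt acts as a prefix for the caption.
        kwargs["text"] = prompt.strip()
    inputs = processor(**kwargs)
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
    )
    return processor.decode(output_ids[0], skip_special_tokens=True)


def caption_variety(image, processor, model,
                    beam_widths: Sequence[int] = (1, 3, 5)) -> Dict[int, str]:
    """Generate one caption per beam width (widths here are assumed values)."""
    return {
        beams: caption(image, processor, model, num_beams=beams)
        for beams in beam_widths
    }
```

To try it, open an image with Pillow (`Image.open(path).convert("RGB")`), call `load_blip()` once, and pass the image to `caption(...)` with or without a prompt such as "a painting of".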
Model
- Name: Salesforce/blip-image-captioning-base
- Paper: BLIP (Li et al., 2022)
- Parameters: ~250M
- Architecture: ViT-base + BERT-base with cross-attention
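The parameter figure above can be checked against a loaded model. The sketch below assumes a PyTorch-style model object exposing `parameters()`; the helper names `count_parameters` and `format_millions` are made up for this example.

```python
def count_parameters(model) -> int:
    """Sum the element counts of all parameter tensors in a model."""
    return sum(p.numel() for p in model.parameters())


def format_millions(n: int) -> str:
    """Render a raw parameter count as a rounded 'NNNM' string."""
    return f"{n / 1e6:.0f}M"
```

With the model loaded through `BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")`, `format_millions(count_parameters(model))` should land close to the ~250M quoted above.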
Performance
- First load: ~20 seconds (model download + init)
- Cached inference (model already loaded): 2-8 seconds per caption on CPU, depending on beam width
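One plausible way to get the first-load versus cached behavior is to load the model once per process and reuse it for every request. The sketch below assumes that approach; `get_blip` and `timed` are hypothetical helpers, and the Space's real `app.py` may cache differently.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_blip(model_id: str = "Salesforce/blip-image-captioning-base"):
    """Load processor and model once; later calls return the cached pair."""
    # Lazy import keeps module import cheap until the model is needed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return processor, model


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Calling `timed(get_blip)` twice shows the difference: the first call pays for download and initialization, while the second returns almost immediately from the cache. Per-caption latency can be measured the same way by wrapping the generation call.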
License
MIT for this deployment code. The model is released by Salesforce under the BSD 3-Clause license.