
metadata
title: BLIP Captioner
emoji: 🖼
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
tags:
  - image-captioning
  - vision-language
  - blip
  - multimodal
  - salesforce
short_description: Generate captions for images with BLIP

BLIP Image Captioner

Generate natural-language descriptions for any image using Salesforce's BLIP (Bootstrapping Language-Image Pre-training) model.

Features

  • Single caption mode: standard captioning with a tunable beam width
  • Conditional captioning: optional prompt prefix (e.g., "a painting of") that the generated caption continues
  • Variety comparison: generate 3 captions with different beam widths to see how the output changes
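
The three modes above can be sketched with the Hugging Face `transformers` BLIP classes. This is a minimal sketch, not the code in `app.py`: the checkpoint name `Salesforce/blip-image-captioning-base`, the helper names (`load_blip`, `generate_caption`, `compare_beam_widths`), the beam widths `(1, 3, 5)`, and the `max_new_tokens` default are all assumptions made for illustration.

```python
# Assumed checkpoint; this README does not name the one the Space loads.
MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id=MODEL_ID):
    """Load the BLIP processor and model (downloads weights on first call)."""
    # Imported lazily so the helpers below stay importable without transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return processor, model


def generate_caption(processor, model, image, prompt=None,
                     num_beams=3, max_new_tokens=40):
    """Caption one image; if a prompt is given, the caption continues it."""
    if prompt:
        # Conditional captioning: the text prefix is fed alongside the image.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        # Single caption mode: image only.
        inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
    )
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def compare_beam_widths(processor, model, image,
                        beam_widths=(1, 3, 5), prompt=None):
    """Variety comparison: one caption per beam width (widths are assumed)."""
    return {
        beams: generate_caption(processor, model, image, prompt, num_beams=beams)
        for beams in beam_widths
    }
```

A possible usage, assuming Pillow is installed and `photo.jpg` is a local file: `processor, model = load_blip()`, then `generate_caption(processor, model, Image.open("photo.jpg").convert("RGB"), prompt="a painting of")`.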

Model

Performance

  • First load: ~20 seconds (model download + init)
  • Cached inference: 2-8 seconds per caption (CPU, depends on beam width)
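
To check how latency scales with beam width on your own hardware, a small timing helper is enough. This is a sketch under assumptions: `time_captions` and its `caption_fn` parameter are hypothetical names, and `caption_fn` stands for any callable that takes a beam width and returns a caption.

```python
import time
from typing import Callable, Dict, Iterable


def time_captions(caption_fn: Callable[[int], str],
                  beam_widths: Iterable[int] = (1, 3, 5)) -> Dict[int, float]:
    """Return wall-clock seconds taken by caption_fn for each beam width."""
    timings = {}
    for beams in beam_widths:
        start = time.perf_counter()
        caption_fn(beams)  # the caption itself is discarded; only time is kept
        timings[beams] = time.perf_counter() - start
    return timings
```

Run it once after the model is loaded so that the download and initialization time is not counted toward per-caption latency.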

License

The deployment code in this Space is MIT-licensed. The BLIP model is released by Salesforce under the BSD 3-Clause license.