Spaces: WolfDavid/blip-captioner

metadata

title: BLIP Captioner
emoji: 🖼
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
tags:
  - image-captioning
  - vision-language
  - blip
  - multimodal
  - salesforce
short_description: Generate captions for images with BLIP

BLIP Image Captioner

Generate natural-language descriptions for any image using Salesforce's BLIP (Bootstrapping Language-Image Pre-training) model.

Features

Single caption mode: standard captioning with a tunable beam width
Conditional captioning: an optional prompt prefix (e.g., "a painting of") that the generated caption continues from
Variety comparison: generates 3 captions with different beam widths to show how the output changes
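
The three modes above could be sketched roughly as follows with the `transformers` BLIP classes. This is a minimal sketch, not the code in `app.py`: the function names (`load_blip`, `normalize_prompt`, `caption`, `compare_beam_widths`), the default beam widths `(1, 3, 5)`, and the `max_new_tokens` value are assumptions made for illustration.

```python
from functools import lru_cache

MODEL_ID = "Salesforce/blip-image-captioning-base"  # model named in this README


@lru_cache(maxsize=1)
def load_blip():
    """Load the processor and model once and reuse them on later calls."""
    # Deferred import so this module can be imported without loading the model.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()
    return processor, model


def normalize_prompt(prompt):
    """Return a stripped prompt, or None when no usable prompt was given."""
    if prompt is None:
        return None
    prompt = prompt.strip()
    return prompt or None


def caption(image, prompt=None, num_beams=3, max_new_tokens=30):
    """Caption a PIL image; a non-empty prompt selects conditional captioning."""
    import torch

    processor, model = load_blip()
    text = normalize_prompt(prompt)
    if text is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        # The prompt is fed to the decoder as a prefix the caption continues from.
        inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
        )
    return processor.decode(output_ids[0], skip_special_tokens=True)


def compare_beam_widths(image, widths=(1, 3, 5)):
    """Variety comparison: one caption per beam width (widths are assumed values)."""
    return {width: caption(image, num_beams=width) for width in widths}
```

With a PIL image loaded as `img`, `caption(img)` would give a single caption, `caption(img, prompt="a painting of")` a conditional one, and `compare_beam_widths(img)` a dictionary keyed by beam width.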

Model

Name: Salesforce/blip-image-captioning-base
Paper: BLIP (Li et al., 2022)
Parameters: ~250M
Architecture: ViT-base + BERT-base with cross-attention
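
The parameter figure above can be checked by summing the element counts of the model's parameter tensors. The sketch below assumes a PyTorch-style model exposing `.parameters()`; the helper names (`count_parameters`, `human_readable`, `blip_parameter_count`) are hypothetical and not part of this Space.

```python
def count_parameters(model):
    """Total number of elements across all parameter tensors of the model."""
    return sum(p.numel() for p in model.parameters())


def human_readable(n):
    """Format a raw count in millions, rounded to the nearest million."""
    return f"{n / 1e6:.0f}M"


def blip_parameter_count(model_id="Salesforce/blip-image-captioning-base"):
    """Load the checkpoint and report its size (downloads weights on first use)."""
    # Deferred import: only needed when this function is actually called.
    from transformers import BlipForConditionalGeneration

    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return human_readable(count_parameters(model))
```

Calling `blip_parameter_count()` requires the `transformers` and `torch` packages and network access for the first download.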

Performance

First load: ~20 seconds (model download + initialization)
Cached inference: 2-8 seconds per caption on CPU, depending on beam width
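
To reproduce latency figures like these on other hardware, a small timing helper is enough. The sketch below uses only the standard library; `caption_fn` stands in for any captioning callable that accepts an image and a `num_beams` keyword, and the helper names and beam widths `(1, 3, 5)` are assumptions for illustration.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def latency_by_beam_width(caption_fn, image, widths=(1, 3, 5)):
    """Measure one captioning call per beam width; returns {width: seconds}."""
    timings = {}
    for width in widths:
        _, seconds = timed(caption_fn, image, num_beams=width)
        timings[width] = seconds
    return timings
```

Timing the very first call separately from later ones distinguishes the one-time load cost from cached inference.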

License

The deployment code in this Space is MIT-licensed. The model is released by Salesforce under the BSD 3-Clause license.