---
title: BLIP Captioner
emoji: 🖼
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
- image-captioning
- vision-language
- blip
- multimodal
- salesforce
short_description: Generate captions for images with BLIP
---
# BLIP Image Captioner
Generate natural-language descriptions for any image using Salesforce's
**BLIP** (Bootstrapping Language-Image Pre-training) model.
## Features
- **Single caption mode** — standard captioning with tunable beam width
- **Conditional captioning** — optional prompt prefix (e.g., "a painting of")
- **Variety comparison** — generate 3 captions with different beam widths
to see how output changes
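
The three modes above can be sketched with the Hugging Face `transformers` BLIP classes. This is a minimal illustration, not the Space's actual `app.py`: the function names, the default `num_beams` and `max_new_tokens` values, and the beam widths `(1, 3, 5)` used for the comparison are all assumptions.

```python
from functools import lru_cache

MODEL_ID = "Salesforce/blip-image-captioning-base"


@lru_cache(maxsize=1)
def load_blip():
    """Load processor and model once; later calls reuse the cached pair."""
    # Imported lazily so this module stays importable without the heavy dependency.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()
    return processor, model


def normalize_prompt(prompt):
    """Return the stripped prompt, or None when it is missing or blank."""
    if prompt is None:
        return None
    prompt = prompt.strip()
    return prompt or None


def caption(image, prompt=None, num_beams=3, max_new_tokens=30):
    """Caption a PIL image; a non-blank prompt enables conditional captioning."""
    import torch

    processor, model = load_blip()
    text = normalize_prompt(prompt)
    if text is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        # The prompt acts as a prefix that the decoder continues.
        inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
        )
    return processor.decode(output_ids[0], skip_special_tokens=True)


def compare_beam_widths(image, widths=(1, 3, 5)):
    """Variety comparison: one caption per beam width (widths are assumed)."""
    return {w: caption(image, num_beams=w) for w in widths}
```

For example, `caption(Image.open("photo.jpg").convert("RGB"), prompt="a painting of")` would run the conditional mode, where `photo.jpg` is a placeholder path and `Image` comes from Pillow.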
## Model
- **Name:** [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Paper:** [BLIP](https://arxiv.org/abs/2201.12086) (Li et al., 2022)
- **Parameters:** ~250M
- **Architecture:** ViT-base + BERT-base with cross-attention
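
One way to check the approximate parameter count yourself is to sum the element counts of the model's parameter tensors. The helper names below are hypothetical, and the commented usage assumes `transformers` and PyTorch are installed.

```python
def count_parameters(model):
    """Total number of elements across all parameter tensors of a PyTorch module."""
    return sum(p.numel() for p in model.parameters())


def format_millions(n):
    """Render a raw count as a rounded figure in millions, e.g. '~250M'."""
    return f"~{n / 1e6:.0f}M"


# Usage (downloads the checkpoint on first run):
# from transformers import BlipForConditionalGeneration
# model = BlipForConditionalGeneration.from_pretrained(
#     "Salesforce/blip-image-captioning-base"
# )
# print(format_millions(count_parameters(model)))
```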
## Performance
- **First load:** ~20 seconds (model download + init)
- **Cached inference:** 2-8 seconds per caption (CPU, depends on beam width)
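
To reproduce timings like these on your own hardware, a small wall-clock helper is enough. This is a sketch using only the standard library; `time_call` and its `repeats` parameter are hypothetical names, not part of this Space.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return (last_result, mean_seconds_per_call)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, mean(durations)
```

Timing the same captioning function at several beam widths, for instance `time_call(caption_fn, image, num_beams=5)` with `caption_fn` standing in for whatever function produces a caption, shows how latency grows with beam width. Run one untimed call first so the model download and initialization are not counted.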
## License
The deployment code in this Space is MIT-licensed. The model itself is released by Salesforce under the BSD-3-Clause license.