---
title: BLIP Captioner
emoji: 🖼
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
  - image-captioning
  - vision-language
  - blip
  - multimodal
  - salesforce
short_description: Generate captions for images with BLIP
---

# BLIP Image Captioner

Generate natural-language descriptions for any image using Salesforce's
**BLIP** (Bootstrapping Language-Image Pre-training) model.

## Features

- **Single caption mode:** standard captioning with a tunable beam width
- **Conditional captioning:** an optional prompt prefix (e.g., "a painting of") that the caption continues from
- **Variety comparison:** three captions generated with different beam widths, to show how the output changes
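
The three modes map onto the standard `transformers` BLIP classes roughly as below. This is a minimal sketch, not this Space's actual `app.py`: the helper names (`load_blip`, `caption`, `compare_beam_widths`), the default `max_new_tokens=30`, and the comparison widths `(1, 3, 5)` are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID):
    """Load the BLIP processor and captioning model (downloads on first use)."""
    # Imported lazily so the pure helper below works without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()  # inference only: disables dropout
    return processor, model


def caption(image, processor, model, prompt: Optional[str] = None,
            num_beams: int = 3, max_new_tokens: int = 30) -> str:
    """Caption a PIL image; pass `prompt` (e.g. "a painting of") for conditional mode."""
    import torch

    if prompt:
        # Conditional captioning: the decoder continues from the prompt text.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        # Unconditional captioning: image only.
        inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        # num_beams=1 is greedy decoding; larger values run beam search.
        output_ids = model.generate(**inputs, num_beams=num_beams,
                                    max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def compare_beam_widths(caption_at_width: Callable[[int], str],
                        widths: Sequence[int] = (1, 3, 5)) -> Dict[int, str]:
    """Variety comparison: one caption per beam width (widths are illustrative)."""
    return {w: caption_at_width(w) for w in widths}
```

Load once with `processor, model = load_blip()`, then for an RGB PIL image `img` call `compare_beam_widths(lambda w: caption(img, processor, model, num_beams=w))`. Passing the captioner as a callable keeps the comparison logic independent of the model, so it can be exercised without downloading weights. With a prompt, the decoded caption normally starts with the prompt text itself.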

## Model

- **Name:** [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Paper:** [BLIP](https://arxiv.org/abs/2201.12086) (Li et al., 2022)
- **Parameters:** ~250M
- **Architecture:** ViT-base image encoder feeding a BERT-base-style text decoder through cross-attention
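
To check the approximate parameter count locally, the usual PyTorch approach is to sum element counts over the module's parameters. A minimal sketch: `count_parameters` is a hypothetical helper that works on any object exposing a PyTorch-style `.parameters()`, and `model` is assumed to be the `BlipForConditionalGeneration` instance loaded from the checkpoint named above.

```python
def count_parameters(model, trainable_only: bool = False) -> int:
    """Sum of element counts over a PyTorch-style module's parameters."""
    return sum(
        p.numel()
        for p in model.parameters()
        # Skip frozen weights only when trainable_only is requested.
        if p.requires_grad or not trainable_only
    )
```

With the checkpoint loaded, `count_parameters(model)` gives the total, while `count_parameters(model.vision_model)` and `count_parameters(model.text_decoder)` split it between the ViT encoder and the text decoder. Those attribute names match recent `transformers` releases; verify them against your installed version.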

## Performance

- **First load:** ~20 seconds (model download and initialization)
- **Cached inference:** 2–8 seconds per caption on CPU, depending on beam width
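
The gap between first load and cached inference comes from loading the weights once per process and reusing them afterwards. One common way to get that behavior is to memoize the loader, as sketched below; this is an assumption about the approach, not necessarily what this Space's `app.py` does, and `load_from_hub`, `make_cached_loader`, and `get_blip` are hypothetical names.

```python
import functools
import logging
import time
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)

Loader = Callable[[], Tuple[Any, Any]]


def load_from_hub(model_id: str = "Salesforce/blip-image-captioning-base") -> Tuple[Any, Any]:
    """Slow path: download (if not already on disk) and initialize processor + model."""
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id).eval()
    return processor, model


def make_cached_loader(loader: Loader) -> Loader:
    """Wrap `loader` so it runs once per process; later calls reuse the result."""

    @functools.lru_cache(maxsize=1)
    def get_blip() -> Tuple[Any, Any]:
        start = time.perf_counter()
        result = loader()
        logger.info("BLIP loaded in %.1fs", time.perf_counter() - start)
        return result

    return get_blip


# Module-level handle: the first call pays the full load, later calls return the cached pair.
get_blip = make_cached_loader(load_from_hub)
```

Injecting the loader keeps the caching testable with a stub. Note that `from_pretrained` also caches downloaded weights on disk, so a process restart typically skips the download but still pays the initialization cost.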

## License

MIT for this deployment code. The model itself is released by Salesforce under the BSD-3-Clause license.