	---
	title: BLIP Captioner
	emoji: 🖼
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 5.9.1
	python_version: "3.11"
	app_file: app.py
	pinned: false
	license: mit
	tags:
	- image-captioning
	- vision-language
	- blip
	- multimodal
	- salesforce
	short_description: Generate captions for images with BLIP
	---

	# BLIP Image Captioner

	Generate natural-language descriptions for any image using Salesforce's
	BLIP (Bootstrapping Language-Image Pre-training) model.

	## Features

	- Single caption mode: standard captioning with a tunable beam width
	- Conditional captioning: optional prompt prefix (e.g., "a painting of")
	- Variety comparison: generate 3 captions with different beam widths
	  to see how the output changes
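
The three modes above can be sketched with the Transformers BLIP classes. This is a minimal sketch, not the Space's actual `app.py`: it assumes the app uses `BlipProcessor` and `BlipForConditionalGeneration`, and the helper names (`load_blip`, `caption`, `compare_beams`), the beam widths `(1, 3, 5)`, and `max_new_tokens=40` are illustrative assumptions.

```python
from typing import Optional

MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID):
    """Load the processor and model (downloads weights on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return processor, model


def caption(image, processor, model, prompt: Optional[str] = None,
            num_beams: int = 3, max_new_tokens: int = 40) -> str:
    """Caption one image; a non-empty `prompt` enables conditional captioning."""
    if prompt:
        # The prompt is used as the prefix the decoder continues from.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
    )
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def compare_beams(image, processor, model, beam_widths=(1, 3, 5)):
    """Variety comparison: one caption per beam width (widths are assumed)."""
    return {n: caption(image, processor, model, num_beams=n) for n in beam_widths}
```

With a PIL image in hand, usage would look like `processor, model = load_blip()` followed by `caption(image, processor, model, prompt="a painting of")`. A beam width of 1 amounts to greedy decoding, so it is the cheapest of the three settings.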

	## Model

	- Name: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
	- Paper: [BLIP](https://arxiv.org/abs/2201.12086) (Li et al., 2022)
	- Parameters: ~250M
	- Architecture: ViT-base image encoder + BERT-base text decoder with cross-attention

	## Performance

	- First load: ~20 seconds (model download + init)
	- Cached inference: 2-8 seconds per caption (CPU, depends on beam width)
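
To check these numbers on other hardware, a small timing helper is enough. The sketch below is hypothetical and not part of this Space: `time_by_beam_width` and its `run` argument are assumed names, where `run` is any callable that takes a beam width and produces one caption.

```python
import time
from typing import Callable, Dict, Iterable


def time_by_beam_width(run: Callable[[int], object],
                       beam_widths: Iterable[int] = (1, 3, 5),
                       warm_up: bool = True) -> Dict[int, float]:
    """Return wall-clock seconds for one `run(num_beams)` call per beam width."""
    widths = list(beam_widths)
    if warm_up and widths:
        # Untimed call so one-time costs (download, init) are excluded.
        run(widths[0])
    timings = {}
    for n in widths:
        start = time.perf_counter()
        run(n)
        timings[n] = time.perf_counter() - start
    return timings
```

Passing `warm_up=False` includes the first-load cost in the first measurement, which gives a rough view of cold versus cached latency.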

	## License

	The deployment code in this Space is MIT-licensed. The model is released by Salesforce under the BSD-3-Clause license.