---
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
pipeline_tag: image-to-text
tags:
- blip
- icon-description
- image-captioning
license: mit
library_name: transformers
---

# BLIP – UI Elements Captioning

This model is a fine-tuned version of [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base), adapted for **captioning UI elements** from macOS application screenshots.

It is part of the **Screen2AX** research project focused on improving accessibility using vision-based deep learning.

---

## Use Case

The model takes an image of a **UI icon or element** and generates a **natural language description** (e.g., `"Settings icon"`, `"Play button"`, `"Search field"`).

This helps build assistive technologies such as screen readers by providing textual labels for unlabeled visual components.
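
In practice, the input is usually a single element cropped out of a full screenshot. The sketch below illustrates that flow; the bounding-box coordinates and file paths are placeholders, not part of the Screen2AX pipeline itself:

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

processor = BlipProcessor.from_pretrained("macpaw-research/blip-icon-captioning")
model = BlipForConditionalGeneration.from_pretrained("macpaw-research/blip-icon-captioning")

# Crop a single UI element out of a full screenshot.
# The box (left, top, right, bottom) is illustrative; in practice it would
# come from a UI-element detector or accessibility metadata.
screenshot = Image.open("path/to/screenshot.png").convert("RGB")
element = screenshot.crop((100, 50, 164, 114))

inputs = processor(images=element, return_tensors="pt")
output = model.generate(**inputs)
print(processor.decode(output[0], skip_special_tokens=True))
```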

---

## Model Architecture

- Base model: [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base)
- Architecture: **BLIP** (Bootstrapping Language-Image Pre-training)
- Task: `image-to-text`
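
Since the model is published with `pipeline_tag: image-to-text`, it can also be loaded through the high-level `transformers` pipeline API. A minimal sketch (the file path is a placeholder):

```python
from transformers import pipeline

# "image-to-text" matches the pipeline_tag declared in the model card metadata
captioner = pipeline("image-to-text", model="macpaw-research/blip-icon-captioning")

# Accepts a local path, URL, or PIL.Image; returns a list of generated captions
result = captioner("path/to/ui_icon.png")
print(result[0]["generated_text"])  # e.g. "Settings icon"
```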

---

## Example

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the fine-tuned processor and model
processor = BlipProcessor.from_pretrained("macpaw-research/blip-icon-captioning")
model = BlipForConditionalGeneration.from_pretrained("macpaw-research/blip-icon-captioning")

# BLIP expects RGB input
image = Image.open("path/to/ui_icon.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)

print(caption)
# Example: "Settings icon"
```
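
The card does not specify generation settings, so the example above uses the library defaults. Standard decoding arguments can be tuned if captions come out too long or unstable; the values below are illustrative, not the authors' configuration:

```python
# Beam search with a cap on new tokens tends to produce short, stable labels
output = model.generate(**inputs, num_beams=3, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True))
```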

---

## License

This model is released under the **MIT License**.

---

## Related Projects

- [Screen2AX Project](https://github.com/MacPaw/Screen2AX)
- [Screen2AX Hugging Face Collection](https://huggingface.co/collections/macpaw-research/screen2ax)

---

## Citation

If you use this model in your research, please cite the Screen2AX paper:

```bibtex
@misc{muryn2025screen2axvisionbasedapproachautomatic,
  title={Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation},
  author={Viktor Muryn and Marta Sumyk and Mariya Hirna and Sofiya Garkot and Maksym Shamrai},
  year={2025},
  eprint={2507.16704},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.16704},
}
```

---

## MacPaw Research

Learn more at [https://research.macpaw.com](https://research.macpaw.com).