---
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
pipeline_tag: image-to-text
tags:
- blip
- icon-description
- image-captioning
license: mit
library_name: transformers
---
# 🧠 BLIP – UI Elements Captioning
This model is a fine-tuned version of [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base), adapted for **captioning UI elements** from macOS application screenshots.
It is part of the **Screen2AX** research project focused on improving accessibility using vision-based deep learning.
---
## 🎯 Use Case
The model takes an image of a **UI icon or element** and generates a **natural language description** (e.g., `"Settings icon"`, `"Play button"`, `"Search field"`).
This helps build assistive technologies such as screen readers by providing textual labels for unlabeled visual components.
---
## πŸ— Model Architecture
- Base model: [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base)
- Architecture: **BLIP** (Bootstrapping Language-Image Pre-training)
- Task: `image-to-text`
---
## 🖼 Example
```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the fine-tuned processor and model from the Hugging Face Hub
processor = BlipProcessor.from_pretrained("macpaw-research/blip-icon-captioning")
model = BlipForConditionalGeneration.from_pretrained("macpaw-research/blip-icon-captioning")

# Load the UI icon and preprocess it into model inputs
image = Image.open("path/to/ui_icon.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate and decode the caption
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
print(caption)
# Example: "Settings icon"
```
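For quick experiments, the same model can also be loaded through the high-level `pipeline` API, which wraps the preprocessing and decoding steps shown above. This is a minimal sketch using the repo id from the example; the image path is a placeholder.

```python
from transformers import pipeline

# Wraps processor + model + generation in a single callable
captioner = pipeline("image-to-text", model="macpaw-research/blip-icon-captioning")

# Accepts a file path, URL, or PIL.Image; returns a list of dicts
result = captioner("path/to/ui_icon.png")
print(result[0]["generated_text"])
```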
---
## 📜 License
This model is released under the **MIT License**.
---
## 🔗 Related Projects
- [Screen2AX Project](https://github.com/MacPaw/Screen2AX)
- [Screen2AX HuggingFace Collection](https://huggingface.co/collections/macpaw-research/screen2ax)
---
## ✍️ Citation
If you use this model in your research, please cite the Screen2AX paper:
```bibtex
@misc{muryn2025screen2axvisionbasedapproachautomatic,
  title={Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation},
  author={Viktor Muryn and Marta Sumyk and Mariya Hirna and Sofiya Garkot and Maksym Shamrai},
  year={2025},
  eprint={2507.16704},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.16704},
}
```
---
## 🌐 MacPaw Research
Learn more at [https://research.macpaw.com](https://research.macpaw.com)