Florence-2 Icon Captioning
minhvn4/florence2-icon is a fine-tuned version of Microsoft's Florence-2-base specifically tailored for Icon Captioning. This model understands and generates descriptive captions for UI icons, symbols, and pictograms.
Because it relies on custom code from the original Florence-2 implementation, you must use trust_remote_code=True when loading the model.
Model Details
- Architecture: Florence-2 (AutoModelForCausalLM)
- Base Model: microsoft/Florence-2-base
- Task: Image to Text (Icon Captioning)
- License: MIT
- Format: Safetensors
Usage
Here's how to load and use the model for icon captioning in your Python code. Make sure to install the required dependencies (transformers, torch, Pillow, einops, timm).
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Set model ID
model_id = "minhvn4/florence2-icon"

# Load the processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval()

# Move model to target device
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def generate_caption(image_path, prompt="<CAPTION>"):
    image = Image.open(image_path).convert("RGB")

    # Process inputs
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    # Generate text
    with torch.inference_mode():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=20,
            num_beams=1,
            do_sample=False
        )

    # Decode output
    caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return caption.strip()

# Run inference on an icon
image_path = "path/to/your/icon.png"
caption = generate_caption(image_path)
print(f"Generated caption: {caption}")
Intended Use
This model is intended for generating descriptive text for single UI icons. It can be integrated into UI parsing tools (like OmniParser), accessibility tools, or web/mobile development workflows to automatically provide clear text descriptions for graphical elements.
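As a rough illustration of the accessibility use case, the sketch below maps icon files to alt-text strings. The helper name build_alt_text, the "icon" fallback label, and the stand-in captioner are assumptions made for this example and are not part of the model or its repository. In practice you would pass the generate_caption function from the Usage section as caption_fn.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def build_alt_text(
    icon_paths: Iterable[str],
    caption_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Map each icon file name to a caption that can serve as alt text.

    caption_fn is any callable that takes an image path and returns a
    caption string (hypothetical interface, matching generate_caption).
    """
    alt_text = {}
    for path in icon_paths:
        caption = caption_fn(path).strip()
        # Fall back to a generic label if the captioner returns nothing.
        alt_text[Path(path).name] = caption or "icon"
    return alt_text


# Stand-in captioner so the sketch runs without downloading the model.
def fake_captioner(path: str) -> str:
    return "magnifying glass" if "search" in path else ""


print(build_alt_text(["icons/search.png", "icons/unknown.png"], fake_captioner))
```

Keeping the captioner as a parameter means the same helper can be exercised with a stub in tests and with the real model in production.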
Troubleshooting
If you encounter an error like ValueError: The model class you are passing is not supported, ensure you are passing trust_remote_code=True to both AutoProcessor.from_pretrained and AutoModelForCausalLM.from_pretrained. You may also need to install einops and timm, which are required by the Florence-2 architecture.
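Before debugging further, it can help to confirm that the dependencies listed in the Usage section are importable. The following is a minimal sketch using only the standard library. The REQUIRED mapping and the missing_packages helper are names made up for this example, and the check only detects absent packages, not version mismatches.

```python
import importlib.util

# Import name -> pip package name, taken from the dependency list in Usage.
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "PIL": "Pillow",
    "einops": "einops",
    "timm": "timm",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All listed dependencies are importable.")
```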