OmniParser icon_caption — MLX

MLX (bfloat16) conversion of the icon_caption model from microsoft/OmniParser-v2.0: a Florence-2 model fine-tuned on UI elements for captioning interactive icons in screenshots. Runs on Apple Silicon via mlx_vlm, with no PyTorch dependency.

License: MIT (© Microsoft Corporation) — see LICENSE. This repo redistributes the original MIT-licensed weights converted to MLX format.

Usage

On transformers 5.x, loading requires a small no-torch patch that registers florence2_language and routes the image processor to CLIPImageProcessorPil. See the conversion recipe at the bottom.

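# Apply the florence2 no-torch patch first (see recipe), then:
from mlx_vlm import load, generate
# Load the converted weights and processor from the Hub.
model, processor = load("PlusMinus1/omniparser-icon-caption-mlx")
# "<CAPTION>" is the Florence-2 captioning task prompt; the image is a single cropped icon.
out = generate(model, processor, "<CAPTION>", image=["icon_crop.png"], max_tokens=20)
print(out)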
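The usage snippet expects an already-cropped icon image. As a minimal sketch of how such a crop might be produced from a full screenshot, the helpers below convert a normalized bounding box to pixel coordinates and save the crop with Pillow. The function names bbox_to_pixels and crop_icon, the normalized (x1, y1, x2, y2) box format, and the pad parameter are all assumptions made for illustration; they are not part of this repo or of mlx_vlm.

```python
def bbox_to_pixels(bbox, width, height, pad=0):
    """Convert a normalized (x1, y1, x2, y2) box to clamped pixel coordinates.

    Hypothetical helper: the box format is an assumption, not something
    this model card specifies.
    """
    x1, y1, x2, y2 = bbox
    left = max(0, round(x1 * width) - pad)
    top = max(0, round(y1 * height) - pad)
    right = min(width, round(x2 * width) + pad)
    bottom = min(height, round(y2 * height) + pad)
    if right <= left or bottom <= top:
        raise ValueError(f"empty crop for bbox {bbox!r}")
    return left, top, right, bottom


def crop_icon(screenshot_path, bbox, out_path="icon_crop.png", pad=0):
    """Crop one icon from a screenshot and save it (requires Pillow)."""
    # Imported lazily so bbox_to_pixels stays dependency-free.
    from PIL import Image

    with Image.open(screenshot_path) as img:
        box = bbox_to_pixels(bbox, img.width, img.height, pad)
        img.convert("RGB").crop(box).save(out_path)
    return out_path
```

The saved file can then be passed as image=["icon_crop.png"] in the generate call above. A few pixels of padding sometimes helps when detector boxes are tight, but whether it improves captions for this model is untested here.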

Provenance

Base: microsoft/Florence-2-base (MIT)
Fine-tune: microsoft/OmniParser-v2.0 icon_caption (MIT)
Conversion: mlx_vlm.convert (bfloat16) + a transformers-5.x no-torch compatibility patch.

Model size: 0.3B params
Tensor type: BF16 (Safetensors)