OmniParser icon_caption (MLX)

MLX (bfloat16) conversion of the icon_caption model from microsoft/OmniParser-v2.0: a Florence-2 model fine-tuned on UI elements for captioning interactive icons in screenshots. Runs on Apple Silicon via mlx_vlm, with no PyTorch dependency.

License: MIT (© Microsoft Corporation); see LICENSE. This repo redistributes the original MIT-licensed weights, converted to MLX format.

Usage

On transformers 5.x, loading the model requires a small no-torch patch that registers florence2_language and routes the image processor to CLIPImageProcessorPil. See the conversion recipe at the bottom.

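# apply the florence2 no-torch patch first (see recipe), then:
from mlx_vlm import load, generate

# load the converted weights and processor from the Hub
model, processor = load("PlusMinus1/omniparser-icon-caption-mlx")

# caption a single cropped icon; "<CAPTION>" is the Florence-2 task prompt
out = generate(model, processor, "<CAPTION>", image=["icon_crop.png"], max_tokens=20)
print(out)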
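
The model captions one cropped icon at a time, so a full screenshot has to be cut into per-icon crops first. The sketch below shows one way to do that, assuming you already have pixel bounding boxes from some separate detector (not part of this repo). The helper names pad_box and caption_icons, the 4-pixel margin, and the crop file names are illustrative assumptions, not part of mlx_vlm or OmniParser. It also assumes the no-torch patch has already been applied.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def pad_box(box: Box, image_size: Tuple[int, int], margin: int = 4) -> Box:
    """Grow a box by `margin` pixels on each side, clamped to the image bounds."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def caption_icons(screenshot_path: str, boxes: List[Box], max_tokens: int = 20) -> List[str]:
    """Hypothetical helper: crop each box from the screenshot and caption it."""
    # Imports are deferred so pad_box stays usable without Pillow or mlx_vlm.
    from PIL import Image
    from mlx_vlm import load, generate

    model, processor = load("PlusMinus1/omniparser-icon-caption-mlx")
    screenshot = Image.open(screenshot_path).convert("RGB")

    captions = []
    for i, box in enumerate(boxes):
        # Save each padded crop to disk, mirroring the file-path usage above.
        crop_path = f"icon_crop_{i}.png"
        screenshot.crop(pad_box(box, screenshot.size)).save(crop_path)
        out = generate(model, processor, "<CAPTION>", image=[crop_path], max_tokens=max_tokens)
        # Assumption: depending on the mlx_vlm version, generate returns either
        # a plain string or a result object with a .text attribute.
        captions.append(getattr(out, "text", out))
    return captions
```

A call such as caption_icons("screenshot.png", [(10, 10, 42, 42)]) would return one caption per box. The small margin is a guess at giving the model some visual context around tight detector boxes; set margin=0 to use the boxes unchanged.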

Provenance

  • Base: microsoft/Florence-2-base (MIT)
  • Fine-tune: microsoft/OmniParser-v2.0 icon_caption (MIT)
  • Conversion: mlx_vlm.convert (bfloat16) + a transformers-5.x no-torch compatibility patch.
Weights: Safetensors, 0.3B params, BF16 (MLX)