# OmniParser v2 ONNX

ONNX conversion of microsoft/OmniParser-v2.0 for fast CPU/GPU inference.

## Models Included

| Model | Description | Size |
|---|---|---|
| `detector.onnx` | YOLO-based UI element detector | 80 MB |
| `caption.onnx` + `caption.onnx.data` | Icon/element captioning model | 350 MB |
| `florence2_onnx/` | Florence-2 vision-language model (encoder, decoder, vision) | 1.1 GB |
| `paddleocr_onnx/` | PaddleOCR text detection and recognition | 10 MB |

## Usage with ONNX Runtime

```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# Load detector (falls back to CPU if CUDA is unavailable)
session = ort.InferenceSession(
    "detector.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Preprocess image: force RGB (screenshots are often RGBA),
# resize to 640x640, transpose HWC -> CHW, normalize to [0, 1]
img = Image.open("screenshot.png").convert("RGB").resize((640, 640))
input_array = np.array(img).transpose(2, 0, 1).astype(np.float32) / 255.0
input_array = np.expand_dims(input_array, 0)

# Run inference
outputs = session.run(None, {"images": input_array})
```
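The detector's raw output still needs filtering and rescaling before the boxes can be mapped back onto the screenshot. The exact output layout depends on how the model was exported; the sketch below assumes post-NMS detections arrive as `(N, 6)` rows of `[x1, y1, x2, y2, score, class_id]` in 640x640 input coordinates (the `postprocess` helper and its thresholds are illustrative, not part of this repo):

```python
import numpy as np

def postprocess(detections, orig_size, conf_threshold=0.3, input_size=640):
    """Filter detections by confidence and rescale boxes from the
    640x640 model input back to the original screenshot resolution.

    detections: (N, 6) array of [x1, y1, x2, y2, score, class_id]
    orig_size:  (width, height) of the original screenshot
    """
    keep = detections[:, 4] >= conf_threshold
    boxes = detections[keep].copy()
    boxes[:, [0, 2]] *= orig_size[0] / input_size  # scale x coords
    boxes[:, [1, 3]] *= orig_size[1] / input_size  # scale y coords
    return boxes

# Example with synthetic detections for a 1280x720 screenshot
dets = np.array([
    [ 10.0,  10.0, 100.0,  50.0, 0.90, 0.0],
    [200.0, 300.0, 260.0, 340.0, 0.15, 1.0],  # below threshold, dropped
], dtype=np.float32)
boxes = postprocess(dets, orig_size=(1280, 720))
print(boxes)  # one surviving box, rescaled to 1280x720 coordinates
```

If the exported graph instead emits raw (pre-NMS) YOLO predictions, an extra non-maximum-suppression step is needed before this rescaling.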

## Conversion

Converted to ONNX by @maxiboch.

See discussion: microsoft/OmniParser-v2.0#5

## Original Model

microsoft/OmniParser-v2.0
