OmniParser for Pure Vision Based GUI Agent

Paper: *OmniParser for Pure Vision Based GUI Agent* (arXiv:2408.00203)
ONNX conversion of microsoft/OmniParser-v2.0 for fast CPU/GPU inference.
| Model | Description | Size |
|---|---|---|
| `detector.onnx` | YOLO-based UI element detector | 80 MB |
| `caption.onnx` + `caption.onnx.data` | Icon/element captioning model | 350 MB |
| `florence2_onnx/` | Florence-2 vision-language model (encoder, decoder, vision) | 1.1 GB |
| `paddleocr_onnx/` | PaddleOCR text detection and recognition | 10 MB |
```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# Load detector (falls back to CPU if CUDA is unavailable)
session = ort.InferenceSession(
    "detector.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Preprocess image: force RGB (screenshots are often RGBA), resize to
# 640x640, convert HWC -> CHW, normalize to [0, 1], add batch dim
img = Image.open("screenshot.png").convert("RGB").resize((640, 640))
input_array = np.array(img).transpose(2, 0, 1).astype(np.float32) / 255.0
input_array = np.expand_dims(input_array, 0)

# Run inference
outputs = session.run(None, {"images": input_array})
```
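The raw detector output still needs post-processing: filter low-confidence detections and map box coordinates from the 640×640 input back to the original screenshot resolution. A minimal NumPy sketch is below; it assumes each detection row is `(x1, y1, x2, y2, score)` in input-image pixels, which you should verify against the actual output layout of this export (the `postprocess` helper and the threshold are illustrative, not part of the model card).

```python
import numpy as np

def postprocess(dets, orig_w, orig_h, conf_thres=0.3, input_size=640):
    """Filter detections by confidence and rescale boxes from the
    640x640 detector input to the original screenshot resolution.

    Assumes rows of (x1, y1, x2, y2, score) in input-image pixels;
    check your export's actual output format before relying on this.
    """
    dets = dets[dets[:, 4] >= conf_thres]
    scale = np.array([orig_w, orig_h, orig_w, orig_h], dtype=np.float32) / input_size
    boxes = dets[:, :4] * scale
    return boxes, dets[:, 4]

# Example with dummy detections on a 1920x1080 screenshot
dummy = np.array([
    [ 64.0,  64.0, 128.0, 128.0, 0.90],
    [320.0, 320.0, 400.0, 400.0, 0.10],  # below threshold, dropped
], dtype=np.float32)
boxes, scores = postprocess(dummy, 1920, 1080)
```

Since the resize is non-uniform (screenshots are rarely square), the x and y axes get different scale factors; the per-axis `scale` vector handles that.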
Converted to ONNX by @maxiboch. See the discussion at microsoft/OmniParser-v2.0#5.

Base model: microsoft/OmniParser-v2.0