Inquiry regarding coordinate grounding issues in GUI Owl 1.5-32B

#4
by boanboa - opened

Dear GUI Owl Team,

I am a developer working on mobile GUI automation. I have previously been a heavy user of the first GUI Owl model and found it incredibly effective for my workflows.

I am currently encountering a persistent coordinate grounding accuracy issue with version 1.5 when following the official implementation.

Context & Issue:
- Environment: Android (device resolution: 1080 × 2400)
- Reference: I am strictly following the methodology used in the Mobile-Agent-v3.5 Cookbook.
- The Problem: While the model provides relative coordinates as expected, converting these back to my screen's absolute pixels results in significant misalignment. The targets are consistently "off" by a margin that prevents successful automation.

Questions:
1. Are there specific preprocessing steps (e.g., padding, aspect ratio preservation, or specific resizing methods) required for version 1.5 that differ from version 1.0?
2. Does the model assume a specific normalized coordinate system (e.g., 0-1000 range) that might require a different scaling factor for 20:9 aspect ratio screens like mine (1080 × 2400)?
3. Are there any known issues or "gotchas" regarding the Android system bars (status/navigation bar) affecting the visual grounding in this version?
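
To make the scaling question concrete, here is a minimal sketch of the two conversions I am trying to choose between. Both are assumptions on my side and neither is confirmed by the cookbook: the first treats the model output as a point on a 0-1000 normalized grid, the second treats it as a pixel position in a resized copy of the screenshot. The function names are hypothetical and only serve as illustration.

```python
def norm_to_pixels(x, y, width, height, scale=1000):
    """Map a point on a [0, scale] normalized grid to absolute screen pixels.

    Assumption: the model emits coordinates on a 0-1000 grid.
    Each axis is scaled independently, so a 20:9 screen needs no extra factor.
    """
    return round(x / scale * width), round(y / scale * height)


def resized_to_pixels(x, y, resized_size, original_size):
    """Map a point given in resized-image pixels back to the original screen.

    Assumption: the screenshot was resized (without padding) before inference
    and the model answers in the resized image's pixel space.
    """
    resized_w, resized_h = resized_size
    original_w, original_h = original_size
    return round(x * original_w / resized_w), round(y * original_h / resized_h)


# Centre of a 1080 x 2400 screen under the 0-1000 assumption.
print(norm_to_pixels(500, 500, 1080, 2400))  # (540, 1200)
```

If the real preprocessing pads the image or uses one scale factor for both axes, neither mapping would hold as written, which could account for the offset I am seeing.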

I would greatly appreciate any guidance or technical documentation that could help resolve this coordinate mismatch. Thank you for your hard work on this impressive model.
