Inquiry regarding coordinate grounding issues in GUI Owl 1.5-32B
Dear GUI Owl Team,
I am a developer working on mobile GUI automation. I have previously been a heavy user of the first GUI Owl model and found it incredibly effective for my workflows.
I am currently encountering a persistent issue regarding coordinate grounding accuracy when following the official implementation.
Context & Issue:
Environment: Android (Device Resolution: 1080 × 2400)
Reference: I am strictly following the methodology used in the Mobile-Agent-v3.5 Cookbook.
The Problem: While the model provides relative coordinates as expected, converting these back to my screen’s absolute pixels results in significant misalignment. The targets are consistently "off" by a margin that prevents successful automation.
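To make the conversion concrete, here is a minimal sketch of the mapping in question. It assumes the model emits coordinates in a 0-1000 normalized range, which is exactly the point I am unsure about. The function name `relative_to_absolute` and its parameters are illustrative only and are not taken from the Cookbook.

```python
def relative_to_absolute(x_rel, y_rel, screen_w=1080, screen_h=2400, scale=1000):
    """Map a model coordinate in [0, scale] to absolute screen pixels.

    ASSUMPTION: the model output is normalized to a 0..scale range on each
    axis independently. This is not confirmed by the documentation.
    """
    x_px = round(x_rel / scale * screen_w)
    y_px = round(y_rel / scale * screen_h)
    # Clamp so a value of exactly `scale` still lands on a valid pixel.
    x_px = min(max(x_px, 0), screen_w - 1)
    y_px = min(max(y_px, 0), screen_h - 1)
    return x_px, y_px


if __name__ == "__main__":
    # Centre of the normalized space should map to the centre of the screen.
    print(relative_to_absolute(500, 500))
```

If the model instead uses a 0-1 range or the pixel space of a resized input image, this mapping would produce exactly the kind of consistent offset described above.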
Questions:
Are there specific preprocessing steps (e.g., padding, aspect ratio preservation, or specific resizing methods) required for version 1.5 that differ from version 1.0?
Does the model assume a specific normalized coordinate system (e.g., 0-1000 range) that might require a different scaling factor for 20:9 aspect ratio screens like mine (1080 × 2400)?
Are there any known issues or "gotchas" regarding the Android system bars (status/navigation bar) affecting the visual grounding in this version?
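For the preprocessing question, the alternative I have in mind can be sketched as follows. It is purely hypothetical: it assumes the model reports coordinates in the pixel space of a resized (and possibly padded) input image, so the padding must be subtracted before scaling back to the screen. The function name `resized_to_screen` and the padding parameters are my own placeholders, not part of any official API.

```python
def resized_to_screen(x, y, resized_w, resized_h,
                      screen_w=1080, screen_h=2400,
                      pad_left=0, pad_top=0):
    """Map a point from model-input pixel space back to screen pixels.

    ASSUMPTION: resized_w and resized_h describe the screenshot content
    inside the model input, excluding any padding; pad_left and pad_top
    are the padding widths added before that content.
    """
    # Remove the padding offset, then rescale each axis to the screen size.
    x_px = (x - pad_left) * screen_w / resized_w
    y_px = (y - pad_top) * screen_h / resized_h
    return round(x_px), round(y_px)


if __name__ == "__main__":
    # A screenshot downscaled by 2x with no padding.
    print(resized_to_screen(270, 600, resized_w=540, resized_h=1200))
```

Knowing which of these conventions version 1.5 follows, and whether the status and navigation bars are included in the screenshot passed to the model, would let me pick the correct mapping.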
I would greatly appreciate any guidance or technical documentation that could help resolve this coordinate mismatch. Thank you for your hard work on this impressive model.