Detect UI elements in images
OmniParser: turn your LLM into a GUI agent
Find clickable coordinates on a screenshot
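
As a rough illustration of what finding clickable coordinates can involve, the sketch below converts a detected element's bounding box into a pixel position to click. It assumes the detector returns boxes as normalized `(x_min, y_min, x_max, y_max)` ratios of the screenshot size. That format, and the helper name `bbox_center_px`, are assumptions made for this example and not a confirmed OmniParser API.

```python
from typing import Tuple

def bbox_center_px(
    bbox: Tuple[float, float, float, float],
    width: int,
    height: int,
) -> Tuple[int, int]:
    """Return the pixel center of a normalized bounding box.

    bbox is (x_min, y_min, x_max, y_max), each value in [0, 1]
    (assumed format; adjust if your detector returns pixels).
    width and height are the screenshot dimensions in pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    # Midpoint of the box in normalized space, scaled to pixels.
    center_x = (x_min + x_max) / 2 * width
    center_y = (y_min + y_max) / 2 * height
    return round(center_x), round(center_y)

# Hypothetical detection on a 1920x1080 screenshot.
click_point = bbox_center_px((0.25, 0.5, 0.75, 1.0), 1920, 1080)
print(click_point)
```

The returned point could then be handed to whatever automation layer performs the click. Clicking the center is a simple default and may miss on irregularly shaped elements.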