Generate HTML from an image using prompts
Locate GUI elements using instructions
Locate a click target on a UI image based on your instruction
Find click coordinates on images based on instructions