---
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- grounding
- agent
---

GUI_Spotlight is a `think-with-image` GUI visual grounding model. At each step, it first calls a tool to crop the image around its own predicted region, and then returns an exact coordinate location.
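
The crop-then-locate loop described above implies some bookkeeping: a coordinate predicted inside a crop has to be translated back to the original screenshot. The following is a minimal sketch of that arithmetic only, not the repository's implementation. The names `Box`, `crop_around`, and `to_full_image` are hypothetical, and the sketch assumes crops are taken at native resolution (no resizing), so only offsets need to be undone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Crop rectangle, expressed in the coordinates of the image it was cut from."""
    left: int
    top: int
    right: int
    bottom: int


def crop_around(cx, cy, crop_w, crop_h, img_w, img_h):
    """Return a crop_w x crop_h box centered near (cx, cy), shifted to stay inside the image."""
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return Box(left, top, left + crop_w, top + crop_h)


def to_full_image(x, y, crops):
    """Map a point in the innermost crop back to full-image coordinates.

    `crops` lists the nested boxes from outermost to innermost; each box is
    relative to the image produced by the previous crop.
    """
    for box in reversed(crops):
        x += box.left
        y += box.top
    return x, y


# Hypothetical two-step refinement on a 1920x1080 screenshot.
first = crop_around(1500, 300, 800, 600, 1920, 1080)   # Box(1100, 0, 1900, 600)
second = crop_around(420, 310, 200, 200, 800, 600)     # Box(320, 210, 520, 410)
print(to_full_image(95, 102, [first, second]))          # (1515, 312)
```

If a pipeline resizes each crop before the next step, the mapping would also need to divide by the per-step scale factor before adding the offsets.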

For evaluation and inference details, please refer to [the GUI_Spotlight repository](https://github.com/bin123apple/GUI_Spotlight).