---
license: mit
base_model: microsoft/Florence-2-base
---

<a id="readme-top"></a>

[![arXiv][paper-shield]][paper-url]
[![MIT License][license-shield]][license-url]

<!-- PROJECT LOGO -->
<br />
<div align="center">
<!-- <a href="https://github.com/othneildrew/Best-README-Template">
  <img src="images/logo.png" alt="Logo" width="80" height="80">
</a> -->
<h3 align="center">TinyClick: Single-Turn Agent for Empowering GUI Automation</h3>
<p align="center">
  Code for running the model from the paper "TinyClick: Single-Turn Agent for Empowering GUI Automation".
</p>
</div>


<!-- ABOUT THE PROJECT -->
## About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the vision-language model Florence-2-Base. The agent's main goal is to click on the desired UI element given a screenshot and a user command. It demonstrates strong performance on the Screenspot and OmniAct benchmarks while maintaining a compact size of 0.27B parameters and minimal latency.

<!-- USAGE EXAMPLES -->
## Usage

To set up the environment for running the code, please refer to the [GitHub repository](https://github.com/SamsungLabs/TinyClick). All necessary libraries and dependencies are listed in the `requirements.txt` file.

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = AutoProcessor.from_pretrained(
    "Krystianz/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Krystianz/TinyClick",
    trust_remote_code=True,
).to(device)

# Load the screenshot (use requests + Image.open(resp.raw) for a remote URL).
img = Image.open("sample.png")

command = "click on accept and continue button"
image_size = img.size

input_text = ("What to do to execute the command? " + command.strip()).lower()

inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)  # keep the inputs on the same device as the model

outputs = model.generate(**inputs)
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)
```

For the postprocessing function, see our GitHub repository: https://github.com/SamsungLabs/TinyClick

```python
from tinyclick_utils import postprocess

result = postprocess(generated_texts[0], image_size)
```
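
If pulling in `tinyclick_utils` is not an option, the decoding step can be approximated by hand. The sketch below is a hypothetical stand-in for `postprocess`: it assumes the model emits Florence-2-style `<loc_k>` tokens with `k` normalized to a 0–999 grid, which may differ from the repository's actual output format.

```python
import re


def parse_click_location(generated_text, image_size):
    """Map Florence-2-style <loc_k> tokens (k normalized to 0-999)
    to pixel coordinates on the original screenshot.

    Hypothetical stand-in for tinyclick_utils.postprocess; the real
    token format produced by the model may differ.
    """
    width, height = image_size
    locs = [int(k) for k in re.findall(r"<loc_(\d+)>", generated_text)]
    if len(locs) < 2:
        return None  # no click point found in the generated text
    x = round(locs[0] / 999 * width)
    y = round(locs[1] / 999 * height)
    return (x, y)
```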

<!-- CITATION -->
## Citation

```bibtex
@misc{pawlowski2024tinyclicksingleturnagentempowering,
      title={TinyClick: Single-Turn Agent for Empowering GUI Automation},
      author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
      year={2024},
      eprint={2410.11871},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.11871},
}
```


<!-- LICENSE -->
## License

Distributed under the MIT License. See `LICENSE` for more information.

<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- MARKDOWN LINKS & IMAGES -->
[paper-shield]: https://img.shields.io/badge/2024-arXiv-red
[paper-url]: https://arxiv.org/abs/2410.11871
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://opensource.org/licenses/MIT