---
license: mit
base_model: microsoft/Florence-2-base
---

<a id="readme-top"></a>

[![arXiv][paper-shield]][paper-url]
[![MIT License][license-shield]][license-url]

<!-- PROJECT LOGO -->
<br />
<div align="center">
<!-- <a href="https://github.com/othneildrew/Best-README-Template">
  <img src="images/logo.png" alt="Logo" width="80" height="80">
</a> -->
<h3 align="center">TinyClick: Single-Turn Agent for Empowering GUI Automation</h3>
<p align="center">
  Code for running the model from the paper "TinyClick: Single-Turn Agent for Empowering GUI Automation".
</p>
</div>


<!-- ABOUT THE PROJECT -->
## About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the vision-language model Florence-2-Base. The agent's main goal is to click on the desired UI element given a screenshot and a user command. It demonstrates strong performance on the Screenspot and OmniAct benchmarks while maintaining a compact size of 0.27B parameters and minimal latency.

<!-- USAGE EXAMPLES -->
## Usage

To set up the environment for running the code, please refer to the [GitHub repository](https://github.com/SamsungLabs/TinyClick). All necessary libraries and dependencies are listed in the `requirements.txt` file.

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = AutoProcessor.from_pretrained(
    "Krystianz/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Krystianz/TinyClick",
    trust_remote_code=True,
).to(device)

# Load the screenshot (use requests + Image.open(resp.raw) for a remote URL).
img = Image.open("sample.png")

command = "click on accept and continue button"
image_size = img.size

input_text = ("What to do to execute the command? " + command.strip()).lower()

inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)  # keep the inputs on the same device as the model

outputs = model.generate(**inputs)
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)
```

For the postprocessing function, see our GitHub repository: https://github.com/SamsungLabs/TinyClick

```python
from tinyclick_utils import postprocess

result = postprocess(generated_texts[0], image_size)
```
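
If pulling in `tinyclick_utils` is not an option, the decoding step can be approximated by hand. The sketch below is a hypothetical stand-in for `postprocess`: it assumes the model emits Florence-2-style `<loc_k>` tokens with `k` normalized to a 0–999 grid, which may differ from the repository's actual output format.

```python
import re


def parse_click_location(generated_text, image_size):
    """Map Florence-2-style <loc_k> tokens (k normalized to 0-999)
    to pixel coordinates on the original screenshot.

    Hypothetical stand-in for tinyclick_utils.postprocess; the real
    token format produced by the model may differ.
    """
    width, height = image_size
    locs = [int(k) for k in re.findall(r"<loc_(\d+)>", generated_text)]
    if len(locs) < 2:
        return None  # no click point found in the generated text
    x = round(locs[0] / 999 * width)
    y = round(locs[1] / 999 * height)
    return (x, y)
```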

<!-- CITATION -->
## Citation

```bibtex
@misc{pawlowski2024tinyclicksingleturnagentempowering,
      title={TinyClick: Single-Turn Agent for Empowering GUI Automation},
      author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
      year={2024},
      eprint={2410.11871},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.11871},
}
```


<!-- LICENSE -->
## License

Distributed under the MIT License. See `LICENSE` for more information.

<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- MARKDOWN LINKS & IMAGES -->
[paper-shield]: https://img.shields.io/badge/2024-arXiv-red
[paper-url]: https://arxiv.org/abs/2410.11871
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://opensource.org/licenses/MIT