Improve: Update model card with abstract, correct license, and add tags/resources

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +31 -5
README.md CHANGED
@@ -1,13 +1,24 @@
1
  ---
2
  base_model: OpenGVLab/InternVL2-4B
3
  library_name: transformers
4
- license: apache-2.0
5
  pipeline_tag: image-text-to-text
6
  ---
7
 
8
  # OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
9
 
10
- This model is described in the paper [OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis](https://huggingface.co/papers/2412.19723)
11
 
12
  <div align="center">
13
 
@@ -23,8 +34,8 @@ We introduce OS-Genesis, an interaction-driven pipeline that synthesizes high-qu
23
  ## Quick Start
24
  OS-Genesis-8B-WA is a mobile action model finetuned from [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B).
25
 
26
- ### OS-Genesis AC Family Models
27
- In the following table, we provide an overview of the OS-Genesis AC Family Models used for evaluating the AndroidControl Benchmark.
28
 
29
  | Model Name | Base Model | Training Data | HF Link |
30
  | :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :---------------------------------------------------------: |
@@ -150,8 +161,23 @@ print(f'User: {question}
150
  Assistant: {response}')
151
  ```
152
 
153
 
154
- ## Citation
155
  If you find this repository helpful, feel free to cite our paper:
156
  ```bibtex
157
  @article{sun2024osgenesis,
 
1
  ---
2
  base_model: OpenGVLab/InternVL2-4B
3
  library_name: transformers
4
+ license: mit
5
  pipeline_tag: image-text-to-text
6
+ tags:
7
+ - vlm
8
+ - gui-agent
9
+ - multimodal
10
+ - android
11
+ - web
12
+ - computer-vision
13
+ - natural-language-processing
14
  ---
15
 
16
  # OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
17
 
18
+ This model is described in the paper [OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis](https://huggingface.co/papers/2412.19723).
19
+
20
+ ## Abstract
21
+ Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our codes, data, and checkpoints are available at this https URL .
22
 
23
  <div align="center">
24
 
 
34
  ## Quick Start
35
  OS-Genesis-8B-WA is a mobile action model finetuned from [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B).
36
 
37
+ ### OS-Genesis WebArena Family Models
38
+ In the following table, we provide an overview of the OS-Genesis WA Family Models used for evaluating the WebArena Benchmark.
39
 
40
  | Model Name | Base Model | Training Data | HF Link |
41
  | :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :---------------------------------------------------------: |
 
161
  Assistant: {response}')
162
  ```
163
 
164
+ ## More Resources
165
+ ### Raw collected triples
166
+
167
+ In addition to our complete trajectory data on HuggingFace, we also provide collected raw `<s_pre, a, s_post>` triples. You can use them to reproduce the process of reverse task synthesis directly, without re-collecting them from emulators yourself 😄. The screenshots and corresponding texts (with SoM info contained) are provided below:
168
+
169
+ | Data Type | Screenshots | Data JSON |
170
+ | :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: |
171
+ | Mobile | [Screenshots](https://drive.google.com/file/d/1ILyz_-DDOdAk32kue1lEPaV50YzQ5c4v/view?usp=sharing) | [Data JSON](https://drive.google.com/file/d/1dSxNf-co4LGh93NoiUgWKdbcf8Mo_VWG/view?usp=sharing) |
172
+ | Web | [Screenshots](https://drive.google.com/file/d/1X2QktZ51OUofZ43vDGB4RuAPlXbdf5ua/view?usp=sharing) | [Data JSON](https://drive.google.com/file/d/1mDxhonGnd3wZbNQgWMVpYEkPW26_FVg8/view?usp=sharing) |
173
+
174
+ Feel free to email me if you require additional data of this kind.
175
+
176
+ ## FAQ ❓
177
+
178
+ We have collected some questions from emails, Hugging Face, and WeChat communications. Please check the [FAQ](https://github.com/OS-Copilot/OS-Genesis/blob/main/faq.md) 🤖
179
 
180
+ ## Citation 📖
181
  If you find this repository helpful, feel free to cite our paper:
182
  ```bibtex
183
  @article{sun2024osgenesis,