Update README.md
README.md
CHANGED
@@ -18,39 +18,33 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
 
 ## Models
 
--
-- UGround
-- UGround-V1-
-- UGround-V1-
+- Model-V1:
+- [Initial UGround](https://huggingface.co/osunlp/UGround):
+- [UGround-V1-2B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-2B)
+- [UGround-V1-7B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-7B)
+- [UGround-V1-72B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-72B)
+- [Training Data](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
 
 ## Release Plan
 
-- [x] Model Weights
-- [x] Initial
-- [x] Qwen2-VL-
-
-
-
-
-- [ ]
-
-
-
-- [x]
-- [x]
-
-
-- [ ] Mind2Web-Live-SeeAct-V
-- [ ] AndroidWorld-SeeAct-V
-- [ ] Data-V1
-- [ ] Data Examples
-- [ ] Data Construction Scripts
-- [ ] Guidance of Open-source Data
-- [ ] Data-V1.1
+- [x] [Model Weights](https://huggingface.co/collections/osunlp/uground-677824fc5823d21267bc9812)
+- [x] Initial Version (the one used in the paper)
+- [x] Qwen2-VL-Based V1 (2B, 7B, 72B)
+- [x] Code
+- [x] [Inference Code of UGround (Initial & Qwen2-VL-Based)](https://github.com/boyugou/llava_uground/)
+- [x] Offline Experiments (Code, Results, and Useful Resources)
+- [x] [ScreenSpot](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/ScreenSpot)
+- [x] [Multimodal-Mind2Web](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/Multimodal-Mind2Web)
+- [x] [OmniAct](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/OmniACT)
+- [x] [Android Control](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/AndroidControl)
+- [x] Online Experiments
+- [x] [Mind2Web-Live-SeeAct-V](https://github.com/boyugou/Mind2Web_Live_SeeAct_V)
+- [x] [AndroidWorld-SeeAct-V](https://github.com/boyugou/android_world_seeact_v)
+- [ ] Data Synthesis Pipeline (Coming Soon)
+- [x] [Training-Data (V1)](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
 - [x] Online Demo (HF Spaces)
 
 
-
 ## Main Results
 
 ### GUI Visual Grounding: ScreenSpot (Standard Setting)