Commit 0fe41f7
Parent(s): 58730a8
update README.md

README.md CHANGED
@@ -17,11 +17,12 @@ library_name: transformers
 UI-TARS is a next-generation native GUI agent model designed to interact seamlessly with graphical user interfaces (GUIs) using human-like perception, reasoning, and action capabilities. Unlike traditional modular frameworks, UI-TARS integrates all key components—perception, reasoning, grounding, and memory—within a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or manual rules.
 <!--  -->
 <p align="center">
-<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS.png?raw=true" width="
+<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS-vs-Previous-SOTA.png?raw=true" width="90%"/>
 <p>
 <p align="center">
-<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS
+<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS.png?raw=true" width="90%"/>
 <p>
+
 <!--  -->

 ## Core Features