Update README.md
README.md
CHANGED
@@ -12,9 +12,9 @@ base_model:

 # PTA-1: Controlling Computers with Small Models

-PTA (Prompt-to-Automation) is a vision language model for computer
-With
-This

 **Model Input:** Screenshot + description_of_target_element

@@ -62,8 +62,8 @@ print(parsed_answer)

 ## Evaluation

-**Note:** This is a first version of our evaluation
-We are still running all models on the full test sets

 | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined |
 |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------|
@@ -83,10 +83,10 @@ We are still running all models on the full test sets. We are seeing +-5% deviat
 \* Models is known to be trained on the train split of that dataset.

 The high benchmark scores for our model are partially due to data bias.
-Therefore we expect users of the model to fine-tune it according to the data distributions of their use case.


 #### Metrics

-Click success rate is calculated as the number of clicks inside the target bounding box.
 If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.

 # PTA-1: Controlling Computers with Small Models

+PTA (Prompt-to-Automation) is a vision language model for computer & phone automation, based on Florence-2.
+With only 270M parameters it outperforms much larger models in GUI text and element localization.
+This enables low-latency computer automation with local execution.

 **Model Input:** Screenshot + description_of_target_element


 ## Evaluation

+**Note:** This is a first version of our evaluation, based on 999 samples (333 samples from each dataset).
+We are still running all models on the full test sets, and we are seeing ±5% deviations for a subset of the models we have already evaluated.

 | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined |
 |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------|

 \* Models is known to be trained on the train split of that dataset.

 The high benchmark scores for our model are partially due to data bias.
+Therefore, we expect users of the model to fine-tune it according to the data distributions of their use case.


 #### Metrics

+Click success rate is calculated as the number of clicks inside the target bounding box relative to all clicks.
 If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.
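
The click success rate described in the Metrics section could be computed roughly as in the sketch below. This is an illustration only, not the evaluation code used for the table: the function names `to_click` and `click_success_rate`, the `(x_min, y_min, x_max, y_max)` box layout, and the choice to count clicks that land exactly on the box edge as hits are all assumptions.

```python
from typing import Sequence, Tuple, Union

Point = Tuple[float, float]
# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def to_click(prediction: Union[Point, Box]) -> Point:
    """Return a click point; a 4-tuple is treated as a box and reduced to its center."""
    if len(prediction) == 4:
        x_min, y_min, x_max, y_max = prediction
        return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    x, y = prediction
    return (float(x), float(y))


def click_success_rate(
    predictions: Sequence[Union[Point, Box]],
    targets: Sequence[Box],
) -> float:
    """Fraction of predicted clicks that fall inside their target bounding box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for prediction, (x_min, y_min, x_max, y_max) in zip(predictions, targets):
        x, y = to_click(prediction)
        # Assumption: clicks on the box boundary count as inside.
        if x_min <= x <= x_max and y_min <= y <= y_max:
            hits += 1
    return hits / len(predictions)


if __name__ == "__main__":
    preds = [(5, 5), (0, 0, 10, 10), (50, 50)]
    boxes = [(0, 0, 10, 10), (4, 4, 6, 6), (0, 0, 10, 10)]
    print(click_success_rate(preds, boxes))
```

In the usage example, the first prediction is a click inside its target, the second is a box whose center lands inside its target, and the third misses, so two of three clicks count as hits.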