Update README.md
README.md CHANGED

@@ -17,15 +17,15 @@ tags:
 
 ## Model Description
 
-
+Holo1 is an Action Vision-Language Model (VLM) developed by [HCompany](https://www.hcompany.ai/) for use in the Surfer-H web agent system. It is designed to interact with web interfaces like a human user.
 
-As part of a broader agentic architecture,
+As part of a broader agentic architecture, Holo1 acts as a policy, localizer, or validator, helping the agent understand and act in digital environments.
 
-Trained on a mix of open-access, synthetic, and self-generated data,
+Trained on a mix of open-access, synthetic, and self-generated data, Holo1 enables state-of-the-art (SOTA) performance on the [WebVoyager](https://arxiv.org/pdf/2401.13919) benchmark, offering the best accuracy/cost tradeoff among current models.
 It also excels in UI localization tasks such as [Screenspot](https://huggingface.co/datasets/rootsautomation/ScreenSpot), [Screenspot-V2](https://huggingface.co/datasets/HongxinLi/ScreenSpot_v2), [Screenspot-Pro](https://huggingface.co/datasets/likaixin/ScreenSpot-Pro), [GroundUI-Web](https://huggingface.co/datasets/agent-studio/GroundUI-1K), and our own newly introduced
 benchmark [WebClick](https://huggingface.co/datasets/Hcompany/WebClick).
 
-
+Holo1 is optimized for both accuracy and cost-efficiency, making it a strong open-source alternative to existing VLMs.
 
 For more details, check our paper and our blog post.
 
@@ -86,7 +86,7 @@ We also provide code to reproduce screenspot evaluations: screenspot_eval.py
 
 ### Prepare model, processor
 
-
+Holo1 models are based on Qwen2.5-VL architecture, which comes with transformers support. Here we provide a simple usage example.
 You can load the model and the processor as follows:
 
 ```python