Update README.md
README.md CHANGED

@@ -17,15 +17,15 @@ tags:
 
 ## Model Description
 
-
+Holo1 is an Action Vision-Language Model (VLM) developed by [HCompany](https://www.hcompany.ai/) for use in the Surfer-H web agent system. It is designed to interact with web interfaces like a human user.
 
-As part of a broader agentic architecture,
+As part of a broader agentic architecture, Holo1 acts as a policy, localizer, or validator, helping the agent understand and act in digital environments.
 
-Trained on a mix of open-access, synthetic, and self-generated data,
+Trained on a mix of open-access, synthetic, and self-generated data, Holo1 enables state-of-the-art (SOTA) performance on the [WebVoyager](https://arxiv.org/pdf/2401.13919) benchmark, offering the best accuracy/cost tradeoff among current models.
 It also excels in UI localization tasks such as [Screenspot](https://huggingface.co/datasets/rootsautomation/ScreenSpot), [Screenspot-V2](https://huggingface.co/datasets/HongxinLi/ScreenSpot_v2), [Screenspot-Pro](https://huggingface.co/datasets/likaixin/ScreenSpot-Pro), [GroundUI-Web](https://huggingface.co/datasets/agent-studio/GroundUI-1K), and our own newly introduced
 benchmark [WebClick](https://huggingface.co/datasets/Hcompany/WebClick).
 
-
+Holo1 is optimized for both accuracy and cost-efficiency, making it a strong open-source alternative to existing VLMs.
 
 For more details, check our paper and our blog post.
 
@@ -86,7 +86,7 @@ We also provide code to reproduce screenspot evaluations: screenspot_eval.py
 
 ### Prepare model, processor
 
-
+Holo1 models are based on Qwen2.5-VL architecture, which comes with transformers support. Here we provide a simple usage example.
 You can load the model and the processor as follows:
 
 ```python