Holo3-35B-A3B-FP8
Holo3-35B-A3B-FP8 is an FP8-compressed evolution of Holo3-35B-A3B. The checkpoint stores most linear-layer weights in FP8 (E4M3) alongside full-precision (F32) tensors, significantly reducing memory footprint and improving inference efficiency while maintaining strong output quality.
Holo3-35B-A3B from H Company is a state-of-the-art sparse Mixture-of-Experts (MoE) vision-language model with 35B total parameters, of which only 3B are active per inference step. It is fine-tuned from Qwen3.5-35B-A3B for GUI navigation and computer-use agents operating across web, desktop, and mobile environments. It achieves a state-of-the-art 77.8% on the OSWorld-Verified benchmark, surpassing proprietary models at dramatically lower cost and latency, and it excels at interpreting visual interfaces, reasoning over complex content, and executing precise actions such as form filling, spreadsheet editing, and browser control for enterprise automation. The model is licensed under Apache 2.0, with open weights available on Hugging Face.
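As a rough back-of-envelope illustration of the memory saving (an estimate only: it assumes the source checkpoint stores weights in 16-bit precision, and it ignores the layers kept at higher precision as well as the FP8 quantization scales):

# Hypothetical estimate of weight memory before and after FP8 compression.
total_params = 35e9                     # 35B total parameters
bf16_weight_bytes = total_params * 2    # ~70 GB at 2 bytes per weight
fp8_weight_bytes = total_params * 1     # ~35 GB at 1 byte per weight
print(f"{bf16_weight_bytes / 1e9:.0f} GB -> {fp8_weight_bytes / 1e9:.0f} GB")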
Recipe
default_stage:
  default_modifiers:
    QuantizationModifier:
      targets: [Linear]
      ignore: ['re:.*lm_head', 're:visual.*', 're:model.visual.*', 're:.*mlp.gate$',
        're:.*embed_tokens$', 're:.*shared_expert_gate$', 're:.*linear_attn.*']
      scheme: FP8_DYNAMIC
      bypass_divisibility_checks: false
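For context, a recipe of this shape is normally applied with the llm-compressor library's oneshot entry point. The sketch below is illustrative rather than the exact command used to produce this checkpoint: the model ID, auto model class, and output directory are assumptions, and the FP8_DYNAMIC scheme needs no calibration data.

# Minimal sketch of applying the FP8 recipe above with llm-compressor.
# Assumptions: model ID, model class, and save directory are illustrative;
# older llm-compressor releases expose oneshot under llmcompressor.transformers.
from transformers import AutoModelForImageTextToText, AutoProcessor
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Hcompany/Holo3-35B-A3B"  # assumed source checkpoint

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Same intent as the YAML recipe: dynamic FP8 quantization of Linear layers,
# skipping the LM head, vision tower, MoE gates, embeddings, and linear attention.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "re:.*lm_head", "re:visual.*", "re:model.visual.*", "re:.*mlp.gate$",
        "re:.*embed_tokens$", "re:.*shared_expert_gate$", "re:.*linear_attn.*",
    ],
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Holo3-35B-A3B-FP8"  # assumed output path
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)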
Holo3: Foundational Models for Navigation and Computer Use Agents
Model Description
Holo3 is our latest generation of large-scale Vision-Language Models (VLMs) specifically optimized for GUI Agents. Like its predecessors, it operates across diverse digital environments—web, desktop, and mobile—by interpreting visual interfaces, reasoning over complex content, and executing precise actions.
Holo3 achieves state-of-the-art performance on OSWorld-Verified, setting a new benchmark for computer use agents. While it retains the world-class web navigation capabilities of Holo2, the new Holo3-35B-A3B architecture is designed to thrive in realistic business environments.
- Developed by: H Company
- Model type: Vision-Language Model for Navigation and Computer Use Agents
- Architecture: Sparse Mixture-of-Experts (MoE) with 35B total / 3B active parameters
- Fine-tuned from model: Qwen/Qwen3.5-35B-A3B
- Blog Post: hcompany.ai/holo3
- Quickstart: hub.hcompany.ai/quickstart
- License: Apache 2.0

Get Started
Explore our Quickstart guide to learn how to integrate with our inference API.
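Outside the hosted inference API, FP8 checkpoints in the compressed-tensors format are typically loadable with vLLM. The snippet below is a hedged local-inference sketch, not part of the official Quickstart: the prompt, sampling settings, and the trust_remote_code flag are assumptions, and real GUI-agent use would pass screenshots through the model's chat template.

# Hedged sketch: local inference on the FP8 checkpoint with vLLM, which can
# load compressed-tensors FP8 weights directly. Prompt and settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="prithivMLmods/Holo3-35B-A3B-FP8", trust_remote_code=True)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(
    ["Describe the next UI action needed to open the browser's settings menu."],
    params,
)
print(outputs[0].outputs[0].text)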
Training Strategy
Holo3-35B-A3B is based on the Qwen3.5 architecture and has been reinforced to strengthen its core agentic pillars: perception and decision-making. The training pipeline utilizes a carefully curated mix of open-source datasets, large-scale synthetic trajectories, and high-quality human-annotated samples to ensure reliable multi-step reasoning.
Results
State-of-the-Art Navigation (OSWorld-Verified)
To benchmark Holo3 on computer use and web navigation, we utilized the OSWorld and WebArena benchmarks. Holo3-35B-A3B achieves a 77.8% score on OSWorld-Verified. Remarkably, it achieves this with only 3B active parameters, providing SOTA performance at a fraction of the inference cost of leading proprietary models.
Enterprise Readiness (H Corporate Benchmark)
To measure real-world utility, we developed the H Corporate Benchmark: a dedicated evaluation suite of 486 multi-step tasks across four categories: E-commerce, Business Software, Collaboration, and Multi-App workflows. Holo3 consistently outperforms significantly larger competitors in these dense, business-logic environments.
UI Localization & Grounding
A world-class agent must see before it can act. Holo3 excels at localizing interaction elements and understanding their functions, as evidenced by top-tier performance on ScreenSpot-Pro and OSWorld-G.
Table 1: Evaluation results on computer use and grounding benchmarks.

Citation
@misc{hai2025holo3modelfamily,
  title={Holo3 - Open Foundation Models for Navigation and Computer Use Agents},
  author={H Company},
  year={2026},
  url={https://huggingface.co/Hcompany/Holo3-35B-A3B},
}