Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf OTA-AI/OTA-v1:Q8_0
# Run inference directly in the terminal:
llama-cli -hf OTA-AI/OTA-v1:Q8_0

Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf OTA-AI/OTA-v1:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf OTA-AI/OTA-v1:Q8_0

Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf OTA-AI/OTA-v1:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf OTA-AI/OTA-v1:Q8_0

Use Docker
docker model run hf.co/OTA-AI/OTA-v1:Q8_0

OTA-v1
Introduction
OTA-v1 is a specialized Browser Agent Model (BAM) fine-tuned from the Qwen2.5-14B base model. Designed to excel in controlling browser environments, OTA-v1 leverages frameworks like browser-use to perform automated browser tasks with high precision. Unlike traditional instruction-tuned models, OTA-v1 is optimized for reasoning and tool use within browser contexts, making it a powerful tool for web automation and interaction.
Features
Cost-Efficient Deployment:
- Optimized for consumer-grade GPUs (NVIDIA 3090/4090) with 16-bit precision (20GB VRAM) and 4-bit quantization (10GB VRAM)
- Runs entirely on local hardware, with no cloud dependencies
Multi-step Planning Engine:
- Automatically decomposes complex tasks into executable action sequences
- Implements conditional logic for error recovery and retry mechanisms
- Maintains state awareness across browser sessions (tabs/windows)
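The planning loop described above can be sketched as follows. This is a hypothetical illustration of how a browser-agent harness typically drives such a model, not OTA-v1's actual API: the names `Action`, `run_task`, and `MAX_RETRIES` are invented for the example.

```python
from dataclasses import dataclass

# Illustrative retry budget per step; not an OTA-v1 setting.
MAX_RETRIES = 3

@dataclass
class Action:
    name: str    # e.g. "click", "type", "navigate"
    target: str  # element index or URL

def run_task(plan: list[Action], execute) -> list[str]:
    """Execute a decomposed plan step by step with per-step retry.

    `execute` is the harness callback that performs one browser action;
    failures are recorded in the history so the model can re-plan.
    """
    history = []
    for step in plan:
        for attempt in range(MAX_RETRIES):
            try:
                execute(step)
                history.append(f"ok: {step.name} {step.target}")
                break
            except RuntimeError as err:
                history.append(f"retry {attempt + 1}: {err}")
        else:
            history.append(f"failed: {step.name}")
            break  # abort: later steps depend on this one
    return history
```

The action history doubles as the state the model sees on the next turn, which is what enables the error recovery described above.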
Precision Tool Utilization:
- Native support for browser agent frameworks (browser-use)
- Automatic detection of interactive elements and form fields
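Frameworks in this family usually surface interactive elements to the model as an indexed list so actions can reference elements by number. A hypothetical sketch of that idea; the tuple shape and helper names are illustrative, not browser-use's real data model:

```python
# Only these tags are treated as interactive in this toy example.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}

def index_elements(nodes):
    """nodes: iterable of (tag, text) pairs from a DOM walk.

    Returns interactive elements numbered in document order, so the
    model can say e.g. "click element 0" instead of emitting selectors.
    """
    indexed = []
    for tag, text in nodes:
        if tag in INTERACTIVE_TAGS:
            indexed.append((len(indexed), tag, text))
    return indexed

def render_for_prompt(indexed):
    """Format the indexed elements as they might appear in the prompt."""
    return "\n".join(f"[{i}] <{tag}> {text}" for i, tag, text in indexed)
```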
Long-context Optimization:
- Processes full-page DOM structures (up to 128K tokens)
- YARN-enhanced attention patterns for efficient HTML traversal
- Context-aware element resolution within dynamic web applications
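Since OTA-v1 is fine-tuned from Qwen2.5-14B, extended context via YaRN is presumably enabled the way Qwen2.5 documents it: a rope_scaling entry in config.json. A sketch assuming OTA-v1 inherits Qwen2.5's published values (factor 4.0 over a 32K native window); verify against the model's own config before relying on it:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```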
Structured Execution: Generates battle-tested tool-use instructions with:
- Correctly formatted tool-use output even in long-context settings
- Self-correction based on previous action history
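The self-correction loop above is typically closed by the harness validating each tool call and feeding parse errors back to the model. A minimal hypothetical sketch; the expected keys and error strings are illustrative, not OTA-v1's actual output schema:

```python
import json

# Illustrative schema: a tool call must name an action and its arguments.
REQUIRED_KEYS = {"action", "args"}

def parse_tool_call(reply: str):
    """Return (call, error): `call` is a dict on success, else `error`
    is a message the harness can append to the history so the model
    can correct itself on the next turn."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError as err:
        return None, f"invalid JSON: {err}"
    missing = REQUIRED_KEYS - call.keys()
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    return call, None
```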
Quickstart
The following snippet shows how to use apply_chat_template to load the tokenizer and model and generate content.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OTA-AI/OTA-v1"

# device_map="auto" places weights on available GPUs;
# torch_dtype="auto" uses the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat into the model's prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
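With llama-server running (see the install options above), the same model can also be driven over the server's OpenAI-compatible HTTP API. A minimal stdlib-only sketch, assuming the server's default port 8080 and no API key:

```python
import json
import urllib.request

# llama-server's default listen address; adjust if you passed --port.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_request(task: str) -> bytes:
    """Build an OpenAI-style chat completions payload."""
    payload = {
        "model": "OTA-AI/OTA-v1",
        "messages": [{"role": "user", "content": task}],
        "max_tokens": 512,
    }
    return json.dumps(payload).encode("utf-8")

def ask(task: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=build_request(task),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires a running server):
# reply = ask("Open example.com and report the page title.")
```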
Citation
If you find our work helpful, please consider citing it:
@misc{OTA-v1,
  title  = {OTA-v1: First Browser Agent Model},
  url    = {https://huggingface.co/OTA-AI/OTA-v1/},
  author = {Shaoheng Wang and Jianyang Wu},
  month  = {March},
  year   = {2025}
}
Install from brew (macOS/Linux)
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf OTA-AI/OTA-v1:Q8_0
# Run inference directly in the terminal:
llama-cli -hf OTA-AI/OTA-v1:Q8_0