---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- browser_agent_model
- tool_use
library_name: transformers
---

# OTA-v1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e3fe7cf70c00af963ef551/i5AmpDnraVFJj1WeDCLGO.png)

## Introduction

OTA-v1 is a specialized Browser Agent Model (BAM) fine-tuned from the Qwen2.5-14B base model. Designed to excel at controlling browser environments, OTA-v1 leverages frameworks like browser-use to perform automated browser tasks with high precision. Unlike traditional instruction-tuned models, OTA-v1 is optimized for reasoning and tool use within browser contexts, making it a powerful tool for web automation and interaction.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e3fe7cf70c00af963ef551/ol5hte7cpzt12y2Zx2PoG.png)

## Features

- **Cost-Efficient Deployment:**
  - Optimized for consumer-grade GPUs (NVIDIA RTX 3090/4090): roughly 20 GB of VRAM at 16-bit precision, or 10 GB with 4-bit quantization
  - Enables local execution without cloud dependencies
- **Multi-step Planning Engine:**
  - Automatically decomposes complex tasks into executable action sequences
  - Implements conditional logic for error recovery and retry mechanisms
  - Maintains state awareness across browser sessions (tabs/windows)
- **Precision Tool Utilization:**
  - Native support for browser agent frameworks (browser-use)
  - Automatic detection of interactive elements and form fields
- **Long-context Optimization:**
  - Processes full-page DOM structures (up to 128K tokens)
  - YaRN-enhanced attention patterns for efficient HTML traversal
  - Context-aware element resolution within dynamic web applications
- **Structured Execution:** generates battle-tested tool-use instructions with:
  - Correctly formatted tool-use output even at long context lengths
  - Self-correction based on previous action history

## Quickstart

The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OTA-AI/OTA-v1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Citation

If you find our work helpful, feel free to cite us.

```bibtex
@misc{OTA-v1,
  title  = {OTA-v1: First Browser Agent Model},
  url    = {https://huggingface.co/OTA-AI/OTA-v1/},
  author = {Shaoheng Wang and Jianyang Wu},
  month  = {March},
  year   = {2025}
}
```