|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- browser_agent_model |
|
|
- tool_use |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# OTA-v1 |
|
|
|
|
|
 |
|
|
|
|
|
## Introduction |
|
|
|
|
|
OTA-v1 is a specialized Browser Agent Model (BAM) fine-tuned from the Qwen2.5-14B base model. Designed to excel in controlling browser environments, OTA-v1 leverages frameworks like browser-use to perform automated browser tasks with high precision. Unlike traditional instruction-tuned models, OTA-v1 is optimized for reasoning and tool use within browser contexts, making it a powerful tool for web automation and interaction. |
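To make "tool use within browser contexts" concrete, the sketch below shows the kind of structured action payload a browser agent model typically emits for its controlling framework to execute. The schema and field names here are illustrative assumptions, not the actual format defined by OTA-v1 or browser-use.

```python
import json

# Hypothetical example of a structured browser action a BAM might emit.
# The real schema is defined by the agent framework (e.g. browser-use);
# every field name below is illustrative only.
action = {
    "current_state": {
        "evaluation": "Search page loaded successfully",
        "next_goal": "Type the query into the search box",
    },
    "action": [
        {"input_text": {"index": 5, "text": "OTA-v1 browser agent"}},
        {"click_element": {"index": 7}},
    ],
}

# The framework parses the model's JSON output and executes each action.
payload = json.dumps(action)
print(payload)
```

Because the output is plain JSON, the framework can validate it before executing anything against the live page.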
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
## Features |
|
|
|
|
|
- **Cost-Efficient Deployment:**
  - Optimized for consumer-grade GPUs (NVIDIA RTX 3090/4090): runs in 16-bit precision (~20 GB VRAM) or with 4-bit quantization (~10 GB VRAM)
  - Enables local execution without cloud dependencies
- **Multi-step Planning Engine:**
  - Automatically decomposes complex tasks into executable action sequences
  - Implements conditional logic for error recovery and retries
  - Maintains state awareness across browser sessions (tabs/windows)
- **Precision Tool Utilization:**
  - Native support for browser agent frameworks such as browser-use
  - Automatic detection of interactive elements and form fields
- **Long-context Optimization:**
  - Processes full-page DOM structures (up to 128K tokens)
  - YaRN-enhanced attention for efficient HTML traversal
  - Context-aware element resolution in dynamic web applications
- **Structured Execution:** Generates reliable tool-use instructions with:
  - Correctly formatted tool-use output even at long context lengths
  - Self-correction based on previous action history
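The planning-and-retry behavior described above can be sketched as a plain control loop. This is an illustrative outline only, assuming a hypothetical `Step`/`run_plan` structure; it is not part of OTA-v1 or the browser-use API.

```python
from dataclasses import dataclass

# Illustrative sketch of a multi-step planning loop with retry logic,
# of the kind a browser agent framework runs around the model.
# All names here (Step, run_plan, flaky) are hypothetical.

@dataclass
class Step:
    description: str
    max_retries: int = 2

def run_plan(steps, execute):
    """Run each step in order, retrying failures up to max_retries times."""
    history = []
    for step in steps:
        for attempt in range(step.max_retries + 1):
            ok = execute(step, attempt)
            history.append((step.description, attempt, ok))
            if ok:
                break
        else:
            # All retries exhausted: surface the failure to the caller.
            raise RuntimeError(f"step failed: {step.description}")
    return history

# Toy executor that fails once on the second step, then succeeds on retry.
def flaky(step, attempt):
    return not (step.description == "submit form" and attempt == 0)

plan = [Step("open page"), Step("submit form"), Step("read result")]
log = run_plan(plan, flaky)
print(len(log))  # 4 entries: one extra for the retried step
```

Keeping the full attempt history around is what lets the model self-correct: failed attempts are fed back as context for the next action.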
|
|
|
|
|
|
|
|
## Quickstart |
|
|
|
|
|
The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_name = "OTA-AI/OTA-v1" |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype="auto", |
|
|
device_map="auto" |
|
|
) |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
|
|
prompt = "Give me a short introduction to large language models."
|
|
messages = [ |
|
|
    {"role": "system", "content": "You are a helpful assistant."},
|
|
{"role": "user", "content": prompt} |
|
|
] |
|
|
text = tokenizer.apply_chat_template( |
|
|
messages, |
|
|
tokenize=False, |
|
|
add_generation_prompt=True |
|
|
) |
|
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
|
|
generated_ids = model.generate( |
|
|
**model_inputs, |
|
|
max_new_tokens=512 |
|
|
) |
|
|
generated_ids = [ |
|
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
|
] |
|
|
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
|
|
``` |
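To run within the ~10 GB VRAM footprint mentioned in the Features section, the model can be loaded with 4-bit quantization through `bitsandbytes` (which must be installed separately). The quantization parameters below are common NF4 defaults, assumed for illustration rather than settings confirmed for OTA-v1.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config; these are common defaults, not
# values published by the OTA-v1 authors.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Loading downloads the full weights and quantizes them on the fly.
model = AutoModelForCausalLM.from_pretrained(
    "OTA-AI/OTA-v1",
    quantization_config=quant_config,
    device_map="auto",
)
```

Generation then works exactly as in the snippet above; only the loading step changes.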
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find our work helpful, please feel free to cite us.
|
|
|
|
|
```bibtex
|
|
@misc{OTA-v1, |
|
|
title = {OTA-v1: First Browser Agent Model}, |
|
|
url = {https://huggingface.co/OTA-AI/OTA-v1/}, |
|
|
  author = {Shaoheng Wang and Jianyang Wu},
|
|
month = {March}, |
|
|
year = {2025} |
|
|
} |
|
|
``` |