|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- browser_agent_model |
|
|
- tool_use |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# OTA-v1 |
|
|
|
|
|
 |
|
|
|
|
|
## Introduction |
|
|
|
|
|
OTA-v1 is a specialized Browser Agent Model (BAM) fine-tuned from the Qwen2.5-14B base model. Designed to excel in controlling browser environments, OTA-v1 leverages frameworks like browser-use to perform automated browser tasks with high precision. Unlike traditional instruction-tuned models, OTA-v1 is optimized for reasoning and tool use within browser contexts, making it a powerful tool for web automation and interaction. |
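To make "tool use within browser contexts" concrete, the sketch below shows the kind of structured action payload a browser agent model typically emits for its controlling framework to execute. The schema and field names here are illustrative assumptions, not the actual format defined by OTA-v1 or browser-use.

```python
import json

# Hypothetical example of a structured browser action a BAM might emit.
# The real schema is defined by the agent framework (e.g. browser-use);
# every field name below is illustrative only.
action = {
    "current_state": {
        "evaluation": "Search page loaded successfully",
        "next_goal": "Type the query into the search box",
    },
    "action": [
        {"input_text": {"index": 5, "text": "OTA-v1 browser agent"}},
        {"click_element": {"index": 7}},
    ],
}

# The framework parses the model's JSON output and executes each action.
payload = json.dumps(action)
print(payload)
```

Because the output is plain JSON, the framework can validate it before executing anything against the live page.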
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
## Features |
|
|
|
|
|
- **Cost-Efficient Deployment:**
  - Optimized for consumer-grade GPUs (NVIDIA RTX 3090/4090): runs in 16-bit precision (~20 GB VRAM) or with 4-bit quantization (~10 GB VRAM)
  - Enables local execution without cloud dependencies
- **Multi-step Planning Engine:**
  - Automatically decomposes complex tasks into executable action sequences
  - Implements conditional logic for error recovery and retries
  - Maintains state awareness across browser sessions (tabs/windows)
- **Precision Tool Utilization:**
  - Native support for browser agent frameworks such as browser-use
  - Automatic detection of interactive elements and form fields
- **Long-context Optimization:**
  - Processes full-page DOM structures (up to 128K tokens)
  - YaRN-enhanced attention for efficient HTML traversal
  - Context-aware element resolution in dynamic web applications
- **Structured Execution:** Generates reliable tool-use instructions with:
  - Correctly formatted tool-use output even at long context lengths
  - Self-correction based on previous action history
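The planning-and-retry behavior described above can be sketched as a plain control loop. This is an illustrative outline only, assuming a hypothetical `Step`/`run_plan` structure; it is not part of OTA-v1 or the browser-use API.

```python
from dataclasses import dataclass

# Illustrative sketch of a multi-step planning loop with retry logic,
# of the kind a browser agent framework runs around the model.
# All names here (Step, run_plan, flaky) are hypothetical.

@dataclass
class Step:
    description: str
    max_retries: int = 2

def run_plan(steps, execute):
    """Run each step in order, retrying failures up to max_retries times."""
    history = []
    for step in steps:
        for attempt in range(step.max_retries + 1):
            ok = execute(step, attempt)
            history.append((step.description, attempt, ok))
            if ok:
                break
        else:
            # All retries exhausted: surface the failure to the caller.
            raise RuntimeError(f"step failed: {step.description}")
    return history

# Toy executor that fails once on the second step, then succeeds on retry.
def flaky(step, attempt):
    return not (step.description == "submit form" and attempt == 0)

plan = [Step("open page"), Step("submit form"), Step("read result")]
log = run_plan(plan, flaky)
print(len(log))  # 4 entries: one extra for the retried step
```

Keeping the full attempt history around is what lets the model self-correct: failed attempts are fed back as context for the next action.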
|
|
|
|
|
|
|
|
## Quickstart |
|
|
|
|
|
The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_name = "OTA-AI/OTA-v1" |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype="auto", |
|
|
device_map="auto" |
|
|
) |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
|
|
prompt = "Give me a short introduction to large language models."
|
|
messages = [ |
|
|
    {"role": "system", "content": "You are a helpful assistant."},
|
|
{"role": "user", "content": prompt} |
|
|
] |
|
|
text = tokenizer.apply_chat_template( |
|
|
messages, |
|
|
tokenize=False, |
|
|
add_generation_prompt=True |
|
|
) |
|
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
|
|
generated_ids = model.generate( |
|
|
**model_inputs, |
|
|
max_new_tokens=512 |
|
|
) |
|
|
generated_ids = [ |
|
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
|
] |
|
|
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
|
|
``` |
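To run within the ~10 GB VRAM footprint mentioned in the Features section, the model can be loaded with 4-bit quantization through `bitsandbytes` (which must be installed separately). The quantization parameters below are common NF4 defaults, assumed for illustration rather than settings confirmed for OTA-v1.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config; these are common defaults, not
# values published by the OTA-v1 authors.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Loading downloads the full weights and quantizes them on the fly.
model = AutoModelForCausalLM.from_pretrained(
    "OTA-AI/OTA-v1",
    quantization_config=quant_config,
    device_map="auto",
)
```

Generation then works exactly as in the snippet above; only the loading step changes.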
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find our work helpful, please feel free to cite us.
|
|
|
|
|
```bibtex
|
|
@misc{OTA-v1, |
|
|
title = {OTA-v1: First Browser Agent Model}, |
|
|
url = {https://huggingface.co/OTA-AI/OTA-v1/}, |
|
|
  author = {Shaoheng Wang and Jianyang Wu},
|
|
month = {March}, |
|
|
year = {2025} |
|
|
} |
|
|
``` |