---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- browser_agent_model
- tool_use
library_name: transformers
---

# OTA-v1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e3fe7cf70c00af963ef551/i5AmpDnraVFJj1WeDCLGO.png)

## Introduction

OTA-v1 is a specialized Browser Agent Model (BAM) fine-tuned from the Qwen2.5-14B base model. Designed to excel at controlling browser environments, OTA-v1 leverages frameworks like browser-use to perform automated browser tasks with high precision. Unlike traditional instruction-tuned models, OTA-v1 is optimized for reasoning and tool use within browser contexts, making it a powerful tool for web automation and interaction.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e3fe7cf70c00af963ef551/ol5hte7cpzt12y2Zx2PoG.png)

## Features

- **Cost-Efficient Deployment:**
  - Optimized for consumer-grade GPUs (NVIDIA RTX 3090/4090): roughly 20 GB of VRAM at 16-bit precision, or 10 GB with 4-bit quantization
  - Enables local execution without cloud dependencies
- **Multi-step Planning Engine:**
  - Automatically decomposes complex tasks into executable action sequences
  - Implements conditional logic for error recovery and retry mechanisms
  - Maintains state awareness across browser sessions (tabs/windows)
- **Precision Tool Utilization:**
  - Native support for browser agent frameworks (browser-use)
  - Automatic detection of interactive elements and form fields
- **Long-context Optimization:**
  - Processes full-page DOM structures (up to 128K tokens)
  - YaRN-enhanced attention patterns for efficient HTML traversal
  - Context-aware element resolution within dynamic web applications
- **Structured Execution:** generates battle-tested tool-use instructions with:
  - Correctly formatted tool-use output even at long context lengths
  - Self-correction based on previous action history

## Quickstart

The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OTA-AI/OTA-v1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Citation

If you find our work helpful, feel free to cite us.

```bibtex
@misc{OTA-v1,
  title  = {OTA-v1: First Browser Agent Model},
  url    = {https://huggingface.co/OTA-AI/OTA-v1/},
  author = {Shaoheng Wang and Jianyang Wu},
  month  = {March},
  year   = {2025}
}
```