websight-7B / README.md
tanvirb's picture
bring back readme
0559a8f
metadata
license: apache-2.0
base_model: ByteDance-Seed/UI-TARS-1.5-7B
tags:
  - vision
  - web-agents
  - browser-automation
  - websight
library_name: transformers
pipeline_tag: image-text-to-text

Websight-7B (Merged)

This is a merged version of the Websight-7B model, ready for deployment and inference.

Model Details

  • Base Model: ByteDance-Seed/UI-TARS-1.5-7B
  • Source PEFT Model: Asanshay/websight-7B (previous model saved here)
  • Model Type: Vision-Language Model for Web Agent Tasks
  • License: Apache 2.0

Usage

from transformers import pipeline

# Load the model
pipe = pipeline("image-text-to-text", model="tanvirb/websight-7B")

# Use for web agent tasks
result = pipe(text="Click the login button", images=[screenshot])

Deployment

This model is ready for:

  • Hugging Face Inference Endpoints
  • Local inference
  • Integration with web automation pipelines

Training

This model was fine-tuned using PEFT (Parameter Efficient Fine-Tuning) techniques on web interaction data.