|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: ByteDance-Seed/UI-TARS-1.5-7B |
|
|
tags: |
|
|
- vision |
|
|
- web-agents |
|
|
- browser-automation |
|
|
- websight |
|
|
library_name: transformers |
|
|
pipeline_tag: image-text-to-text |
|
|
--- |
|
|
|
|
|
# Websight-7B (Merged) |
|
|
|
|
|
This is a merged version of the Websight-7B model, ready for deployment and inference. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base Model**: ByteDance-Seed/UI-TARS-1.5-7B |
|
|
- **Source PEFT Model**: Asanshay/websight-7B (previous model saved here) |
|
|
- **Model Type**: Vision-Language Model for Web Agent Tasks |
|
|
- **License**: Apache 2.0 |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
|
|
|
# Load the model |
|
|
pipe = pipeline("image-text-to-text", model="tanvirb/websight-7B") |
|
|
|
|
|
# Use for web agent tasks |
|
|
result = pipe(text="Click the login button", images=[screenshot]) |
|
|
``` |
|
|
|
|
|
## Deployment |
|
|
|
|
|
This model is ready for: |
|
|
- Hugging Face Inference Endpoints |
|
|
- Local inference |
|
|
- Integration with web automation pipelines |
|
|
|
|
|
## Training |
|
|
|
|
|
This model was fine-tuned using PEFT (Parameter Efficient Fine-Tuning) techniques on web interaction data. |
|
|
|