tanvirb
/

websight-7B

Image-Text-to-Text

browser-automation

text-generation-inference

Model card Files Files and versions

websight-7B / README.md

tanvirb's picture

bring back readme

0559a8f 5 months ago

|

history blame contribute delete

1.03 kB

	---
	license: apache-2.0
	base_model: ByteDance-Seed/UI-TARS-1.5-7B
	tags:
	- vision
	- web-agents
	- browser-automation
	- websight
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	# Websight-7B (Merged)

	This is a merged version of the Websight-7B model, ready for deployment and inference.

	## Model Details

	- Base Model: ByteDance-Seed/UI-TARS-1.5-7B
	- Source PEFT Model: Asanshay/websight-7B (previous model saved here)
	- Model Type: Vision-Language Model for Web Agent Tasks
	- License: Apache 2.0

	## Usage

	```python
	from transformers import pipeline

	# Load the model
	pipe = pipeline("image-text-to-text", model="tanvirb/websight-7B")

	# Use for web agent tasks
	result = pipe(text="Click the login button", images=[screenshot])
	```

	## Deployment

	This model is ready for:
	- Hugging Face Inference Endpoints
	- Local inference
	- Integration with web automation pipelines

	## Training

	This model was fine-tuned using PEFT (Parameter Efficient Fine-Tuning) techniques on web interaction data.