Instructions to use ByteDance-Seed/UI-TARS-1.5-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ByteDance-Seed/UI-TARS-1.5-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ByteDance-Seed/UI-TARS-1.5-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ByteDance-Seed/UI-TARS-1.5-7B")
model = AutoModelForImageTextToText.from_pretrained("ByteDance-Seed/UI-TARS-1.5-7B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ByteDance-Seed/UI-TARS-1.5-7B with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ByteDance-Seed/UI-TARS-1.5-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/UI-TARS-1.5-7B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```
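Since the endpoint is OpenAI-compatible, you can also call it from Python with the official `openai` client. A minimal sketch, assuming `pip install openai` and the server above running on localhost:8000 (for the SGLang server below, change the port to 30000):

```python
# A minimal sketch of calling the OpenAI-compatible endpoint from Python.
# Assumes the vLLM server above is reachable on localhost:8000; for the
# SGLang server below, use base_url="http://localhost:30000/v1".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is unused locally

response = client.chat.completions.create(
    model="ByteDance-Seed/UI-TARS-1.5-7B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```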
- SGLang
How to use ByteDance-Seed/UI-TARS-1.5-7B with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/UI-TARS-1.5-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/UI-TARS-1.5-7B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/UI-TARS-1.5-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/UI-TARS-1.5-7B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use ByteDance-Seed/UI-TARS-1.5-7B with Docker Model Runner:
```bash
docker model run hf.co/ByteDance-Seed/UI-TARS-1.5-7B
```
Bbox locating error

I use Midscene.js on the web; the action is to click the search box, but it clicks the wrong location. Is there a problem with the coordinate mapping?
What's more, the result of the test case in https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md is
"Thought: I can see that the system settings interface is already open, but what's shown here are just basic system parameters, such as cache size and memory usage. To set the image's color mode, I first need to find the "Color Management" option. Let me look through the settings list on the left; it should be among those options.
Action: click(start_box='(197,549)')"
which returns a wrong box too.
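For reference, an action string like the one quoted above can be parsed into coordinates with a small helper. This is an illustrative sketch only; the real UI-TARS action space has more action types than this pattern covers:

```python
import re

def parse_click(action: str) -> tuple[int, int] | None:
    """Extract (x, y) from an action like: click(start_box='(197,549)').

    Illustrative helper only, not the official UI-TARS action grammar.
    """
    m = re.search(r"click\(start_box='\((\d+)\s*,\s*(\d+)\)'\)", action)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(parse_click("click(start_box='(197,549)')"))  # -> (197, 549)
```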
I seem to have the same problem; I have encountered inaccurate coordinates when using Midscene and UI-TARS-desktop.
Perhaps this can solve the problem here.
I tested their new code, but the issue still seems to be there. In OSWorld, the model also seems to click on the same location multiple times.
Same for me, also with Qwen2.5-VL.
For the coordinate conversion issue, please refer to this tutorial; a minimal sketch of the conversion follows below.
Regarding the stuck behavior (clicking the same location repeatedly), we have indeed observed it on the 7B model. We plan to release the full UI-TARS-1.5 model in the future, which will include significant improvements on this issue. Stay tuned!
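A minimal sketch of that coordinate conversion, assuming the model reports pixel coordinates in the smart-resized image space used by Qwen2-VL-style preprocessing. The `factor`, `min_pixels`, and `max_pixels` defaults below are assumptions; check the processor config and the linked tutorial for the authoritative values.

```python
import math

def round_by_factor(n: float, factor: int) -> int:
    return round(n / factor) * factor

def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56,
                 max_pixels: int = 14 * 14 * 4 * 1280) -> tuple[int, int]:
    """Rescale (height, width) the way Qwen2-VL-style preprocessing does:
    dimensions snap to a multiple of `factor`, total area is clamped."""
    h = max(factor, round_by_factor(height, factor))
    w = max(factor, round_by_factor(width, factor))
    if h * w > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

def model_to_screen(x: int, y: int, screen_w: int, screen_h: int) -> tuple[int, int]:
    """Map a model-space click back to original screen pixels."""
    resized_h, resized_w = smart_resize(screen_h, screen_w)
    return round(x * screen_w / resized_w), round(y * screen_h / resized_h)

# E.g. the click(start_box='(197,549)') above, on a 1920x1080 screen:
print(model_to_screen(197, 549, 1920, 1080))
```

If the raw model coordinates are used directly as screen coordinates, every click lands offset toward the top-left on large screens, which matches the symptom described above.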