Japanese text output quality degradation compared to base model (Qwen3-VL-30B-A3B-Instruct)
Summary
I've identified significant Japanese text quality issues in bu-30b-a3b-preview when used with browser-use's system prompt and DOM state context. Through controlled experiments, I confirmed that the fine-tuned model shows severe degradation in Japanese output quality compared to the base model (Qwen3-VL-30B-A3B-Instruct) under browser-use operating conditions.
Environment
- Inference: llama.cpp (llama-server)
- Quantization: Q8_0 (via bartowski/browser-use_bu-30b-a3b-preview-GGUF)
- Hardware: AMD Radeon Instinct MI25 x4 / NVIDIA Tesla P100 x4
- Context size: 24576-32768 tokens
- Version: Latest as of 2025-12-25
Experiment Design
I conducted three phases of testing to isolate the cause:
Phase 1: Direct API Requests (Baseline)
Simple Japanese text repetition tasks without any system prompt:
- Task: Output "しぐれうい" (Shigure Ui) 5 times
- Task: Output "文鳥と暮らしています" (I live with a Java sparrow) 5 times
Results: Both the base model and bu-30b achieve 100% accuracy
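The Phase 1 baseline can be driven against llama-server's OpenAI-compatible endpoint roughly as follows. This is a sketch: the URL, model name, and exact task wording are my assumptions, not values taken from the report (sampling parameters follow the appendix).

```python
import json
import urllib.request

# Assumed local llama-server endpoint (adjust host/port as needed).
BASE_URL = "http://localhost:8080/v1/chat/completions"

def repetition_prompt(text: str, n: int = 5) -> str:
    """Build the Phase 1 task: ask the model to echo `text` n times."""
    return f"Output the following string exactly {n} times, one per line: {text}"

def run_phase1(text: str, n: int = 5) -> str:
    """Send the bare repetition task with no system prompt at all."""
    payload = {
        "model": "bu-30b-a3b-preview",  # or Qwen3-VL-30B-A3B-Instruct
        "messages": [{"role": "user", "content": repetition_prompt(text, n)}],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```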
Phase 2: Real-world Browser-Use Task
Web browsing task: Search "しぐれうい" on Google, visit 3 sites, extract summaries.
15 test runs with bu-30b-a3b-preview.
Results: Significant Japanese text corruption observed
Phase 3: Controlled Condition Tests
Added browser-use-specific conditions incrementally:
- Exp1: Add browser-use system prompt only
- Exp2: Add long context (~3,000 tokens of English text)
- Exp3: Add DOM extraction text (Japanese Wikipedia-style content)
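The incremental layering of conditions can be expressed as message construction, roughly like this. `SYSTEM_PROMPT`, `FILLER`, and `DOM_TEXT` are stand-in placeholders, not the actual strings used in the experiments:

```python
# Stand-ins for the real experimental inputs (assumptions, abridged).
SYSTEM_PROMPT = "You are a browser-use agent. ..."
FILLER = "lorem ipsum " * 750  # rough stand-in for ~3,000 tokens of English text
DOM_TEXT = "<div>...Japanese Wikipedia-style DOM extraction...</div>"

TASK = "Output the following string exactly 5 times: しぐれうい"

def build_messages(exp: int) -> list[dict]:
    """Exp1: system prompt only; Exp2: + long context; Exp3: + DOM state."""
    msgs = [{"role": "system", "content": SYSTEM_PROMPT}]
    user = TASK
    if exp >= 2:
        user = FILLER + "\n\n" + user
    if exp >= 3:
        user = DOM_TEXT + "\n\n" + user
    msgs.append({"role": "user", "content": user})
    return msgs
```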
Results
Accuracy Comparison (5 runs × 5 repetitions = 25 samples per test)
| Condition | Base Model (dakuten) | Base Model (kanji) | bu-30b (dakuten) | bu-30b (kanji) |
|---|---|---|---|---|
| Direct API | 100% | 100% | 100% | 100% |
| + System Prompt | 100% | 92% | 28% | 28% |
| + Long Context | 92% | 76% | 0% | 8% |
| + DOM State | 96% | 88% | 0% | 0% |
Summary Statistics
| Model | Average Accuracy (Exp1-3) | Range |
|---|---|---|
| Base (Qwen3-VL-30B-A3B-Instruct) | 90.7% | 76-100% |
| bu-30b-a3b-preview | 9.3% | 0-28% |
Error Patterns Observed
1. Dakuten (Voiced Consonant Mark) Errors
Expected: しぐれうい (Shigure Ui)
| Actual Output | Frequency | Error Type |
|---|---|---|
| しくれうい | 73% | Missing dakuten (ぐ→く) |
| しだれうい | 7% | Wrong character (ぐ→だ) |
| しつれうい | 7% | Wrong character (ぐ→つ) |
| しっくれうい | 7% | Extra っ + missing dakuten |
2. Kanji Conversion Errors
Expected: 文鳥と暮らしています (I live with a Java sparrow)
| Actual Output | Error |
|---|---|
| 文鳥と流らしています | 暮→流 (wrong kanji) |
| 文鳥と浦らしています | 暮→浦 (wrong kanji) |
| 文鳥とくらしています | 暮→く (kanji to hiragana) |
| 文鳥と◯ましています | 暮ら→◯ま (corruption) |
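When scoring outputs like these, dakuten-only errors can be separated from other corruption via Unicode NFD decomposition, which splits a voiced kana into its base character plus a combining sound mark. This is a sketch of one possible classifier; the report does not specify how errors were categorized:

```python
import unicodedata

def strip_dakuten(s: str) -> str:
    """Remove voiced/semi-voiced sound marks (U+3099/U+309A) after NFD."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if ch not in "\u3099\u309a")

def classify(output: str, expected: str) -> str:
    """Label an output as exact, dakuten-only error, or other corruption."""
    if output == expected:
        return "exact"
    if strip_dakuten(output) == strip_dakuten(expected):
        return "dakuten_error"
    return "other"
```

For example, しくれうい differs from しぐれうい only by a missing dakuten, while しつれうい substitutes an unrelated character.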
3. Completely Garbled Output (from real browser-use runs)
- "222ไธไบบใฎๅ็ปใ่ฟๆ็ใซ้ธใฟๅฏ่ฝใชใใผใณใใผใใฎไธญ"
- "ใคใฉในใใฌใผใฟใผใจๅ็ปๅฎถใงใใใใผใทใกใใฎ16ๆญณ๏ผไปฎ๏ผใงใ็ตๆจใใๅใใงใใ"
These are completely meaningless in Japanese.
Analysis
Key Finding
The base model (Qwen3-VL-30B-A3B-Instruct) maintains 76-100% accuracy across all conditions, while bu-30b-a3b-preview drops to 0-28% when browser-use system prompt and DOM context are added.
This strongly suggests that the fine-tuning process degraded the model's Japanese language capabilities, particularly when operating in the structured output format required by browser-use.
Possible Causes
- Training data predominantly English: The fine-tuning dataset may have been mostly English browser automation examples
- JSON output format interference: Training to output structured JSON may have disrupted Japanese token generation
- Prompt sensitivity: The model may have become overly sensitive to specific prompt structures, causing instability in Japanese generation
Reproduction Steps
- Load bu-30b-a3b-preview with llama.cpp
- Use the browser-use system prompt:
You are a browser-use agent. You automate browser tasks by outputting structured JSON actions.
...
- Add Japanese DOM content to the user message
- Request Japanese text output
- Compare with base model (Qwen3-VL-30B-A3B-Instruct) under same conditions
Requests
Could you confirm the language distribution of the fine-tuning dataset? Was Japanese (or other non-English languages) included?
Are there plans to improve multilingual support? Japanese output is essential for browser automation targeting Japanese websites.
Any recommended workarounds? For example:
- Using base model with custom prompts?
- Specific inference parameters that might help?
Appendix: Test Methodology
- Server: llama-server with OpenAI-compatible API
- Temperature: 0.7
- Max tokens: 512
- Test runs: 5 runs per condition, 5 repetitions per run = 25 samples
- Evaluation: Exact string match counting
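The exact-match counting described above can be sketched as follows (hypothetical helper names; the actual evaluation script is not included in this report):

```python
def score_run(lines: list[str], expected: str) -> int:
    """Count repetitions in one run that exactly match the expected string."""
    return sum(1 for line in lines if line.strip() == expected)

def accuracy(runs: list[list[str]], expected: str) -> float:
    """Pool all repetitions across runs (5 runs x 5 reps = 25 samples)."""
    total = sum(len(run) for run in runs)
    hits = sum(score_run(run, expected) for run in runs)
    return hits / total if total else 0.0
```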
Thank you for developing this specialized browser automation model! I hope this feedback helps improve multilingual support in future versions.