---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-VL-32B-Thinking
library_name: transformers
tags:
- vlm
- web-agent
- opagent
---

# OpAgent-32B

**OpAgent-32B** is a powerful, open-source Vision-Language Model (VLM) fine-tuned specifically for autonomous web navigation. It serves as the core single-model engine of the broader **[OpAgent project](https://github.com/codefuse-ai/OpAgent)**.

## Model Details

### Model Description

- **Base Model:** `Qwen3-VL-32B-Thinking`
- **Fine-tuning Strategy:** Hierarchical multi-task SFT followed by online agentic RL with a hybrid reward mechanism.
- **Primary Task:** Autonomous web navigation and task execution.
- **Input:** A natural language task description combined with a webpage screenshot.
- **Output:** A JSON-formatted action (e.g., `click`, `type`, `scroll`) or a final answer.
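
For illustration, a single model turn typically interleaves reasoning with one action. The sample below is only a sketch — the exact action schema and coordinate arguments are defined in the OpAgent repository:

```
<think> The task asks for wireless headphones under $50, so I start by clicking the search box. </think><tool_call>{"name": "click", "arguments": {"x": 512, "y": 88}}</tool_call>
```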

### Model Sources

- **Repository:** [https://github.com/codefuse-ai/OpAgent](https://github.com/codefuse-ai/OpAgent)

## Uses

This model is designed to be used as a web agent. The primary way to run it is through a high-performance inference engine like **vLLM**, as demonstrated in our [single-model usage guide](https://github.com/codefuse-ai/OpAgent/tree/main/opagent_single_model).
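
If you prefer to serve the model behind an OpenAI-compatible HTTP API rather than embedding it in your own process, vLLM's CLI can host it directly. The flags below are a sketch — adjust them to your hardware:

```shell
# Launch an OpenAI-compatible server for OpAgent-32B.
# --tensor-parallel-size and --gpu-memory-utilization depend on your GPU setup.
vllm serve codefuse-ai/OpAgent-32B \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9
```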

Below is a Python code snippet demonstrating how to use `OpAgent-32B` with vLLM for a single-step inference.

```python
import base64
from io import BytesIO

from PIL import Image
from vllm import LLM, SamplingParams

# --- 1. Helper function to encode an image ---
# (Useful when calling the model through the OpenAI-compatible server API.)
def encode_image_to_base64(image_path):
    with Image.open(image_path) as img:
        buffered = BytesIO()
        img.save(buffered, format="PNG")
        return base64.b64encode(buffered.getvalue()).decode("utf-8")

# --- 2. Initialize the vLLM engine ---
# Ensure you have enough GPU memory for a 32B model.
model_id = "codefuse-ai/OpAgent-32B"
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    tensor_parallel_size=1,  # Adjust based on your GPU setup
    gpu_memory_utilization=0.9,
)

# --- 3. Prepare the prompt ---
# The prompt must include the system message, task description, and the screenshot.
task_description = "Search for wireless headphones under $50"
screenshot_path = "path/to/your/screenshot.png"  # Replace with your screenshot path

# This prompt format is crucial for the agent's performance.
# Note the doubled braces: inside an f-string, literal `{` and `}` must be escaped.
# Qwen-style chat markers and vision placeholder tokens are used here; adjust them
# if the model's chat template differs.
prompt = f"""<|im_start|>system
You are a helpful web agent. Your goal is to perform tasks on a web page based on a screenshot and a user's instruction.
Output the thinking process in <think> </think> tags, and for each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<think> ... </think><tool_call>{{"name": <function-name>, "arguments": <args-json-object>}}</tool_call><|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>
Task: {task_description}<|im_end|>
<|im_start|>assistant
"""

# --- 4. Generate the action ---
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024)

# vLLM takes the image via `multi_modal_data`; the vision placeholder tokens in the
# prompt mark where the image embedding is inserted.
outputs = llm.generate(
    [{
        "prompt": prompt,
        "multi_modal_data": {"image": Image.open(screenshot_path)},
    }],
    sampling_params=sampling_params,
)

# --- 5. Print the result ---
for output in outputs:
    generated_text = output.outputs[0].text
    print("--- Generated Action ---")
    print(generated_text)
```

For a complete, interactive agent implementation, please see the code in the [`opagent_single_model`](https://github.com/codefuse-ai/OpAgent/tree/main/opagent_single_model) directory of our repository.
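
To act on the model's reply, you need to extract the JSON payload from the `<tool_call>` tags. A minimal parser sketch, assuming the output format shown above (the function name here is illustrative, not part of the OpAgent API):

```python
import json
import re

def parse_action(generated_text):
    """Extract the JSON action from a <tool_call>...</tool_call> block.

    Returns the parsed dict, or None if no tool call is present
    (e.g. when the model emits a final answer instead of an action).
    """
    match = re.search(r"<tool_call>(.*?)</tool_call>", generated_text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

reply = ('<think> Click the search box first. </think>'
         '<tool_call>{"name": "click", "arguments": {"x": 512, "y": 88}}</tool_call>')
action = parse_action(reply)
print(action["name"])  # click
```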

## Citation

If you use `OpAgent-32B` or the `OpAgent` framework in your research, please cite our work:

```bibtex
@misc{opagent2026,
  author       = {CodeFuse-AI Team},
  title        = {OpAgent: Operator Agent for Web Navigation},
  year         = {2026},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/codefuse-ai/OpAgent}},
  url          = {https://github.com/codefuse-ai/OpAgent}
}
```