OpAgent-32B

OpAgent-32B is a powerful, open-source Vision-Language Model (VLM) specifically fine-tuned for autonomous web navigation. It serves as the core single-model engine within the broader OpAgent project.

Model Details

Model Description

  • Base Model: Qwen3-VL-32B-Thinking
  • Fine-tuning Strategy: Hierarchical Multi-Task SFT followed by Online Agentic RL with a Hybrid Reward mechanism.
  • Primary Task: Autonomous web navigation and task execution.
  • Input: A combination of a natural language task description and a webpage screenshot.
  • Output: A JSON-formatted action (e.g., click, type, scroll) or a final answer.

Uses

This model is designed to be used as a web agent. The primary way to run it is through a high-performance inference engine like vLLM, as demonstrated in our single-model usage guide.
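
As an alternative to in-process inference, vLLM can also expose the model through its OpenAI-compatible HTTP server (started with `vllm serve "codefuse-ai/OpAgent-32B"`). The following is a minimal sketch of a client for that setup, not the project's official client. It assumes the server listens on `http://localhost:8000`; the helper names `build_chat_payload` and `send_chat_request` are hypothetical, and the placeholder bytes stand in for a real screenshot so the payload-building part runs offline.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_chat_payload(
    task: str,
    png_bytes: bytes,
    system_prompt: Optional[str] = None,
    model: str = "codefuse-ai/OpAgent-32B",
) -> dict:
    """Build an OpenAI-style chat request with a task and a PNG screenshot."""
    image_b64 = base64.b64encode(png_bytes).decode("ascii")
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append(
        {
            "role": "user",
            "content": [
                # The screenshot travels inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": f"Task: {task}"},
            ],
        }
    )
    return {"model": model, "messages": messages,
            "temperature": 0.0, "max_tokens": 1024}


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the assistant text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed host and port
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Placeholder bytes (not a real image) so this part runs without a server.
payload = build_chat_payload(
    "Search for wireless headphones under $50", b"not-a-real-png"
)
print(payload["messages"][0]["content"][1]["text"])
```

With a running server and a real screenshot, `send_chat_request(payload)` would return the raw model output. Passing the agent system message via `system_prompt` is likely needed to get well-formed actions.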

Below is a Python code snippet demonstrating single-step inference with OpAgent-32B and vLLM.

import base64
from vllm import LLM, SamplingParams
from PIL import Image
from io import BytesIO

# --- 1. Helper function to encode image ---
def encode_image_to_base64(image_path):
    with Image.open(image_path) as img:
        buffered = BytesIO()
        img.save(buffered, format="PNG")
        return base64.b64encode(buffered.getvalue()).decode('utf-8')

# --- 2. Initialize the vLLM engine ---
# Ensure you have enough GPU memory: 33B parameters in BF16 need roughly 66 GB for the weights alone.
model_id = "codefuse-ai/OpAgent-32B"
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    tensor_parallel_size=1,  # Adjust based on your GPU setup
    gpu_memory_utilization=0.9
)

# --- 3. Prepare the prompt ---
# The request must include the system message, the screenshot, and the task description.
task_description = "Search for wireless headphones under $50"
screenshot_path = "path/to/your/screenshot.png"  # Replace with your screenshot path
base64_image = encode_image_to_base64(screenshot_path)

# This prompt format is crucial for the agent's performance.
# A plain (non-f) string keeps the literal JSON braces in the example intact.
system_prompt = (
    "You are a helpful web agent. Your goal is to perform tasks on a web page "
    "based on a screenshot and a user's instruction.\n"
    "Output the thinking process in <think> </think> tags, and for each function call, "
    "return a json object with function name and arguments within <tool_call></tool_call> "
    "XML tags as follows:\n"
    '<think> ... </think><tool_call>{"name": <function-name>, "arguments": <args-json-object>}</tool_call>.'
)

# OpenAI-style chat messages; the screenshot is embedded as a base64 data URL.
messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
            {"type": "text", "text": f"Task: {task_description}"},
        ],
    },
]

# --- 4. Generate the action ---
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024)

# LLM.chat applies the model's chat template and inserts the image placeholder tokens.
outputs = llm.chat(messages, sampling_params=sampling_params)

# --- 5. Print the result ---
for output in outputs:
    generated_text = output.outputs[0].text
    print("--- Generated Action ---")
    print(generated_text)

For a complete, interactive agent implementation, please see the code in the opagent_single_model directory of our repository.
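
The system prompt in the snippet above asks the model to wrap its reasoning in `<think>` tags and each action in `<tool_call>` tags. As a minimal sketch of how that raw text could be turned into structured data, the parser below splits the output into reasoning, tool calls, and leftover text. The function name `parse_agent_output` and the `click` arguments in the example are hypothetical; the actual action schema is defined by the repository, not here. The parser also tolerates a missing opening `<think>` tag, on the assumption that some chat templates emit it as part of the prompt.

```python
import json
import re
from typing import Any

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_agent_output(text: str) -> dict[str, Any]:
    """Split raw model output into reasoning, tool calls, and leftover text."""
    # Everything before </think> is reasoning; the opening tag may be absent.
    head, sep, tail = text.partition("</think>")
    if sep:
        thinking = head.replace("<think>", "", 1).strip()
        body = tail
    else:
        thinking = None
        body = text

    tool_calls = []
    for raw in TOOL_CALL_RE.findall(body):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(call, dict) and "name" in call:
            tool_calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )

    # Whatever is left after removing tool calls may be a final answer.
    remainder = TOOL_CALL_RE.sub("", body).strip()
    return {"thinking": thinking, "tool_calls": tool_calls, "remainder": remainder}


# Hypothetical model output; the argument names are illustrative only.
example = (
    "<think>The search box is at the top of the page.</think>"
    '<tool_call>{"name": "click", "arguments": {"x": 120, "y": 48}}</tool_call>'
)
result = parse_agent_output(example)
print(result["tool_calls"])
```

Skipping malformed blocks instead of raising keeps a multi-step loop alive when one generation is truncated; a stricter agent might prefer to retry the step.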

Citation

If you use OpAgent-32B or the OpAgent framework in your research, please cite our work:

@misc{opagent2026,
  author = {CodeFuse-AI Team},
  title = {OpAgent: Operator Agent for Web Navigation},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/codefuse-ai/OpAgent}},
  url = {https://github.com/codefuse-ai/OpAgent}
}

Model size: 33B params (Safetensors, BF16)