Instructions to use allenai/MolmoWeb-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use allenai/MolmoWeb-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="allenai/MolmoWeb-8B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained("allenai/MolmoWeb-8B", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use allenai/MolmoWeb-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "allenai/MolmoWeb-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allenai/MolmoWeb-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/allenai/MolmoWeb-8B

SGLang

How to use allenai/MolmoWeb-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "allenai/MolmoWeb-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allenai/MolmoWeb-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "allenai/MolmoWeb-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allenai/MolmoWeb-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use allenai/MolmoWeb-8B with Docker Model Runner:
```
docker model run hf.co/allenai/MolmoWeb-8B
```

MolmoWeb-8B

Important Update! We made a few small but important updates to this HF/transformers-compatible checkpoint to ensure exact outputs to our native model checkpoint on March 29, 2026 ~6PM PST. If you downloaded this model checkpoint earlier than this time, we recommend re-downloading it. See PRs 2 and 3 for more details. Thanks for your understanding!

MolmoWeb is a family of fully open multimodal web agents. MolmoWeb agents achieve state-of-the-art results outperforming similar scale open-weight-only models such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks (SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7% and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1)on WebVoyager and Online-Mind2Web respectively.

Learn more about the MolmoWeb family in our announcement blog post and tech report.

MolmoWeb-8B is based on Molmo2 architecture, which uses Qwen3-8B and SigLIP 2 as vision backbone.

Ai2 is committed to open science. The MolmoWeb datasets are available here. All other artifacts used in creating MolmoWeb (training code, evaluations, intermediate checkpoints) will be made available, furthering our commitment to open-source AI development and reproducibility.

Quick links:

💬 Demo
📂 All Models
📚 All Data
📃 Paper
🎥 Blog with Videos

Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch
from jinja2 import Template

checkpoint_dir = "allenai/MolmoWeb-8B"

model = AutoModelForImageTextToText.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    torch_dtype=torch.float32, # we recommend using the default float32 precision 
    attn_implementation="sdpa",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    padding_side="left",
)


MOLMOWEB_THINK_TEMPLATE = Template(
"""
# GOAL
{{ task_description }}

# PREVIOUS STEPS
{% for action in past_actions: -%}
## Step {{ action['index'] }}
THOUGHT: {{ action['thought'] }}
ACTION: {{ action['action'] }}
{% endfor %}
# CURRENTLY ACTIVE PAGE
Page {{ page_index }}: {{ page_title }} | {{ page_url }}

# NEXT STEP

"""
)

task_description = "Tell me about the Ai2 PIROR team's recent projects"
past_actions = []
user_message = MOLMOWEB_THINK_TEMPLATE.render(
    page_title=None,
    page_url="about:blank",
    page_index=0,
    task_description=task_description,
    past_actions=[]
)
system_message = "molmo_web_think"
prompt = f"{system_message}: {user_message}"

blank_image = Image.new("RGB", (1280, 720), color="white")

image_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image", "image": blank_image},
        ]
    }
]

inputs = processor.apply_chat_template(
    image_messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    padding=True,
)

# Remove token_type_ids: HF uses it to enable bidirectional attention for image tokens; molmoweb is trained with causal attention only
inputs = {k: v.to("cuda") for k, v in inputs.items() if k != "token_type_ids"} 

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=200)

generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.decode(generated_tokens, skip_special_tokens=True))

License and Use

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2’s Responsible Use Guidelines.

Citation

If you use this dataset, please cite:

arXiv:2604.08516

@misc{gupta2026molmowebopenvisualweb,
      title={MolmoWeb: Open Visual Web Agent and Open Data for the Open Web}, 
      author={Tanmay Gupta and Piper Wolters and Zixian Ma and Peter Sushko and Rock Yuren Pang and Diego Llanes and Yue Yang and Taira Anderson and Boyuan Zheng and Zhongzheng Ren and Harsh Trivedi and Taylor Blanton and Caleb Ouellette and Winson Han and Ali Farhadi and Ranjay Krishna},
      year={2026},
      eprint={2604.08516},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.08516}, 
}