---
license: apache-2.0
datasets:
- allenai/MolmoWeb-SyntheticTraj
- allenai/MolmoWeb-HumanTrajs
- allenai/MolmoWeb-HumanSkills
- allenai/MolmoWeb-SyntheticSkills
- allenai/MolmoWeb-SyntheticQA
- allenai/MolmoWeb-SyntheticGround
language:
- en
base_model:
- Qwen/Qwen3-8B
- google/siglip-so400m-patch14-384
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- olmo
- molmo
- molmo2
---
<img src="molmoweb_logo.png" alt="Logo for the MolmoWeb Project" style="width: auto; height: 50px;">
# MolmoWeb-8B
<span style="color:red; font-weight: bold;">Important Update!</span>
On **March 29, 2026 at ~6 PM PST**, we made a few small but important updates to this HF/transformers-compatible checkpoint so that its outputs exactly match those of our native model checkpoint.
If you downloaded this model checkpoint before then, we recommend re-downloading it. See PRs [2](https://huggingface.co/allenai/MolmoWeb-8B/discussions/2) and [3](https://huggingface.co/allenai/MolmoWeb-8B/discussions/3) for more details. Thanks for your understanding!
MolmoWeb is a family of fully open multimodal web agents. MolmoWeb agents achieve state-of-the-art results, outperforming similar-scale open-weight-only
models such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks
(SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate
consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7%
and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1) on WebVoyager and Online-Mind2Web,
respectively.
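To make the pass@1 versus pass@4 comparison concrete, here is a minimal sketch of how such a metric can be computed. It assumes pass@k counts a task as solved when at least one of the first k independent rollouts succeeds; the tech report's exact protocol, including how best-of-N picks among rollouts, may differ. The function name and the toy outcomes are illustrative only and are not real results.

```python
from typing import Sequence


def pass_at_k(rollout_success: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of the first k rollouts.

    rollout_success[i][j] is True if rollout j of task i succeeded.
    This is an assumed definition, not necessarily the one used in the report.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rollout_success:
        raise ValueError("need at least one task")
    solved = sum(1 for outcomes in rollout_success if any(outcomes[:k]))
    return solved / len(rollout_success)


# Made-up outcomes for three tasks with four rollouts each.
toy_outcomes = [
    [False, True, False, False],   # solved only by the second rollout
    [True, True, False, True],     # solved by the first rollout
    [False, False, False, False],  # never solved
]

print("pass@1:", pass_at_k(toy_outcomes, k=1))
print("pass@4:", pass_at_k(toy_outcomes, k=4))
```

Under this definition pass@k can only stay the same or increase as k grows, which is why extra parallel rollouts help whenever a selector can identify a successful one.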
**Learn more** about the MolmoWeb family in our announcement [blog post](https://allenai.org/blog/molmoweb) and [tech report](https://allenai.org/papers/molmoweb).
MolmoWeb-8B is based on the [Molmo2](https://arxiv.org/abs/2601.10611) architecture, which uses [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the language model and [SigLIP 2](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision backbone.
Ai2 is committed to open science. The MolmoWeb datasets are available [here](https://huggingface.co/collections/allenai/molmoweb-data).
All other artifacts used in creating MolmoWeb (training code, [evaluations](https://github.com/allenai/molmoweb), intermediate checkpoints) will be made available, furthering our commitment to open-source AI development and reproducibility.
Quick links:
- [Demo](https://molmoweb.allen.ai/)
- [All Models](https://huggingface.co/collections/allenai/molmoweb)
- [All Data](https://huggingface.co/collections/allenai/molmoweb-data)
- [Paper](https://allenai.org/papers/molmoweb)
- [Blog with Videos](https://allenai.org/blog/molmoweb)
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch
from jinja2 import Template
checkpoint_dir = "allenai/MolmoWeb-8B"
model = AutoModelForImageTextToText.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # we recommend using the default float32 precision
    attn_implementation="sdpa",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    padding_side="left",
)
MOLMOWEB_THINK_TEMPLATE = Template(
"""
# GOAL
{{ task_description }}
# PREVIOUS STEPS
{% for action in past_actions: -%}
## Step {{ action['index'] }}
THOUGHT: {{ action['thought'] }}
ACTION: {{ action['action'] }}
{% endfor %}
# CURRENTLY ACTIVE PAGE
Page {{ page_index }}: {{ page_title }} | {{ page_url }}
# NEXT STEP
"""
)
task_description = "Tell me about the Ai2 PRIOR team's recent projects"
past_actions = []  # no previous steps on the first turn
user_message = MOLMOWEB_THINK_TEMPLATE.render(
    page_title=None,
    page_url="about:blank",
    page_index=0,
    task_description=task_description,
    past_actions=past_actions,
)
system_message = "molmo_web_think"
prompt = f"{system_message}: {user_message}"
blank_image = Image.new("RGB", (1280, 720), color="white")
image_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image", "image": blank_image},
        ]
    }
]
inputs = processor.apply_chat_template(
    image_messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    padding=True,
)
# Remove token_type_ids: HF uses it to enable bidirectional attention for image tokens; MolmoWeb is trained with causal attention only
# Move inputs to the device the model was placed on (instead of hard-coding "cuda")
inputs = {k: v.to(model.device) for k, v in inputs.items() if k != "token_type_ids"}
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=200)
generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.decode(generated_tokens, skip_special_tokens=True))
```
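For later steps, the template above reads `index`, `thought`, and `action` from each entry of `past_actions`. The following is a minimal sketch of keeping that history between calls. The helper `record_step`, the 1-based indexing, and the placeholder strings are assumptions for illustration and are not part of the released code; the real action string should be whatever the model returned for that step.

```python
def record_step(past_actions, thought, action):
    """Append one completed step in the shape the prompt template reads.

    Indices are assumed to start at 1; adjust if your setup counts from 0.
    """
    past_actions.append(
        {
            "index": len(past_actions) + 1,
            "thought": thought,
            "action": action,
        }
    )
    return past_actions


past_actions = []
record_step(
    past_actions,
    thought="I should open the Ai2 website first.",  # illustrative text
    action="<action string returned by the model>",  # placeholder, not a real action
)

# The list can then be passed as `past_actions=past_actions` when rendering
# the prompt template for the next step.
print(past_actions)
```

On each subsequent step you would presumably also pass the current screenshot in place of the blank image and set `page_title`, `page_url`, and `page_index` to the currently active page.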
## License and Use
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
If you use this model, please cite:
[arXiv:2604.08516](https://arxiv.org/abs/2604.08516)
```bibtex
@misc{gupta2026molmowebopenvisualweb,
title={MolmoWeb: Open Visual Web Agent and Open Data for the Open Web},
author={Tanmay Gupta and Piper Wolters and Zixian Ma and Peter Sushko and Rock Yuren Pang and Diego Llanes and Yue Yang and Taira Anderson and Boyuan Zheng and Zhongzheng Ren and Harsh Trivedi and Taylor Blanton and Caleb Ouellette and Winson Han and Ali Farhadi and Ranjay Krishna},
year={2026},
eprint={2604.08516},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.08516},
}
```