Logo for the MolmoWeb Project

MolmoWeb-8B

MolmoWeb is a family of fully open multimodal web agents. MolmoWeb agents achieve state-of-the-art results outperforming similar scale open-weight-only models such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks (SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7% and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1)on WebVoyager and Online-Mind2Web respectively.

Learn more about the MolmoWeb family in our announcement blog post and tech report.

MolmoWeb-8B is based on Molmo2 architecture, which uses Qwen3-8B and SigLIP 2 as vision backbone.

Ai2 is committed to open science. The MolmoWeb datasets are available here. All other artifacts used in creating MolmoWeb (training code, evaluations, intermediate checkpoints) will be made available, furthering our commitment to open-source AI development and reproducibility.

Quick links:

Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch
from jinja2 import Template

checkpoint_dir = "allenai/MolmoWeb-8B"

model = AutoModelForImageTextToText.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    dtype="auto",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    padding_side="left",
)


MOLMOWEB_THINK_TEMPLATE = Template(
"""
# GOAL
{{ task_description }}

# PREVIOUS STEPS
{% for action in past_actions: -%}
## Step {{ action['index'] }}
THOUGHT: {{ action['thought'] }}
ACTION: {{ action['action'] }}
{% endfor %}
# CURRENTLY ACTIVE PAGE
Page {{ page_index }}: {{ page_title }} | {{ page_url }}

# NEXT STEP

"""
)

task_description = "Tell me about the Ai2 PIROR team's recent projects"
past_actions = []
user_message = MOLMOWEB_THINK_TEMPLATE.render(
    page_title=None,
    page_url="about:blank",
    page_index=0,
    task_description=task_description,
    past_actions=[]
)
system_message = "molmo_web_think"
prompt = f"{system_message}: {user_message}"

blank_image = Image.new("RGB", (1280, 720), color="white")

image_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image", "image": blank_image},
        ]
    }
]

inputs = processor.apply_chat_template(
    image_messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    padding=True,
)

inputs = {k: v.to("cuda") for k, v in inputs.items()}

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=200)

generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.decode(generated_tokens, skip_special_tokens=True))

License and Use

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2’s Responsible Use Guidelines.

Downloads last month
36
Safetensors
Model size
9B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 2 Ask for provider support

Model tree for allenai/MolmoWeb-8B

Finetuned
Qwen/Qwen3-8B
Finetuned
(1225)
this model
Finetunes
1 model

Datasets used to train allenai/MolmoWeb-8B

Collection including allenai/MolmoWeb-8B

Paper for allenai/MolmoWeb-8B