- molmo
- molmo2
---
<img src="molmo_2_logo_RGB.png" alt="Logo for the Molmo2 Project" style="width: auto; height: 50px;">

# Molmo2 8B

Molmo2 is a family of open vision-language models developed by the Allen Institute for AI (Ai2). Molmo2 models are trained on Molmo2 data, a dataset of highly curated video-text pairs, and achieve state-of-the-art performance among multimodal models of similar size while being fully open-source. You can find all models in the Molmo2 family [here](https://huggingface.co/collections/allenai/molmo2).

**Learn more** about the Molmo2 family [in our announcement blog post](http://allenai.org/news/molmo2).

Molmo2 8B is based on Qwen3-8B and uses SigLIP 2 as its vision backbone. It outperforms other open-weight, open-data models of its class on short videos, counting, and captioning, and is competitive on long videos. On video grounding, Molmo2 outperforms larger proprietary models, including 32.9% (Molmo2) vs. 17% (Gemini 2.5 Pro) on video pointing.
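As a quick-start sketch, loading the model with Hugging Face `transformers` might look like the following. This assumes Molmo2 follows the same `AutoProcessor`/`AutoModelForCausalLM` pattern with `trust_remote_code` that the first Molmo release used; the repo id `allenai/Molmo2-8B` is a placeholder guess, so check this model card for the published id and exact inference API.

```python
# Minimal loading sketch for Molmo2 8B (assumptions noted below).
# trust_remote_code=True is assumed to be required because Molmo models
# ship custom modeling code; the repo id here is hypothetical.
from transformers import AutoModelForCausalLM, AutoProcessor

REPO_ID = "allenai/Molmo2-8B"  # hypothetical repo id


def load_molmo2(repo_id: str = REPO_ID):
    """Load the processor and model for Molmo2 inference."""
    processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        device_map="auto",  # place weights across available devices
    )
    return processor, model


if __name__ == "__main__":
    # Downloads ~8B parameters of weights; run only with GPU/disk to spare.
    processor, model = load_molmo2()
```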

[Try it here!](https://playground.allenai.org/?model=molmo2-8b)

Ai2 is committed to open science. All artifacts used in creating Molmo2 (the Molmo2 dataset, training code, evaluations, and intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.

Quick links:
- 💬 [Demo](https://playground.allenai.org/?model=molmo2-8b)
- 📂 [All Models](https://huggingface.co/collections/allenai/molmo2)
- 📃 [Paper](UPDATE THIS LINK)
- 🎥 [Blog with Videos](http://allenai.org/news/molmo2)