<img src="molmo_2_logo_RGB.png" alt="Logo for the Molmo2 Project" style="width: auto; height: 50px;">
# Molmo2 8B
Molmo2 is a family of open vision-language models developed by the Allen Institute for AI (Ai2). Molmo2 models are trained on Molmo2 data, a dataset of highly curated video-text pairs. They achieve state-of-the-art performance among multimodal models of similar size while being fully open-source. You can find all models in the Molmo2 family [here](https://huggingface.co/collections/allenai/molmo2).
**Learn more** about the Molmo2 family [in our announcement blog post](http://allenai.org/news/molmo2).
Molmo2 8B is based on Qwen3-8B and uses SigLIP 2 as its vision backbone. It outperforms other open-weight, open-data models of its class on short videos, counting, and captioning, and is competitive on long videos. On video grounding, Molmo2 outperforms larger proprietary models, including 32.9% (Molmo2) vs. 17% (Gemini 2.5 Pro) on video pointing.
Ai2 is committed to open science. All artifacts used in creating Molmo2 (the Molmo2 dataset, training code, evaluations, and intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
Quick links:
- 💬 [Demo](https://playground.allenai.org/?model=molmo2-8b)
- 📂 [All Models](https://huggingface.co/collections/allenai/molmo2)
- 📃 [Paper](UPDATE THIS LINK)
- 🎥 [Blog with Videos](http://allenai.org/news/molmo2)
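The model card lists `library_name: transformers` and `pipeline_tag: video-text-to-text`, so inference should go through Hugging Face transformers. The sketch below is a minimal, hedged example: the repo id `allenai/Molmo2-8B`, the use of `trust_remote_code=True`, and the `processor.process(...)` interface are assumptions carried over from the Molmo 1 release and are not confirmed by this card; check the files shipped with the checkpoint for the exact API. The frame-sampling helper is a generic illustration, not Molmo2's official preprocessing.

```python
# Hedged sketch of running Molmo2 8B with Hugging Face transformers.
# Assumptions (not confirmed by this card): the checkpoint lives at
# "allenai/Molmo2-8B", ships custom modeling code (trust_remote_code=True),
# and its processor accepts text plus a list of video frames, as in Molmo 1.

REPO_ID = "allenai/Molmo2-8B"  # assumed repo id


def sample_frame_indices(total: int, wanted: int) -> list[int]:
    """Generic helper: pick `wanted` evenly spaced frame indices out of `total`.

    Illustrative only -- not Molmo2's official frame-sampling policy.
    """
    if total <= wanted:
        return list(range(total))
    step = total / wanted
    return [min(int(i * step), total - 1) for i in range(wanted)]


if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    # Decode frames with your video library of choice, then e.g.:
    # idx = sample_frame_indices(len(frames), 16)
    # inputs = processor.process(text="Describe the video.",
    #                            images=[frames[i] for i in idx])
    # output = model.generate(**inputs, max_new_tokens=256)
```

The heavy model download is kept behind the `__main__` guard so the helper can be imported and reused independently of the (assumed) checkpoint layout.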
---
license: apache-2.0
datasets:
- allenai/Molmo2-SynMultiImageQA
- allenai/Molmo2-VideoPoint
- allenai/Molmo2-MultiImageQA
- allenai/Molmo2-AskModelAnything
- allenai/Molmo2-Cap
- allenai/Molmo2-VideoCapQA
- allenai/Molmo2-VideoSubtitleQA
- allenai/Molmo2-MultiImagePoint
language:
- en
base_model:
- Qwen/Qwen3-8B
- google/siglip-so400m-patch14-384
pipeline_tag: video-text-to-text
library_name: transformers
tags:
- multimodal
- olmo
- molmo
- molmo2
---