roTripathi committed on
Commit
c0c2b8f
·
verified ·
1 Parent(s): db8d279

Create README.md

<img src="molmo_2_logo_RGB.png" alt="Logo for the Molmo2 Project" style="width: auto; height: 50px;">

# Molmo2 8B

Molmo2 is a family of open vision-language models developed by the Allen Institute for AI (Ai2). Molmo2 models are trained on Molmo2 data, a dataset of highly curated video-text pairs. They achieve state-of-the-art performance among multimodal models of similar size while being fully open-source. You can find all models in the Molmo2 family [here](https://huggingface.co/collections/allenai/molmo2).

**Learn more** about the Molmo2 family [in our announcement blog post](http://allenai.org/news/molmo2).

Molmo2 8B is based on Qwen3-8B and uses SigLIP2 as its vision backbone. It outperforms other open-weight, open-data models on short videos, counting, and captioning, and is competitive on long videos. On video grounding, Molmo2 outperforms larger proprietary models; on video pointing, for example, it scores 32.9% versus 17% for Gemini 2.5 Pro.

Try it here! -

Ai2 is committed to open science. All artifacts used in creating Molmo2 (Molmo2 dataset, training code, evaluations, intermediate checkpoints) will be made available at a later date, in support of open-source AI development and reproducibility.

Quick links:
- 💬 [Demo](https://playground.allenai.org/?model=molmo2-8b)
- 📂 [All Models](https://huggingface.co/collections/allenai/molmo2)
- 📃 [Paper](UPDATE THIS LINK)
- 🎥 [Blog with Videos](http://allenai.org/news/molmo2)

Files changed (1)
  1. README.md +24 -0
README.md ADDED
@@ -0,0 +1,24 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - allenai/Molmo2-SynMultiImageQA
+ - allenai/Molmo2-VideoPoint
+ - allenai/Molmo2-MultiImageQA
+ - allenai/Molmo2-AskModelAnything
+ - allenai/Molmo2-Cap
+ - allenai/Molmo2-VideoCapQA
+ - allenai/Molmo2-VideoSubtitleQA
+ - allenai/Molmo2-MultiImagePoint
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen3-8B
+ - google/siglip-so400m-patch14-384
+ pipeline_tag: video-text-to-text
+ library_name: transformers
+ tags:
+ - multimodal
+ - olmo
+ - molmo
+ - molmo2
+ ---
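
The front matter added in this commit is what tooling reads to list the model's datasets, base models, and pipeline tag. As a rough illustration only, the sketch below parses that block using nothing but the Python standard library. The `parse_front_matter` helper is hypothetical (it is not part of any Hugging Face library) and assumes the flat `key: value` and `key:` plus `- item` shapes seen in this diff; a real YAML parser would be the safer choice for anything more complex.

```python
# Front matter copied from the diff above, with the leading "+ " markers removed.
FRONT_MATTER = """\
---
license: apache-2.0
datasets:
- allenai/Molmo2-SynMultiImageQA
- allenai/Molmo2-VideoPoint
- allenai/Molmo2-MultiImageQA
- allenai/Molmo2-AskModelAnything
- allenai/Molmo2-Cap
- allenai/Molmo2-VideoCapQA
- allenai/Molmo2-VideoSubtitleQA
- allenai/Molmo2-MultiImagePoint
language:
- en
base_model:
- Qwen/Qwen3-8B
- google/siglip-so400m-patch14-384
pipeline_tag: video-text-to-text
library_name: transformers
tags:
- multimodal
- olmo
- molmo
- molmo2
---
"""


def parse_front_matter(text):
    """Hypothetical helper: parse the flat YAML subset used in this model card.

    Scalars become strings; a bare 'key:' followed by '- item' lines becomes a list.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0] != "---" or lines[-1] != "---":
        raise ValueError("front matter must be delimited by '---' lines")

    meta = {}
    current_key = None  # key whose list is currently being filled, if any
    for line in lines[1:-1]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_key is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            meta[current_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []
                current_key = key
    return meta


if __name__ == "__main__":
    meta = parse_front_matter(FRONT_MATTER)
    print(meta["pipeline_tag"])
    print(meta["base_model"])
```

A check like this can be handy before pushing a model card, for instance to confirm that every entry under `datasets` and `base_model` parsed as a list item rather than being swallowed by a formatting slip.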