Update README.md

README.md CHANGED

@@ -2,6 +2,8 @@
 license: apache-2.0
 datasets:
 - allenai/Molmo2-VideoPoint
+- allenai/pixmo-points
+- allenai/pixmo-cap
 language:
 - en
 base_model:
@@ -29,7 +31,7 @@ You can find all models in the Molmo2 family [here](https://huggingface.co/colle
 **Learn more** about the Molmo2 family [in our announcement blog post](https://allenai.org/blog/molmo2).
 
 Molmo2-VideoPoint-4B is based on [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and uses [SigLIP 2](https://huggingface.co/google/siglip-so400m-patch14-384) as vision backbone.
-
+**Different from the general checkpoints, Molmo2-VideoPoint-4B is finetuned on the Molmo2-VideoPoint data only, after pre-training on pixmo-cap, pixmo-points, and Tulu data. It is meant to be used for video pointing and counting only.**
 
 Ai2 is committed to open science. The Molmo2 datasets are available [here](https://huggingface.co/collections/allenai/molmo2-data).
 All other artifacts used in creating Molmo2 (training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
@@ -148,32 +150,24 @@ print(points)
 
 ## Evaluations
 
-We report the
-For details on the evals, refer to
-
-| Model |
-
-| GPT-5 |
-| GPT-5 mini |
-| Gemini 3 Pro |
-| Gemini 2.5 Pro |
-| Gemini 2.5 Flash |
-| Claude Sonnet 4.5 |
-
-
-
-
-
-
-| Eagle2.5-8B | 60.7 |
-| PLM-3B | 53.9 |
-| PLM-8B | 56.2 |
-| LLaVA-Video-7B | 52.7 |
-| VideoChat-Flash-7B | 56.1 |
-| **Molmo2-4B (this model)** | 62.8 |
-| Molmo2-8B | 63.1 |
-| Molmo2-7B | 59.7 |
+We report accuracy and close accuracy on Molmo2-VideoCountEval here.
+For details on the evals, refer to our [technical report](https://allenai.org/papers/molmo2).
+
+| Model                                 | Accuracy    | Close Acc.  |
+|---------------------------------------|-------------|-------------|
+| GPT-5                                 | 35.8        | 50.3        |
+| GPT-5 mini                            | 29.8        | 49.3        |
+| Gemini 3 Pro                          | **37.1**    | 53.1        |
+| Gemini 2.5 Pro                        | 35.8        | **56.5**    |
+| Gemini 2.5 Flash                      | 31.9        | 48.2        |
+| Claude Sonnet 4.5                     | 27.2        | 45.1        |
+| Qwen3-VL-4B                           | 25.3        | 44.3        |
+| Qwen3-VL-8B                           | 29.6        | 47.7        |
+| Molmo2-4B                             | 34.3        | <u>56.1</u> |
+| Molmo2-8B                             | 35.5        | 53.3        |
+| Molmo2-7B                             | 33.2        | 50.5        |
+| **Molmo2-VideoPoint-4B (this model)** | <u>36.8</u> | **56.5**    |
+
 
 ## License and Use
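The updated table reports two counting metrics, and the card itself never defines "close accuracy". As a rough illustration only: the sketch below assumes exact-match accuracy plus a close accuracy that accepts a predicted count within ±1 of the ground truth. The function name and the tolerance are hypothetical, not Ai2's definition — see the technical report for how the metric is actually computed.

```python
def counting_metrics(preds, targets, tolerance=1):
    """Return (exact accuracy, close accuracy) over paired count lists.

    NOTE: "close accuracy" here is an assumed within-tolerance match;
    the real Molmo2-VideoCountEval definition may differ.
    """
    assert len(preds) == len(targets) and targets, "need paired, non-empty lists"
    n = len(targets)
    # Exact accuracy: predicted count equals the ground-truth count.
    exact = sum(p == t for p, t in zip(preds, targets)) / n
    # Assumed close accuracy: prediction within +/- tolerance of the truth.
    close = sum(abs(p - t) <= tolerance for p, t in zip(preds, targets)) / n
    return exact, close

print(counting_metrics([3, 7, 12, 5], [3, 8, 10, 5]))  # → (0.5, 0.75)
```

Under this reading, close accuracy is always at least the exact accuracy, which matches the pattern in the table (every model's Close Acc. exceeds its Accuracy).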