zixianma02 committed · Commit b868e1c · verified · 1 Parent(s): 45bf545

Update README.md

Files changed (1)
  1. README.md +21 -27
README.md CHANGED
@@ -2,6 +2,8 @@
2
  license: apache-2.0
3
  datasets:
4
  - allenai/Molmo2-VideoPoint
5
  language:
6
  - en
7
  base_model:
@@ -29,7 +31,7 @@ You can find all models in the Molmo2 family [here](https://huggingface.co/colle
29
  **Learn more** about the Molmo2 family [in our announcement blog post](https://allenai.org/blog/molmo2).
30
 
31
  Molmo2-VideoPoint-4B is based on [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and uses [SigLIP 2](https://huggingface.co/google/siglip-so400m-patch14-384) as its vision backbone.
32
- It is mostly trained on the Molmo2-VideoPoint data only and meant to be used for video pointing and counting only.
33
 
34
  Ai2 is committed to open science. The Molmo2 datasets are available [here](https://huggingface.co/collections/allenai/molmo2-data).
35
  All other artifacts used in creating Molmo2 (training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
@@ -148,32 +150,24 @@ print(points)
148
 
149
  ## Evaluations
150
 
151
- We report the Average Score on 15 Academic Benchmarks here.
152
- For details on the evals, refer to the main video results table in our [technical report](https://allenai.org/papers/molmo2).
153
-
154
- | Model | Average Score on 15 Academic Benchmarks |
155
- |-----------------------------|-----------------------------------------|
156
- | GPT-5 | 70.6 |
157
- | GPT-5 mini | 65.0 |
158
- | Gemini 3 Pro | 70.0 |
159
- | Gemini 2.5 Pro | 71.2 |
160
- | Gemini 2.5 Flash | 66.7 |
161
- | Claude Sonnet 4.5 | 59.6 |
162
- | InternVL3.5-4B | 53.4 |
163
- | InternVL3.5-8B | 54.1 |
164
- | Qwen3-VL-4B | 58.1 |
165
- | Qwen3-VL-8B | 59.5 |
166
- | Keye-VL-1.5-8B | 55.7 |
167
- | GLM-4.1V-9B | 56.9 |
168
- | MiniCPM-V-4.5-8B | 56.6 |
169
- | Eagle2.5-8B | 60.7 |
170
- | PLM-3B | 53.9 |
171
- | PLM-8B | 56.2 |
172
- | LLaVA-Video-7B | 52.7 |
173
- | VideoChat-Flash-7B | 56.1 |
174
- | **Molmo2-4B (this model)** | 62.8 |
175
- | Molmo2-8B | 63.1 |
176
- | Molmo2-7B | 59.7 |
177
 
178
  ## License and Use
179
 
 
2
  license: apache-2.0
3
  datasets:
4
  - allenai/Molmo2-VideoPoint
5
+ - allenai/pixmo-points
6
+ - allenai/pixmo-cap
7
  language:
8
  - en
9
  base_model:
 
31
  **Learn more** about the Molmo2 family [in our announcement blog post](https://allenai.org/blog/molmo2).
32
 
33
  Molmo2-VideoPoint-4B is based on [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and uses [SigLIP 2](https://huggingface.co/google/siglip-so400m-patch14-384) as its vision backbone.
34
+ **Unlike the general checkpoints, Molmo2-VideoPoint-4B is fine-tuned only on the Molmo2-VideoPoint data, after pre-training on pixmo-cap, pixmo-points, and Tulu data. It is intended for video pointing and counting only.**
35
 
36
  Ai2 is committed to open science. The Molmo2 datasets are available [here](https://huggingface.co/collections/allenai/molmo2-data).
37
  All other artifacts used in creating Molmo2 (training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
 
150
 
151
  ## Evaluations
152
 
153
+ We report the accuracy and close accuracy on Molmo2-VideoCountEval here.
154
+ For details on the evals, refer to our [technical report](https://allenai.org/papers/molmo2).
155
+
156
+ | Model | Accuracy | Close Acc. |
157
+ |-----------------------------|-----------------------------------------|-----------------------------------------|
158
+ | GPT-5 | 35.8 | 50.3 |
159
+ | GPT-5 mini | 29.8 | 49.3 |
160
+ | Gemini 3 Pro | **37.1** | 53.1 |
161
+ | Gemini 2.5 Pro | 35.8 | **56.5** |
162
+ | Gemini 2.5 Flash | 31.9 | 48.2 |
163
+ | Claude Sonnet 4.5 | 27.2 | 45.1 |
164
+ | Qwen3-VL-4B | 25.3 | 44.3 |
165
+ | Qwen3-VL-8B | 29.6 | 47.7 |
166
+ | Molmo2-4B | 34.3 | <u>56.1</u> |
167
+ | Molmo2-8B | 35.5 | 53.3 |
168
+ | Molmo2-7B | 33.2 | 50.5 |
169
+ | **Molmo2-VideoPoint-4B (this model)** | <u>36.8</u> | **56.5** |
170
+
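The two columns in the table above can be read as an exact-match rate and a tolerance-based rate over predicted counts. The following is a minimal sketch of how such metrics might be computed, not the evaluation code used for Molmo2-VideoCountEval. The function name `count_metrics` and the `tolerance` parameter are hypothetical, and the tolerance of 1 is an assumption for illustration; the actual definition of close accuracy is given in the technical report.

```python
from typing import Sequence, Tuple


def count_metrics(
    predictions: Sequence[int],
    targets: Sequence[int],
    tolerance: int = 1,  # ASSUMPTION: illustrative threshold, not the report's definition
) -> Tuple[float, float]:
    """Return (accuracy, close accuracy) as percentages.

    accuracy: share of examples where the predicted count equals the target.
    close accuracy: share where the prediction is within `tolerance` of the target.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(predictions, targets))
    exact = sum(1 for p, t in pairs if p == t)
    close = sum(1 for p, t in pairs if abs(p - t) <= tolerance)
    n = len(pairs)
    return 100.0 * exact / n, 100.0 * close / n


if __name__ == "__main__":
    # Toy data, not taken from any benchmark.
    acc, close_acc = count_metrics([3, 5, 7, 10], [3, 4, 9, 10])
    print(f"accuracy={acc:.1f} close_accuracy={close_acc:.1f}")
```

Under this sketch, close accuracy is always at least as high as accuracy, which matches the pattern in every row of the table.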
 
171
 
172
  ## License and Use
173