Commit f4d0547 (verified) by mfarre · Parent: 7b54b5e

Update README.md

Files changed (1):
  1. README.md (+56, -2)

README.md CHANGED
@@ -208,9 +208,63 @@ You can cite us in the following way:
 ## Training Data
 SmolVLM2 was trained on 3.3M samples drawn from ten different datasets: [LLaVA-OneVision](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [M4-Instruct](https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data), [MAmmoTH](https://huggingface.co/datasets/MAmmoTH-VL/MAmmoTH-VL-Instruct-12M), [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K), [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo), [Video-STaR](https://huggingface.co/datasets/orrzohar/Video-STaR), [Vript](https://huggingface.co/datasets/Mutonix/Vript), [VISTA-400K](https://huggingface.co/datasets/TIGER-Lab/VISTA-400K), [MovieChat](https://huggingface.co/datasets/Enxin/MovieChat-1K_train) and [ShareGPT4Video](https://huggingface.co/datasets/ShareGPT4Video/ShareGPT4Video).
 In the following tables we give a general overview of the samples across modalities and the sources of those samples.
-
+<!--
 <center><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="auto" height="auto" alt="Image description">
 </center>
 
 ### Details
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_datadetails.png" width="auto" height="auto" alt="Image description">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_datadetails.png" width="auto" height="auto" alt="Image description"> -->
+
+## Data Split per Modality
+
+| Data Type   | Percentage |
+|-------------|------------|
+| Image       | 34.4%      |
+| Text        | 20.2%      |
+| Video       | 33.0%      |
+| Multi-image | 12.3%      |
+
+
+## Granular Dataset Slices per Modality
+
+### Text Datasets
+| Dataset                                 | Percentage |
+|-----------------------------------------|------------|
+| llava-onevision/magpie_pro_ft3_80b_mt   | 6.8%       |
+| llava-onevision/magpie_pro_ft3_80b_tt   | 6.8%       |
+| llava-onevision/magpie_pro_qwen2_72b_tt | 5.8%       |
+| llava-onevision/mathqa                  | 0.9%       |
+
+### Multi-image Datasets
+| Dataset                                 | Percentage |
+|-----------------------------------------|------------|
+| m4-instruct-data/m4_instruct_multiimage | 10.4%      |
+| mammoth/multiimage-cap6                 | 1.9%       |
+
+### Image Datasets
+| Dataset                                 | Percentage |
+|-----------------------------------------|------------|
+| llava-onevision/other                   | 17.4%      |
+| llava-onevision/vision_flan             | 3.9%       |
+| llava-onevision/mavis_math_metagen      | 2.6%       |
+| llava-onevision/mavis_math_rule_geo     | 2.5%       |
+| llava-onevision/sharegpt4o              | 1.7%       |
+| llava-onevision/sharegpt4v_coco         | 1.5%       |
+| llava-onevision/image_textualization    | 1.3%       |
+| llava-onevision/sharegpt4v_llava        | 0.9%       |
+| llava-onevision/mapqa                   | 0.9%       |
+| llava-onevision/qa                      | 0.8%       |
+| llava-onevision/textocr                 | 0.8%       |
+
+### Video Datasets
+| Dataset                                 | Percentage |
+|-----------------------------------------|------------|
+| llava-video-178k/1-2m                   | 7.3%       |
+| llava-video-178k/2-3m                   | 7.0%       |
+| other-video/combined                    | 5.7%       |
+| llava-video-178k/hound                  | 4.4%       |
+| llava-video-178k/0-30s                  | 2.4%       |
+| video-star/starb                        | 2.2%       |
+| vista-400k/combined                     | 2.2%       |
+| vript/long                              | 1.0%       |
+| ShareGPT4Video/all                      | 0.8%       |
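As a quick sanity check on the new tables, the modality shares and the granular slices can be tallied in a few lines of Python. The figures below are copied verbatim from the diff; the small shortfall from 100% in the modality table is rounding:

```python
# Modality shares from the "Data Split per Modality" table above.
modality_split = {
    "Image": 34.4,
    "Text": 20.2,
    "Video": 33.0,
    "Multi-image": 12.3,
}

total = sum(modality_split.values())
print(f"Total across modalities: {total:.1f}%")  # 99.9% (0.1% lost to rounding)

# The granular video slices should add up to the overall video share.
video_slices = [7.3, 7.0, 5.7, 4.4, 2.4, 2.2, 2.2, 1.0, 0.8]
print(f"Video slices sum: {sum(video_slices):.1f}%")  # 33.0%
```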