HuggingFaceTB/SmolVLM2-500M-Video-Instruct

Image-Text-to-Text

mfarre committed on Feb 12, 2025

Commit c0282b1 · verified · 1 Parent(s): d0232c9

Update README.md

Files changed (1)

README.md +18 -2

@@ -80,7 +80,23 @@ SmolVLM2 is built upon [SigLIP](https://huggingface.co/google/siglip-base-patch1
 We release the SmolVLM 2checkpoints under the Apache 2.0 license.
-## Training Details
-### Training Data
+## Training Data
+SmolVLM2 used 3.3M samples for training coming from ten datasets: LlaVa Onevision, M4-Instruct, Mammoth, LlaVa Video 178K, FineVideo, VideoStar, VRipt, Vista-400K, MovieChat and ShareGPT4Video.
+### General split
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="800" height="auto" alt="Image description">
+### Text mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_text.png" width="800" height="auto" alt="Image description">
+### Image mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_image.png" width="800" height="auto" alt="Image description">
+### Multi-image mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_multiimage.png" width="800" height="auto" alt="Image description">
+### Video mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_video.png" width="800" height="auto" alt="Image description">