Update README.md
README.md
CHANGED
@@ -80,7 +80,23 @@ SmolVLM2 is built upon [SigLIP](https://huggingface.co/google/siglip-base-patch1
 
 We release the SmolVLM2 checkpoints under the Apache 2.0 license.
 
-## Training
+## Training Data
 
-
+SmolVLM2 was trained on 3.3M samples drawn from ten datasets: LlaVa Onevision, M4-Instruct, Mammoth, LlaVa Video 178K, FineVideo, VideoStar, VRipt, Vista-400K, MovieChat and ShareGPT4Video.
+
+### General split
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="800" height="auto" alt="Image description">
+
+### Text mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_text.png" width="800" height="auto" alt="Image description">
+
+### Image mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_image.png" width="800" height="auto" alt="Image description">
+
+### Multi-image mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_multiimage.png" width="800" height="auto" alt="Image description">
+
+### Video mixture
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_video.png" width="800" height="auto" alt="Image description">
 