Update README.md
README.md
CHANGED
@@ -78,16 +78,15 @@ SmolVLM is not intended for high-stakes scenarios or critical decision-making pr
 
 SmolVLM2 is built upon [SigLIP](https://huggingface.co/google/siglip-base-patch16-512) as image encoder and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) for text decoder part.
 
-We release the
+We release the SmolVLM2 checkpoints under the Apache 2.0 license.
 
 ## Training Data
 
-SmolVLM2 used 3.3M samples for training
-
-### General split
-
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="auto" height="auto" alt="Image description">
+SmolVLM2 used 3.3M samples for training originally from ten different datasets: : LlaVa Onevision, M4-Instruct, Mammoth, LlaVa Video 178K, FineVideo, VideoStar, VRipt, Vista-400K, MovieChat and ShareGPT4Video.
+In the following plots we give a general overview of the samples across modalities and the source of those samples.
 
+<center><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="auto" height="auto" alt="Image description">
+</center>
 ### Text mixture
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_text.png" width="auto" height="auto" alt="Image description">
 
@@ -98,5 +97,4 @@ SmolVLM2 used 3.3M samples for training coming from ten datasets: LlaVa Onevisio
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_multiimage.png" width="auto" height="auto" alt="Image description">
 
 ### Video mixture
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_video.png" width="auto" height="auto" alt="Image description">
-
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_video.png" width="auto" height="auto" alt="Image description">