IndoColSmol collection: Lightweight Vision-Language Models for Multimodal Indonesian Document Search (2 items)
This model is a fine-tuned version of vidore/ColSmolVLM-Instruct-256M-base on the ingenio/indodvqa_dataset dataset. No separate evaluation summary is given; the last validation loss logged during training is 0.3764 (step 200, epoch 1.9802).
The hyperparameters used during training are not listed. Training and validation loss were recorded as follows:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.0099 | 1 | 0.4959 |
| 0.4991 | 0.3960 | 40 | 0.4319 |
| 0.4293 | 0.7921 | 80 | 0.3986 |
| 0.4000 | 1.1881 | 120 | 0.3829 |
| 0.3653 | 1.5842 | 160 | 0.3788 |
| 0.3846 | 1.9802 | 200 | 0.3764 |
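
The log above can be summarised with a little arithmetic. The following is a minimal sketch, not part of the original training code: the `LOG` list is copied by hand from the table, and the helper names (`steps_per_epoch`, `relative_drop`, `interval_drops`) are hypothetical, chosen only for this illustration.

```python
# Rows copied from the training log above: (epoch, step, validation_loss).
LOG = [
    (0.0099, 1, 0.4959),
    (0.3960, 40, 0.4319),
    (0.7921, 80, 0.3986),
    (1.1881, 120, 0.3829),
    (1.5842, 160, 0.3788),
    (1.9802, 200, 0.3764),
]


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the last logged row."""
    epoch, step, _ = log[-1]
    return round(step / epoch)


def relative_drop(log):
    """Fractional reduction in validation loss from the first to the last row."""
    first, last = log[0][2], log[-1][2]
    return (first - last) / first


def interval_drops(log):
    """Decrease in validation loss between consecutive evaluations."""
    return [round(a[2] - b[2], 4) for a, b in zip(log, log[1:])]


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch(LOG))
    print("relative drop: {:.1%}".format(relative_drop(LOG)))
    print("per-interval drops:", interval_drops(LOG))
```

Under these assumptions the run works out to about 101 steps per epoch (so 200 steps is just under two epochs), and validation loss falls by roughly 24% overall. The per-interval decreases shrink at every evaluation, which suggests most of the improvement happened in the first epoch.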
Base model: HuggingFaceTB/SmolLM2-135M