IndoColSmol collection: Lightweight Vision-Language Models for Multimodal Indonesian Document Search
This model is a fine-tuned version of vidore/ColSmolVLM-Instruct-500M-base on the ingenio/indodvqa_dataset dataset. On the evaluation set, the validation loss reached 0.3630 at the last logged step (step 200, epoch 1.98).
Training and validation loss were logged at the following steps:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.0099 | 1 | 0.4474 |
| 0.4523 | 0.3960 | 40 | 0.4055 |
| 0.3996 | 0.7921 | 80 | 0.3804 |
| 0.3637 | 1.1881 | 120 | 0.3687 |
| 0.3450 | 1.5842 | 160 | 0.3627 |
| 0.3466 | 1.9802 | 200 | 0.3630 |
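
The log above can also be read programmatically. The sketch below is a minimal illustration rather than part of the original training code: the rows are copied by hand from the table, and the helper names (`steps_per_epoch`, `best_checkpoint`, `relative_improvement`) are hypothetical. It estimates how many optimizer steps make up one epoch and finds the logged step with the lowest validation loss.

```python
# Rows copied from the training log above: (epoch, step, validation_loss).
LOG = [
    (0.0099, 1, 0.4474),
    (0.3960, 40, 0.4055),
    (0.7921, 80, 0.3804),
    (1.1881, 120, 0.3687),
    (1.5842, 160, 0.3627),
    (1.9802, 200, 0.3630),
]


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the last logged row."""
    epoch, step, _ = log[-1]
    return round(step / epoch)


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def relative_improvement(log):
    """Fractional drop in validation loss from the first row to the best row."""
    first_loss = log[0][2]
    best_loss = best_checkpoint(log)[2]
    return (first_loss - best_loss) / first_loss


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch(LOG))      # 101
    print("best logged step:", best_checkpoint(LOG)[1])  # 160
    print(f"relative improvement: {relative_improvement(LOG):.1%}")
```

Read this way, the log suggests roughly 101 steps per epoch. Validation loss is lowest at step 160 (0.3627) and essentially unchanged at step 200 (0.3630), so most of the improvement appears to come within the first epoch and a half.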
Base model: HuggingFaceTB/SmolLM2-360M