CPT-only (continued pre-training) adapter for Gemma-3-4B-IT, trained on the Sinhala portion of the MADLAD-400 corpus.
This adapter does not perform VQA on its own. It is the first stage of the sequential
CPT → VQA pipeline and must be loaded and combined with the VQA adapter,
Siluni/gemma3-4b-cpt-vqa-33k, before inference.
See Siluni/gemma3-4b-cpt-vqa-33k for the full loading instructions.
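As a rough illustration of what "loaded together and combined" could look like, the sketch below stacks the two adapters with the PEFT library: it merges the CPT adapter into the base weights, then attaches the VQA adapter on top. This is an assumption about the procedure, not the authors' confirmed method; the instructions on the Siluni/gemma3-4b-cpt-vqa-33k card take precedence. The function name `load_cpt_then_vqa` is hypothetical, the base checkpoint id `google/gemma-3-4b-it` is assumed from the model name, and the repository id of this CPT adapter is not stated in this card, so it is left as a required argument.

```python
def load_cpt_then_vqa(
    cpt_adapter_id,
    vqa_adapter_id="Siluni/gemma3-4b-cpt-vqa-33k",
    base_id="google/gemma-3-4b-it",  # assumed base checkpoint id
):
    """Hypothetical helper: stack the CPT adapter and the VQA adapter.

    Assumes sequential merging (CPT first, then VQA); the authors'
    actual combination procedure may differ.
    """
    # Imported lazily so the heavy dependencies are only required
    # when the function is actually called.
    from peft import PeftModel
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(base_id)
    model = AutoModelForImageTextToText.from_pretrained(base_id)

    # Stage 1: apply the CPT adapter and fold its weights into the base model.
    model = PeftModel.from_pretrained(model, cpt_adapter_id)
    model = model.merge_and_unload()

    # Stage 2: attach the VQA adapter on top of the CPT-merged weights.
    model = PeftModel.from_pretrained(model, vqa_adapter_id)
    model.eval()
    return model, processor
```

Calling `load_cpt_then_vqa("<this-adapter-repo-id>")` would download the base model and both adapters; the placeholder must be replaced with this adapter's actual repository id.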
@misc{keerthiratne2025sinhalavqa,
title = {Benchmarking and Adapting Compact Multimodal Models for Sinhala Visual Question Answering},
author = {Keerthiratne, Siluni and Weerasinghe, Ruvan and Sumanathilaka, Deshan},
year = {2025},
institution = {Informatics Institute of Technology / Robert Gordon University},
}