# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description
A vision-language model fine-tuned for Indic languages, based on Qwen/Qwen3-VL-4B-Instruct. This repository contains only the LoRA adapter; it must be loaded on top of (or merged into) the base model before use.
## Training Details
- Base Model: Qwen/Qwen3-VL-4B-Instruct
- Training Method: LoRA (Rank 128, Alpha 256)
- Training Steps: 2,000
- Training Time: ~8.9 hours
- Final Loss: 6.25
- Effective Batch Size: 16
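The rank and alpha values above imply a LoRA scaling factor of alpha / rank = 2. A minimal sketch in plain Python (the dictionary name is hypothetical, not part of the actual training config):

```python
# Hypothetical summary of the LoRA hyperparameters reported above.
lora_config = {
    "r": 128,           # LoRA rank
    "lora_alpha": 256,  # LoRA alpha
}

# The update added to each adapted weight matrix W is scaled by
# lora_alpha / r, so W_new = W + (lora_alpha / r) * B @ A.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```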
## Datasets
Trained on 4 datasets covering:
- Translation (40%): BPCC - 22 Indic languages ↔ English
- Instruction Following (20%): Pralekha - 11 language pairs
- Document Layout (30%): IndicDLP - Document understanding
- Visual QA (10%): DocVQA - Question answering
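The sampling ratios above can be sketched as a weighted dataset mixture. This is a minimal illustration using only the stdlib `random` module (dataset names stand in for the real loaders; the function name is hypothetical):

```python
import random

# Mixture weights from the dataset list above (they sum to 1.0).
mixture = {
    "BPCC (translation)": 0.40,
    "IndicDLP (document layout)": 0.30,
    "Pralekha (instruction following)": 0.20,
    "DocVQA (visual QA)": 0.10,
}

def sample_dataset(rng: random.Random) -> str:
    """Pick the source dataset for the next training example."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Over many draws the empirical proportions approach the configured weights.
rng = random.Random(0)
counts = {name: 0 for name in mixture}
for _ in range(10_000):
    counts[sample_dataset(rng)] += 1
```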
## Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English
## Usage

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Load the LoRA adapter on top of the base model
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# The model is now ready for inference
```
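For standalone deployment, the adapter can instead be folded into the base weights. A sketch using `transformers` + `peft` directly (an assumption: the adapter was trained with Unsloth, but it is saved as a standard PEFT adapter; whether `AutoModelForImageTextToText` covers Qwen3-VL depends on your `transformers` version):

```python
from transformers import AutoModelForImageTextToText
from peft import PeftModel

# Load the base model, attach the adapter, then merge the LoRA deltas
# into the base weights so no PEFT dependency is needed at serving time.
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
model = PeftModel.from_pretrained(base, "mashriram/Sarvam-1-VL-4B-Instruct")
merged = model.merge_and_unload()
merged.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")
```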
## License
Apache 2.0
## Citation
If you use this model, please cite the original Qwen3-VL paper and the datasets used.