# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description
A vision-language model fine-tuned for Indic languages, based on Qwen/Qwen3-VL-4B-Instruct. This repository contains only the LoRA adapter; it must be loaded on top of (or merged into) the base model before use.
## Training Details
- Base Model: Qwen/Qwen3-VL-4B-Instruct
- Training Method: LoRA (Rank 128, Alpha 256)
- Training Steps: 2,000
- Training Time: ~8.9 hours
- Final Loss: 6.25
- Effective Batch Size: 16
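The rank and alpha values above imply a LoRA scaling factor of alpha / rank = 2. A minimal sketch in plain Python (the dictionary name is hypothetical, not part of the actual training config):

```python
# Hypothetical summary of the LoRA hyperparameters reported above.
lora_config = {
    "r": 128,           # LoRA rank
    "lora_alpha": 256,  # LoRA alpha
}

# The update added to each adapted weight matrix W is scaled by
# lora_alpha / r, so W_new = W + (lora_alpha / r) * B @ A.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```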
## Datasets
Trained on 4 datasets covering:
- Translation (40%): BPCC - 22 Indic languages ↔ English
- Instruction Following (20%): Pralekha - 11 language pairs
- Document Layout (30%): IndicDLP - Document understanding
- Visual QA (10%): DocVQA - Question answering
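The sampling ratios above can be sketched as a weighted dataset mixture. This is a minimal illustration using only the stdlib `random` module (dataset names stand in for the real loaders; the function name is hypothetical):

```python
import random

# Mixture weights from the dataset list above (they sum to 1.0).
mixture = {
    "BPCC (translation)": 0.40,
    "IndicDLP (document layout)": 0.30,
    "Pralekha (instruction following)": 0.20,
    "DocVQA (visual QA)": 0.10,
}

def sample_dataset(rng: random.Random) -> str:
    """Pick the source dataset for the next training example."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Over many draws the empirical proportions approach the configured weights.
rng = random.Random(0)
counts = {name: 0 for name in mixture}
for _ in range(10_000):
    counts[sample_dataset(rng)] += 1
```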
## Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English
## Usage

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Load the LoRA adapter on top of the base model
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# The model is now ready for inference
```
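For standalone deployment, the adapter can instead be folded into the base weights. A sketch using `transformers` + `peft` directly (an assumption: the adapter was trained with Unsloth, but it is saved as a standard PEFT adapter; whether `AutoModelForImageTextToText` covers Qwen3-VL depends on your `transformers` version):

```python
from transformers import AutoModelForImageTextToText
from peft import PeftModel

# Load the base model, attach the adapter, then merge the LoRA deltas
# into the base weights so no PEFT dependency is needed at serving time.
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
model = PeftModel.from_pretrained(base, "mashriram/Sarvam-1-VL-4B-Instruct")
merged = model.merge_and_unload()
merged.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")
```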
## License
Apache 2.0
## Citation
If you use this model, please cite the original Qwen3-VL paper and the datasets used.