---
language:
  - en
  - hi
  - bn
  - ta
  - te
  - gu
  - kn
  - ml
  - mr
  - or
  - pa
  - ur
  - as
  - brx
  - doi
  - gom
  - kas
  - mai
  - mni
  - ne
  - sa
  - sat
  - sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
  - vision
  - multilingual
  - indic-languages
  - lora
  - translation
  - document-understanding
  - fine-tuned
datasets:
  - ai4bharat/BPCC
  - ai4bharat/Pralekha
  - ai4bharat/indicdlp
  - lmms-lab/DocVQA
pipeline_tag: image-text-to-text
---

# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description

A vision-language model fine-tuned for Indic languages on top of Qwen3-VL-4B-Instruct. This repository contains only the LoRA adapter; it must be loaded onto, or merged into, the base model before use.
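
As a minimal sketch, the adapter can be merged into the base model with the Hugging Face `peft` API. This assumes a recent `transformers` release that provides `AutoModelForImageTextToText` for Qwen3-VL; the output directory name is a placeholder.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load the base model and attach this adapter (assumes a standard PEFT adapter layout)
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
model = PeftModel.from_pretrained(base, "mashriram/Sarvam-1-VL-4B-Instruct")

# Fold the LoRA weights into the base weights and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")

# Save the processor alongside the merged weights
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
processor.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")
```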

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Training Method:** LoRA (rank 128, alpha 256); see the config sketch after this list
- **Training Steps:** 2,000
- **Training Time:** ~8.9 hours
- **Final Loss:** 6.25
- **Effective Batch Size:** 16
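
For reference, the LoRA settings above roughly correspond to a `peft` `LoraConfig` like the one below. The `target_modules` and dropout values are assumptions (common choices for Qwen-style attention and MLP projections), not values confirmed by this card.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter configuration described above.
# target_modules and lora_dropout are assumed, not documented in this card.
lora_config = LoraConfig(
    r=128,                 # LoRA rank
    lora_alpha=256,        # LoRA scaling factor
    lora_dropout=0.0,      # assumed
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)
```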

## Datasets

Trained on a mixture of four datasets:

- **Translation (40%):** BPCC - 22 Indic languages ↔ English
- **Instruction Following (20%):** Pralekha - 11 language pairs
- **Document Layout (30%):** IndicDLP - Document understanding
- **Visual QA (10%):** DocVQA - Question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Attach the LoRA adapter
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# Run inference (see the example below)
```
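
Continuing from the snippet above, a minimal inference sketch following the usual unsloth vision workflow. The image path and prompt are placeholders, not part of this card.

```python
from PIL import Image

FastVisionModel.for_inference(model)  # switch the model to inference mode

image = Image.open("document.png")  # placeholder: any document image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Translate the text in this document into Hindi."},
    ]},
]

# The `tokenizer` returned by FastVisionModel is the multimodal processor
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```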

## License

Apache 2.0

## Citation

If you use this adapter, please cite the original Qwen3-VL work and the datasets listed above (BPCC, Pralekha, IndicDLP, DocVQA).