---
language:
  - en
  - hi
  - bn
  - ta
  - te
  - gu
  - kn
  - ml
  - mr
  - or
  - pa
  - ur
  - as
  - brx
  - doi
  - gom
  - kas
  - mai
  - mni
  - ne
  - sa
  - sat
  - sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
  - vision
  - multilingual
  - indic-languages
  - lora
  - translation
  - document-understanding
  - fine-tuned
datasets:
  - ai4bharat/BPCC
  - ai4bharat/Pralekha
  - ai4bharat/indicdlp
  - lmms-lab/DocVQA
pipeline_tag: image-text-to-text
---

# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description

A vision-language model fine-tuned for Indic languages on top of Qwen3-VL-4B-Instruct. This repository contains only the LoRA adapter; it must be loaded onto, or merged into, the base model before use.
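
As a minimal sketch, the adapter can be merged into the base model with the Hugging Face `peft` API. This assumes a recent `transformers` release that provides `AutoModelForImageTextToText` for Qwen3-VL; the output directory name is a placeholder.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load the base model and attach this adapter (assumes a standard PEFT adapter layout)
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
model = PeftModel.from_pretrained(base, "mashriram/Sarvam-1-VL-4B-Instruct")

# Fold the LoRA weights into the base weights and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")

# Save the processor alongside the merged weights
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
processor.save_pretrained("Sarvam-1-VL-4B-Instruct-merged")
```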

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Training Method:** LoRA (rank 128, alpha 256); see the config sketch after this list
- **Training Steps:** 2,000
- **Training Time:** ~8.9 hours
- **Final Loss:** 6.25
- **Effective Batch Size:** 16
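
For reference, the LoRA settings above roughly correspond to a `peft` `LoraConfig` like the one below. The `target_modules` and dropout values are assumptions (common choices for Qwen-style attention and MLP projections), not values confirmed by this card.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter configuration described above.
# target_modules and lora_dropout are assumed, not documented in this card.
lora_config = LoraConfig(
    r=128,                 # LoRA rank
    lora_alpha=256,        # LoRA scaling factor
    lora_dropout=0.0,      # assumed
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)
```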

## Datasets

Trained on a mixture of four datasets:

- **Translation (40%):** BPCC - 22 Indic languages ↔ English
- **Instruction Following (20%):** Pralekha - 11 language pairs
- **Document Layout (30%):** IndicDLP - Document understanding
- **Visual QA (10%):** DocVQA - Question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Attach the LoRA adapter
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# Run inference (see the example below)
```
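
Continuing from the snippet above, a minimal inference sketch following the usual unsloth vision workflow. The image path and prompt are placeholders, not part of this card.

```python
from PIL import Image

FastVisionModel.for_inference(model)  # switch the model to inference mode

image = Image.open("document.png")  # placeholder: any document image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Translate the text in this document into Hindi."},
    ]},
]

# The `tokenizer` returned by FastVisionModel is the multimodal processor
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```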

## License

Apache 2.0

## Citation

If you use this adapter, please cite the original Qwen3-VL work and the datasets listed above (BPCC, Pralekha, IndicDLP, DocVQA).