Sarvam-1-VL-4B-Instruct - LoRA Adapter

Model Description

A vision-language model fine-tuned for Indic languages, based on Qwen3-VL-4B-Instruct. This repository contains only the LoRA adapter; it must be loaded on top of, or merged into, the base model before use.

Training Details

  • Base Model: Qwen/Qwen3-VL-4B-Instruct
  • Training Method: LoRA (Rank 128, Alpha 256)
  • Training Steps: 2,000
  • Training Time: ~8.9 hours
  • Final Loss: 6.25
  • Effective Batch Size: 16
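As a rough illustration of what these LoRA settings imply: the scaling factor applied to the low-rank update is alpha / rank = 256 / 128 = 2, and the per-layer trainable parameter count is far smaller than the full weight matrix. The sketch below is plain arithmetic, and the hidden size 2560 is a hypothetical example, not a confirmed dimension of Qwen3-VL-4B:

```python
def lora_stats(d_in: int, d_out: int, rank: int, alpha: int):
    """Return (scaling, LoRA params, full-matrix params) for one linear layer."""
    scaling = alpha / rank                 # update is W + scaling * (B @ A)
    lora_params = rank * (d_in + d_out)    # A: rank x d_in, B: d_out x rank
    full_params = d_in * d_out
    return scaling, lora_params, full_params

# Hypothetical 2560x2560 projection with the card's rank=128, alpha=256
scaling, lora_p, full_p = lora_stats(2560, 2560, rank=128, alpha=256)
# scaling == 2.0, and lora_p is a small fraction of full_p
```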

Datasets

Trained on four datasets:

  • Translation (40%): BPCC - 22 Indic languages ↔ English
  • Instruction Following (20%): Pralekha - 11 language pairs
  • Document Layout (30%): IndicDLP - Document understanding
  • Visual QA (10%): DocVQA - Question answering
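The 40/20/30/10 mix above can be sketched as a weighted sampler. The dataset names come from this card, but the sampler itself is illustrative, not the actual training pipeline:

```python
import random

# Target sampling proportions from the dataset mix described above
MIX = {"BPCC": 0.40, "Pralekha": 0.20, "IndicDLP": 0.30, "DocVQA": 0.10}

def sample_dataset(rng: random.Random) -> str:
    """Pick which dataset the next training example is drawn from."""
    names, weights = zip(*MIX.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in MIX}
for _ in range(10_000):
    counts[sample_dataset(rng)] += 1
# counts roughly track the 40/20/30/10 target proportions
```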

Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

Usage

from unsloth import FastVisionModel

# Load the base model in 4-bit quantization
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Load the LoRA adapter from this repository
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# Switch to inference mode before generating
FastVisionModel.for_inference(model)
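Inference then follows the Qwen-VL chat format. The helper below only builds the message structure; the field names are an assumption based on Qwen-VL-style processors, so check them against your processor's chat template:

```python
def build_messages(image_path: str, question: str) -> list[dict]:
    """Build a single-turn multimodal chat message list (Qwen-VL style)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},  # local path or URL
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_messages("invoice.png", "What is the total amount?")
# Pass `messages` through tokenizer.apply_chat_template(...) before model.generate
```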

License

Apache 2.0

Citation

If you use this model, please cite the original Qwen3-VL paper and the datasets used.
