UniMVU - LoRA Adapters for LLaVA-OneVision Qwen2

This repository hosts the open-source UniMVU release checkpoints for instruction-aware multimodal video understanding. The release covers audio-video QA, 3D QA, and unified multi-task adapters built on top of lmms-lab/llava-onevision-qwen2-0.5b-ov and lmms-lab/llava-onevision-qwen2-7b-ov.

Unlike plain LoRA releases, UniMVU checkpoints also include non_lora_trainables.bin for the extra modality-gating modules. Use the UniMVU loader instead of a PEFT-only PeftModel.from_pretrained(...) workflow.
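
Because the modality-gating weights live outside the LoRA adapter, a PEFT-only load would leave them uninitialized. The UniMVU loader takes care of this; as a rough sketch of the idea, LLaVA-style loaders typically `torch.load` the `non_lora_trainables.bin` state dict, strip wrapper prefixes from its keys, and merge it with `load_state_dict(..., strict=False)`. The key prefixes below follow that common convention and are an assumption, not the verified UniMVU implementation:

```python
# Sketch: merging non-LoRA trainables into an already-loaded PEFT model.
# The wrapper prefixes below follow the common LLaVA-style convention and
# are an assumption; the real UniMVU loader may use different key names.

def strip_prefix(state_dict, prefixes=("base_model.model.", "model.")):
    """Drop wrapper prefixes so keys line up with the unwrapped model."""
    out = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        out[key] = value
    return out
```

In the full workflow this would run after `PeftModel.from_pretrained(...)`, roughly as `model.load_state_dict(strip_prefix(torch.load("non_lora_trainables.bin", map_location="cpu")), strict=False)`.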


Release Contents

| Folder | Scale | Type | Task(s) | Base model |
|--------|-------|------|---------|------------|
| unimvu_0.5B_avqa | 0.5B | Single-task | AVQA | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_0.5B_avsd | 0.5B | Single-task | AVSD | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_0.5B_music_avqa | 0.5B | Single-task | Music-AVQA | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_0.5B_scanqa | 0.5B | Single-task | ScanQA | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_0.5B_sqa3d | 0.5B | Single-task | SQA3D | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_7B_avqa | 7B | Single-task | AVQA | lmms-lab/llava-onevision-qwen2-7b-ov |
| unimvu_7B_avsd | 7B | Single-task | AVSD | lmms-lab/llava-onevision-qwen2-7b-ov |
| unimvu_7B_music_avqa | 7B | Single-task | Music-AVQA | lmms-lab/llava-onevision-qwen2-7b-ov |
| unimvu_7B_scanqa | 7B | Single-task | ScanQA | lmms-lab/llava-onevision-qwen2-7b-ov |
| unimvu_7B_sqa3d | 7B | Single-task | SQA3D | lmms-lab/llava-onevision-qwen2-7b-ov |
| unimvu_uni_0.5B | 0.5B | Unified | Mixed multi-task release | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| unimvu_uni_7B | 7B | Unified | Mixed multi-task release | lmms-lab/llava-onevision-qwen2-7b-ov |

The default upload manifest publishes only the final release files:

  • adapter_config.json
  • adapter_model.safetensors
  • config.json
  • non_lora_trainables.bin

Intermediate checkpoint-* folders inside unimvu_uni_0.5B are training snapshots and are excluded from the default Hugging Face upload.
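
One way to fetch a single checkpoint folder while skipping those snapshots is `huggingface_hub.snapshot_download` with `allow_patterns`. The repo id and folder name below are examples; substitute the checkpoint you need:

```python
# Download only the final release files of one checkpoint folder.
# REPO_ID and the folder name are examples; adjust them as needed.
REPO_ID = "BonanDing/UniMVU"

RELEASE_FILES = (
    "adapter_config.json",
    "adapter_model.safetensors",
    "config.json",
    "non_lora_trainables.bin",
)

def checkpoint_patterns(folder):
    """allow_patterns matching only the final release files, which
    excludes intermediate checkpoint-* training snapshots."""
    return [f"{folder}/{name}" for name in RELEASE_FILES]

if __name__ == "__main__":
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    local_dir = snapshot_download(
        REPO_ID, allow_patterns=checkpoint_patterns("unimvu_0.5B_avqa")
    )
    print(local_dir)  # pass <local_dir>/unimvu_0.5B_avqa as --model-path
```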

Requirements

Use these checkpoints with the open-source UniMVU GitHub repository and install the dependencies from that repo:

```bash
git clone <UniMVU GitHub repo>
cd UniMVU
pip install -r requirements.txt
pip install huggingface_hub peft
```

Download the checkpoint folder you need from this repository, then point the UniMVU evaluation scripts to it with --model-path.

Usage

These checkpoints are intended to be used together with the UniMVU GitHub repository.

  1. Clone the UniMVU repository and install its dependencies.
  2. Download the checkpoint subfolder you want from this Hugging Face repo.
  3. Set the downloaded folder as --model-path in the UniMVU evaluation scripts.
  4. Run the appropriate UniMVU evaluation entry point for your task.
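
The steps above boil down to building one evaluation command. A minimal sketch, where --model-path is the documented flag and the script name is a hypothetical placeholder matching the repo's scripts/*_eval_*.sh pattern:

```python
import subprocess

def eval_command(script, model_path):
    """Build the eval invocation; --model-path is the documented flag,
    everything else here is illustrative."""
    return ["bash", script, "--model-path", model_path]

if __name__ == "__main__":
    # Hypothetical script name; substitute a real scripts/*_eval_*.sh file
    # from the UniMVU repository and your downloaded checkpoint folder.
    subprocess.run(
        eval_command("scripts/0.5B_eval_avqa.sh", "checkpoints/unimvu_0.5B_avqa"),
        check=True,
    )
```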

Loader Mapping

| Release family | model_type | model_arg_name | model_base |
|----------------|------------|----------------|------------|
| Single-task 0.5B adapters | unimvu | VideoFeatModelArgumentsUniMVU | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| Single-task 7B adapters | unimvu | VideoFeatModelArgumentsUniMVU_7B | lmms-lab/llava-onevision-qwen2-7b-ov |
| Unified 0.5B adapter | unimvu_uni | VideoFeatModelArgumentsUniMVU_Uni | lmms-lab/llava-onevision-qwen2-0.5b-ov |
| Unified 7B adapter | unimvu_uni | VideoFeatModelArgumentsUniMVU_Uni_7B | lmms-lab/llava-onevision-qwen2-7b-ov |
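
The mapping is regular enough to derive from the checkpoint folder name. A small illustrative helper (not part of the UniMVU codebase) that mirrors the table:

```python
def loader_config(folder):
    """Derive (model_type, model_arg_name, model_base) from a checkpoint
    folder name, mirroring the loader-mapping table above."""
    unified = folder.startswith("unimvu_uni")
    big = "7B" in folder
    model_type = "unimvu_uni" if unified else "unimvu"
    arg = ("VideoFeatModelArgumentsUniMVU"
           + ("_Uni" if unified else "")
           + ("_7B" if big else ""))
    base = f"lmms-lab/llava-onevision-qwen2-{'7b' if big else '0.5b'}-ov"
    return model_type, arg, base
```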

Evaluation Entry Points

  • Use scripts/*_eval_*.sh and unified_eval.py in the UniMVU repository for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
  • Use lmms_eval_start.py in the UniMVU repository for MVBench-style evaluation.

License

The released adapters depend on third-party base models and should be used in compliance with the licenses of:

  • lmms-lab/llava-onevision-qwen2-0.5b-ov
  • lmms-lab/llava-onevision-qwen2-7b-ov

Please also follow the usage terms of the downstream datasets and features used in evaluation.

Citation

If you use UniMVU in your work, please cite:

@inproceedings{ding2026unimvu,
  title={Not All Modalities Are Equal: Instruction-Aware Gating for Multimodal Videos},
  author={Ding, Bonan and Nawaz, Umair and Khan, Ufaq and Shaker, Abdelrahman M. and Khan, Muhammad Haris and Cao, Jiale and Xie, Jin and Khan, Fahad Shahbaz},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

Acknowledgements

UniMVU builds on the open-source ecosystem around PAVE, Qwen2, LLaVA-OneVision, LMMS-Eval, PEFT, and Transformers.
