---
license: other
library_name: peft
base_model:
- lmms-lab/llava-onevision-qwen2-0.5b-ov
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- multimodal
- video
- audio
- 3d
- peft
- lora
- safetensors
- llava-onevision
- qwen2
language:
- en
---

# UniMVU - LoRA Adapters for LLaVA-OneVision Qwen2

Open-source UniMVU release checkpoints for instruction-aware multimodal video understanding. This release covers audio-video QA, 3D QA, and unified multi-task adapters built on top of `lmms-lab/llava-onevision-qwen2-0.5b-ov` and `lmms-lab/llava-onevision-qwen2-7b-ov`.

Unlike plain LoRA releases, UniMVU checkpoints also include `non_lora_trainables.bin` for the extra modality-gating modules. Use the UniMVU loader instead of a PEFT-only `PeftModel.from_pretrained(...)` workflow.

[arXiv](#)

## Release Contents

| Folder | Scale | Type | Task(s) | Base model |
| --- | --- | --- | --- | --- |
| `unimvu_0.5B_avqa` | 0.5B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_avsd` | 0.5B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_music_avqa` | 0.5B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_scanqa` | 0.5B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_sqa3d` | 0.5B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_7B_avqa` | 7B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_avsd` | 7B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_music_avqa` | 7B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_scanqa` | 7B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_sqa3d` | 7B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_uni_0.5B` | 0.5B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_uni_7B` | 7B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-7b-ov` |

The default upload manifest publishes only the final release files:

- `adapter_config.json`
- `adapter_model.safetensors`
- `config.json`
- `non_lora_trainables.bin`

Intermediate `checkpoint-*` folders inside `unimvu_uni_0.5B` are training snapshots and are excluded from the default Hugging Face upload.

## Requirements

Use these checkpoints with the open-source [UniMVU GitHub repository](#) and install the dependencies from that repo:

```bash
git clone <UniMVU-repo-url>  # repository URL not included in this release
cd UniMVU
pip install -r requirements.txt
pip install huggingface_hub peft
```

Download the checkpoint folder you need from this repository, then point the UniMVU evaluation scripts to it with `--model-path`.

## Usage

These checkpoints are intended to be used together with the [UniMVU GitHub repository](#).

1. Clone the UniMVU repository and install its dependencies.
2. Download the checkpoint subfolder you want from this Hugging Face repo.
3. Set the downloaded folder as `--model-path` in the UniMVU evaluation scripts.
4. Run the appropriate UniMVU evaluation entry point for your task.

## Loader Mapping

| Release family | `model_type` | `model_arg_name` | `model_base` |
| --- | --- | --- | --- |
| Single-task 0.5B adapters | `unimvu` | `VideoFeatModelArgumentsUniMVU` | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| Single-task 7B adapters | `unimvu` | `VideoFeatModelArgumentsUniMVU_7B` | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| Unified 0.5B adapter | `unimvu_uni` | `VideoFeatModelArgumentsUniMVU_Uni` | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| Unified 7B adapter | `unimvu_uni` | `VideoFeatModelArgumentsUniMVU_Uni_7B` | `lmms-lab/llava-onevision-qwen2-7b-ov` |

## Evaluation Entry Points

- Use `scripts/*_eval_*.sh` and `unified_eval.py` in the UniMVU repository for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
- Use `lmms_eval_start.py` in the UniMVU repository for MVBench-style evaluation.
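When scripting batch evaluations, the Loader Mapping table above can be expressed as a small lookup helper. This is an illustrative sketch, not part of the UniMVU codebase; only the string values (`model_type`, `model_arg_name`, `model_base`) come verbatim from the table, while the function name and folder-name parsing are assumptions based on the release folder naming scheme.

```python
def loader_config(folder: str) -> dict:
    """Map a release folder name (e.g. 'unimvu_7B_scanqa') to the loader
    settings from the Loader Mapping table. Illustrative helper only."""
    unified = folder.startswith("unimvu_uni")   # unified vs. single-task release
    seven_b = "7B" in folder                     # 7B vs. 0.5B scale
    if unified:
        arg_name = "VideoFeatModelArgumentsUniMVU_Uni_7B" if seven_b else "VideoFeatModelArgumentsUniMVU_Uni"
    else:
        arg_name = "VideoFeatModelArgumentsUniMVU_7B" if seven_b else "VideoFeatModelArgumentsUniMVU"
    return {
        "model_type": "unimvu_uni" if unified else "unimvu",
        "model_arg_name": arg_name,
        "model_base": f"lmms-lab/llava-onevision-qwen2-{'7b' if seven_b else '0.5b'}-ov",
    }

print(loader_config("unimvu_7B_scanqa"))
```

The returned `model_base` is what the base checkpoint must resolve to when the downloaded adapter folder is passed as `--model-path` to the UniMVU evaluation scripts.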
## License

The released adapters depend on third-party base models and should be used in compliance with the licenses of:

- `lmms-lab/llava-onevision-qwen2-0.5b-ov`
- `lmms-lab/llava-onevision-qwen2-7b-ov`

Please also follow the usage terms of the downstream datasets and features used in evaluation.

## Citation

If you use UniMVU in your work, please cite:

```bibtex
@inproceedings{ding2026unimvu,
  title={Not All Modalities Are Equal: Instruction-Aware Gating for Multimodal Videos},
  author={Ding, Bonan and Nawaz, Umair and Khan, Ufaq and Shaker, Abdelrahman M. and Khan, Muhammad Haris and Cao, Jiale and Xie, Jin and Khan, Fahad Shahbaz},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```

## Acknowledgements

UniMVU builds on the open-source ecosystem around PAVE, Qwen2, LLaVA-OneVision, LMMS-Eval, PEFT, and Transformers.