---
license: other
library_name: peft
base_model:
- lmms-lab/llava-onevision-qwen2-0.5b-ov
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- multimodal
- video
- audio
- 3d
- peft
- lora
- safetensors
- llava-onevision
- qwen2
language:
- en
---

# UniMVU - LoRA Adapters for LLaVA-OneVision Qwen2

Open-source UniMVU release checkpoints for instruction-aware multimodal video understanding. This release covers audio-video QA, 3D QA, and unified multi-task adapters built on top of `lmms-lab/llava-onevision-qwen2-0.5b-ov` and `lmms-lab/llava-onevision-qwen2-7b-ov`.

Unlike plain LoRA releases, UniMVU checkpoints also include `non_lora_trainables.bin` for the extra modality-gating modules. Use the UniMVU loader instead of a PEFT-only `PeftModel.from_pretrained(...)` workflow.

[arXiv](#)

## Release Contents

| Folder | Scale | Type | Task(s) | Base model |
| --- | --- | --- | --- | --- |
| `unimvu_0.5B_avqa` | 0.5B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_avsd` | 0.5B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_music_avqa` | 0.5B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_scanqa` | 0.5B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_sqa3d` | 0.5B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_7B_avqa` | 7B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_avsd` | 7B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_music_avqa` | 7B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_scanqa` | 7B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_sqa3d` | 7B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_uni_0.5B` | 0.5B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_uni_7B` | 7B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-7b-ov` |

The default upload manifest publishes only the final release files:

- `adapter_config.json`
- `adapter_model.safetensors`
- `config.json`
- `non_lora_trainables.bin`

Intermediate `checkpoint-*` folders inside `unimvu_uni_0.5B` are training snapshots and are excluded from the default Hugging Face upload.

## Requirements

Use these checkpoints with the open-source [UniMVU GitHub repository](#) and install the dependencies from that repo:

```bash
git clone <UniMVU-repo-url>  # repository URL not included in this release
cd UniMVU
pip install -r requirements.txt
pip install huggingface_hub peft
```

Download the checkpoint folder you need from this repository, then point the UniMVU evaluation scripts to it with `--model-path`.

## Usage

These checkpoints are intended to be used together with the [UniMVU GitHub repository](#).

1. Clone the UniMVU repository and install its dependencies.
2. Download the checkpoint subfolder you want from this Hugging Face repo.
3. Set the downloaded folder as `--model-path` in the UniMVU evaluation scripts.
4. Run the appropriate UniMVU evaluation entry point for your task.

## Loader Mapping

| Release family | `model_type` | `model_arg_name` | `model_base` |
| --- | --- | --- | --- |
| Single-task 0.5B adapters | `unimvu` | `VideoFeatModelArgumentsUniMVU` | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| Single-task 7B adapters | `unimvu` | `VideoFeatModelArgumentsUniMVU_7B` | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| Unified 0.5B adapter | `unimvu_uni` | `VideoFeatModelArgumentsUniMVU_Uni` | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| Unified 7B adapter | `unimvu_uni` | `VideoFeatModelArgumentsUniMVU_Uni_7B` | `lmms-lab/llava-onevision-qwen2-7b-ov` |

## Evaluation Entry Points

- Use `scripts/*_eval_*.sh` and `unified_eval.py` in the UniMVU repository for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
- Use `lmms_eval_start.py` in the UniMVU repository for MVBench-style evaluation.
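When scripting batch evaluations, the Loader Mapping table above can be expressed as a small lookup helper. This is an illustrative sketch, not part of the UniMVU codebase; only the string values (`model_type`, `model_arg_name`, `model_base`) come verbatim from the table, while the function name and folder-name parsing are assumptions based on the release folder naming scheme.

```python
def loader_config(folder: str) -> dict:
    """Map a release folder name (e.g. 'unimvu_7B_scanqa') to the loader
    settings from the Loader Mapping table. Illustrative helper only."""
    unified = folder.startswith("unimvu_uni")   # unified vs. single-task release
    seven_b = "7B" in folder                     # 7B vs. 0.5B scale
    if unified:
        arg_name = "VideoFeatModelArgumentsUniMVU_Uni_7B" if seven_b else "VideoFeatModelArgumentsUniMVU_Uni"
    else:
        arg_name = "VideoFeatModelArgumentsUniMVU_7B" if seven_b else "VideoFeatModelArgumentsUniMVU"
    return {
        "model_type": "unimvu_uni" if unified else "unimvu",
        "model_arg_name": arg_name,
        "model_base": f"lmms-lab/llava-onevision-qwen2-{'7b' if seven_b else '0.5b'}-ov",
    }

print(loader_config("unimvu_7B_scanqa"))
```

The returned `model_base` is what the base checkpoint must resolve to when the downloaded adapter folder is passed as `--model-path` to the UniMVU evaluation scripts.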
## License

The released adapters depend on third-party base models and should be used in compliance with the licenses of:

- `lmms-lab/llava-onevision-qwen2-0.5b-ov`
- `lmms-lab/llava-onevision-qwen2-7b-ov`

Please also follow the usage terms of the downstream datasets and features used in evaluation.

## Citation

If you use UniMVU in your work, please cite:

```bibtex
@inproceedings{ding2026unimvu,
  title={Not All Modalities Are Equal: Instruction-Aware Gating for Multimodal Videos},
  author={Ding, Bonan and Nawaz, Umair and Khan, Ufaq and Shaker, Abdelrahman M. and Khan, Muhammad Haris and Cao, Jiale and Xie, Jin and Khan, Fahad Shahbaz},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```

## Acknowledgements

UniMVU builds on the open-source ecosystem around PAVE, Qwen2, LLaVA-OneVision, LMMS-Eval, PEFT, and Transformers.