Update UniMVU model card
README.md

Open-source UniMVU release checkpoints for instruction-aware multimodal video understanding.

Unlike plain LoRA releases, UniMVU checkpoints also include `non_lora_trainables.bin` for the extra modality-gating modules. Use the UniMVU loader instead of a PEFT-only `PeftModel.from_pretrained(...)` workflow.
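
To see what a PEFT-only workflow would skip, you can inspect the extra file directly. A minimal sketch; the local path is a placeholder for any checkpoint folder downloaded from this repo:

```python
import torch

# Placeholder path: any checkpoint folder downloaded from this repo.
model_path = "unimvu_uni_7B"

# These modality-gating weights live alongside the LoRA adapter files and are
# never read by a plain PeftModel.from_pretrained(...) call.
extra = torch.load(f"{model_path}/non_lora_trainables.bin", map_location="cpu")
print(f"{len(extra)} non-LoRA tensors, for example:")
for name in list(extra)[:5]:
    print(" ", name, tuple(extra[name].shape))
```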

[Paper](#)
## Release Contents

| Folder | Scale | Type | Task(s) | Base model |
| --- | --- | --- | --- | --- |
| `unimvu_0.5B_avqa` | 0.5B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_avsd` | 0.5B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_music_avqa` | 0.5B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_scanqa` | 0.5B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_0.5B_sqa3d` | 0.5B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_7B_avsd` | 7B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_music_avqa` | 7B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_scanqa` | 7B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_7B_sqa3d` | 7B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-7b-ov` |
| `unimvu_uni_0.5B` | 0.5B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
| `unimvu_uni_7B` | 7B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-7b-ov` |

The default upload manifest publishes only the final release files; intermediate `checkpoint-*` folders inside `unimvu_uni_0.5B` are training snapshots.
## Requirements

Use these checkpoints with the open-source [UniMVU GitHub repository](#) and install the dependencies from that repo:

```bash
git clone <UniMVU GitHub repo>
cd UniMVU
pip install -r requirements.txt
pip install huggingface_hub peft
```

Download the checkpoint folder you need from this repository, then point the UniMVU evaluation scripts to it with `--model-path`.
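
If you prefer to script the download, here is a minimal sketch with `huggingface_hub`'s `snapshot_download`; the repo id is a placeholder for this repository's id:

```python
import os

from huggingface_hub import snapshot_download

REPO_ID = "<this-repo-id>"    # placeholder: the id of this Hugging Face repo
SUBFOLDER = "unimvu_uni_7B"   # any folder from the Release Contents table

# Fetch only the chosen checkpoint folder instead of the whole repo.
local_root = snapshot_download(
    repo_id=REPO_ID,
    allow_patterns=[f"{SUBFOLDER}/*"],
)

# Pass this folder to the evaluation scripts via --model-path.
model_path = os.path.join(local_root, SUBFOLDER)
print(model_path)
```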
|
## Usage

These checkpoints are intended to be used together with the [UniMVU GitHub repository](#).

1. Clone the UniMVU repository and install its dependencies.
2. Download the checkpoint subfolder you want from this Hugging Face repo.
3. Set the downloaded folder as `--model-path` in the UniMVU evaluation scripts.
4. Run the appropriate UniMVU evaluation entry point for your task (see the loading sketch below for a programmatic route).
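
For step 4, the checkpoint can also be loaded programmatically through UniMVU's evaluation loader, which merges the LoRA adapter and then restores `non_lora_trainables.bin`. A sketch for the 7B unified checkpoint; the import path is an assumption, so locate `load_trained_model_for_eval` in the UniMVU repository:

```python
# Assumed import path -- find load_trained_model_for_eval in the UniMVU repo.
from unimvu.loader import load_trained_model_for_eval

# Merges the LoRA adapter into the base model, then restores the
# modality-gating modules from non_lora_trainables.bin.
tokenizer, model, image_processor, context_len = load_trained_model_for_eval(
    model_path=model_path,  # the checkpoint folder downloaded above
    model_base="lmms-lab/llava-onevision-qwen2-7b-ov",
    model_arg_name="VideoFeatModelArgumentsUniMVU_Uni_7B",
    model_type="unimvu_uni",
    device="cuda",
)
model.eval()
```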
## Loader Mapping
## Evaluation Entry Points

- Use `scripts/*_eval_*.sh` and `unified_eval.py` in the UniMVU repository for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
- Use `lmms_eval_start.py` in the UniMVU repository for MVBench-style evaluation.
## License
## Acknowledgements

UniMVU builds on the open-source ecosystem around PAVE, Qwen2, LLaVA-OneVision, LMMS-Eval, PEFT, and Transformers.