BonanDing committed · Commit c18f32d · verified · 1 Parent(s): fc03f38

Update UniMVU model card

Files changed (1)
  1. README.md +27 -53
README.md CHANGED
@@ -25,30 +25,23 @@ Open-source UniMVU release checkpoints for instruction-aware multimodal video un

  Unlike plain LoRA releases, UniMVU checkpoints also include `non_lora_trainables.bin` for the extra modality-gating modules. Use the UniMVU loader instead of a PEFT-only `PeftModel.from_pretrained(...)` workflow.

- [Paper PDF](./UniMVU_CVPR_2026__Camera_Ready_.pdf)
-
- ## Highlights
-
- - Instruction-aware gating across video, audio, depth, and long-video evidence.
- - Single-task adapters for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
- - Unified multi-task adapters for the mixed-training UniMVU release.
- - Gains of up to +13.5 CIDEr on AVSD over the reproduced PAVE baseline, as reported in the paper.
+ [Paper](#)

  ## Release Contents

- | Folder | Scale | Type | Task(s) | Base model | Published size |
- | --- | --- | --- | --- | --- | --- |
- | `unimvu_0.5B_avqa` | 0.5B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 96.4 MB |
- | `unimvu_0.5B_avsd` | 0.5B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 96.4 MB |
- | `unimvu_0.5B_music_avqa` | 0.5B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 96.4 MB |
- | `unimvu_0.5B_scanqa` | 0.5B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 96.4 MB |
- | `unimvu_0.5B_sqa3d` | 0.5B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 96.4 MB |
- | `unimvu_7B_avsd` | 7B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-7b-ov` | 715.9 MB |
- | `unimvu_7B_music_avqa` | 7B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` | 715.9 MB |
- | `unimvu_7B_scanqa` | 7B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-7b-ov` | 1.04 GB |
- | `unimvu_7B_sqa3d` | 7B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-7b-ov` | 1.04 GB |
- | `unimvu_uni_0.5B` | 0.5B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-0.5b-ov` | 103.7 MB |
- | `unimvu_uni_7B` | 7B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-7b-ov` | 745.3 MB |
+ | Folder | Scale | Type | Task(s) | Base model |
+ | --- | --- | --- | --- | --- |
+ | `unimvu_0.5B_avqa` | 0.5B | Single-task | AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_0.5B_avsd` | 0.5B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_0.5B_music_avqa` | 0.5B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_0.5B_scanqa` | 0.5B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_0.5B_sqa3d` | 0.5B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_7B_avsd` | 7B | Single-task | AVSD | `lmms-lab/llava-onevision-qwen2-7b-ov` |
+ | `unimvu_7B_music_avqa` | 7B | Single-task | Music-AVQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
+ | `unimvu_7B_scanqa` | 7B | Single-task | ScanQA | `lmms-lab/llava-onevision-qwen2-7b-ov` |
+ | `unimvu_7B_sqa3d` | 7B | Single-task | SQA3D | `lmms-lab/llava-onevision-qwen2-7b-ov` |
+ | `unimvu_uni_0.5B` | 0.5B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-0.5b-ov` |
+ | `unimvu_uni_7B` | 7B | Unified | Mixed multi-task release | `lmms-lab/llava-onevision-qwen2-7b-ov` |

  The default upload manifest publishes only the final release files:

@@ -61,44 +54,25 @@ Intermediate `checkpoint-*` folders inside `unimvu_uni_0.5B` are training snapsh

  ## Requirements

- Use these adapters with the open-source UniMVU codebase and its dependencies:
+ Use these checkpoints with the open-source [UniMVU GitHub repository](#) and install the dependencies from that repo:

  ```bash
+ git clone <UniMVU GitHub repo>
+ cd UniMVU
  pip install -r requirements.txt
  pip install huggingface_hub peft
  ```

- If you only need one adapter, prefer `snapshot_download(...)` so you do not fetch the entire release repo.
+ Download the checkpoint folder you need from this repository, then point the UniMVU evaluation scripts to it with `--model-path`.

- ## Quick Start
+ ## Usage

- The example below downloads one subfolder from this repo and loads it through UniMVU's own evaluation loader, which merges the LoRA adapter and then restores `non_lora_trainables.bin`.
+ These checkpoints are intended to be used together with the [UniMVU GitHub repository](#).

- ```python
- import os
-
- from huggingface_hub import snapshot_download
-
- from unified_eval import load_trained_model_for_eval
-
- REPO_ID = "BonanDing/UniMVU"
- SUBFOLDER = "unimvu_uni_7B"
-
- local_root = snapshot_download(
-     repo_id=REPO_ID,
-     allow_patterns=[f"{SUBFOLDER}/*"],
- )
- model_path = os.path.join(local_root, SUBFOLDER)
-
- tokenizer, model, image_processor, context_len = load_trained_model_for_eval(
-     model_path=model_path,
-     model_base="lmms-lab/llava-onevision-qwen2-7b-ov",
-     model_arg_name="VideoFeatModelArgumentsUniMVU_Uni_7B",
-     model_type="unimvu_uni",
-     device="cuda",
- )
- model.eval()
- ```
+ 1. Clone the UniMVU repository and install its dependencies.
+ 2. Download the checkpoint subfolder you want from this Hugging Face repo.
+ 3. Set the downloaded folder as `--model-path` in the UniMVU evaluation scripts.
+ 4. Run the appropriate UniMVU evaluation entry point for your task.

  ## Loader Mapping

@@ -111,8 +85,8 @@ model.eval()

  ## Evaluation Entry Points

- - Use `unified_eval.py` for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
- - Use `lmms_eval_start.py` for MVBench-style evaluation in the UniMVU codebase.
+ - Use `scripts/*_eval_*.sh` and `unified_eval.py` in the UniMVU repository for AVQA, AVSD, Music-AVQA, ScanQA, and SQA3D.
+ - Use `lmms_eval_start.py` in the UniMVU repository for MVBench-style evaluation.

  ## License

@@ -138,4 +112,4 @@ If you use UniMVU in your work, please cite:

  ## Acknowledgements

- UniMVU builds on the open-source multimodal ecosystem around LLaVA-style training utilities, LMMS-Eval, PEFT, and Transformers.
+ UniMVU builds on the open-source ecosystem around PAVE, Qwen2, LLaVA-OneVision, LMMS-Eval, PEFT, and Transformers.
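The download-and-point-to-`--model-path` workflow in the new Usage steps can be sketched with `huggingface_hub`, mirroring the `snapshot_download(..., allow_patterns=...)` pattern from the README's previous quick-start. The repo id and subfolder name come from this release; the `unified_eval.py` invocation in the final comment is illustrative only, since its full flag set lives in the UniMVU repository.

```python
import os

from huggingface_hub import snapshot_download  # pip install huggingface_hub

REPO_ID = "BonanDing/UniMVU"   # this release repository
SUBFOLDER = "unimvu_uni_7B"    # any folder from the release table


def checkpoint_allow_pattern(subfolder: str) -> str:
    """Glob pattern that restricts the download to one checkpoint folder."""
    return f"{subfolder}/*"


def download_checkpoint(repo_id: str, subfolder: str) -> str:
    """Fetch a single checkpoint folder and return the local path for --model-path."""
    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=[checkpoint_allow_pattern(subfolder)],
    )
    return os.path.join(local_root, subfolder)


if __name__ == "__main__":
    model_path = download_checkpoint(REPO_ID, SUBFOLDER)
    # Pass model_path to the UniMVU evaluation scripts, e.g.:
    #   python unified_eval.py --model-path <model_path> ...
    print(model_path)
```

Using `allow_patterns` avoids pulling every checkpoint in the release when only one adapter is needed.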