[WIP] Upload folder using huggingface_hub (multi-commit 9e0d04e7038bc84c0b8aa8995e8fd774e9219bc08ec637f02a0a35ea9d52528b)

#1
by Ran0618 - opened
.gitattributes CHANGED
@@ -33,6 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- asset/eval_result.png filter=lfs diff=lfs merge=lfs -text
- asset/logo.png filter=lfs diff=lfs merge=lfs -text
- asset/overview.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED
@@ -1,174 +1,3 @@
- ---
- license: apache-2.0
- language:
- - en
- ---
- <p align="center">
- <img src="./asset/logo.png" width="80%"/>
- </p>
-
- # 🔥 Updates
-
- * \[3/2025\] **VMBench** evaluation code & prompt set released!
-
-
- # 📣 Overview
-
- <p align="center">
- <img src="./asset/overview.png" width="100%"/>
- </p>
-
-
- Video generation has advanced rapidly, driving improvements in evaluation methods, yet assessing the quality of motion in generated videos remains a major challenge. Specifically, there are two key issues: 1) current motion metrics do not fully align with human perception; 2) existing motion prompts are limited in scope. Based on these findings, we introduce **VMBench**, a comprehensive **V**ideo **M**otion **Bench**mark that provides perception-aligned motion metrics and features the most diverse types of motion. VMBench has several appealing properties: (1) **Perception-Driven Motion Evaluation Metrics**: we identify five dimensions based on human perception in motion video assessment and develop fine-grained evaluation metrics, providing deeper insight into models' strengths and weaknesses in motion quality. (2) **Meta-Guided Motion Prompt Generation**: a structured method that extracts meta-information, generates diverse motion prompts with LLMs, and refines them through human-AI validation, resulting in a multi-level prompt library covering six key dynamic scene dimensions. (3) **Human-Aligned Validation Mechanism**: we provide human preference annotations to validate our benchmark, with our metrics achieving an average 35.3% improvement in Spearman's correlation over baseline methods. This is the first time that the quality of motion in videos has been evaluated from the perspective of alignment with human perception.
-
- # 📊 Evaluation Results
-
-
- ## Quantitative Results
-
- <p align="center">
- <img src="./asset/eval_result.png" width="80%"/>
- </p>
-
- ### VMBench Leaderboard
-
- Avg is the unweighted mean of the five dimension scores (CAS, MSS, OIS, PAS, TCS); the best result per column is in bold.
-
- <div align="center">
-
- | Models | Avg | CAS | MSS | OIS | PAS | TCS |
- | -------------------- | -------- | -------- | -------- | -------- | -------- | -------- |
- | OpenSora-v1.2 | 51.6 | 31.2 | 61.9 | 73.0 | 3.4 | 88.5 |
- | Mochi 1 | 53.2 | 37.7 | 62.0 | 68.6 | 14.4 | 83.6 |
- | OpenSora-Plan-v1.3.0 | 58.9 | 39.3 | 76.0 | **78.6** | 6.0 | 94.7 |
- | CogVideoX-5B | 60.6 | 50.6 | 61.6 | 75.4 | 24.6 | 91.0 |
- | HunyuanVideo | 63.4 | 51.9 | 81.6 | 65.8 | **26.1** | 96.3 |
- | Wan2.1 | **78.4** | **62.8** | **84.2** | 66.0 | 17.9 | **97.8** |
-
- </div>
-
- # 🔨 Installation
-
- ## Create Environment
-
- ```shell
- git clone https://github.com/Ran0618/VMBench.git
- cd VMBench
-
- # Create and activate the conda environment
- conda create -n VMBench python=3.10
- conda activate VMBench
- pip install torch torchvision
-
- # Install the Grounded-Segment-Anything module
- cd Grounded-Segment-Anything
- python -m pip install -e segment_anything
- pip install --no-build-isolation -e GroundingDINO
- pip install -r requirements.txt
-
- # Install the Grounded-SAM-2 module (paths assume each module sits at the repo root)
- cd ../Grounded-SAM-2
- pip install -e .
-
- # Install the MMPose toolkit
- pip install -U openmim
- mim install mmengine
- mim install "mmcv==2.1.0"
-
- # Install the Q-Align module
- cd ../Q-Align
- pip install -e .
-
- # Install the VideoMAEv2 module
- cd ../VideoMAEv2
- pip install -r requirements.txt
- ```
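
As a quick post-install sanity check, here is a minimal Python sketch that imports the core modules installed above. The import names are the packages' published ones (`segment_anything`, `groundingdino`, `mmengine`, `mmcv`); adjust if your local layout differs.

```python
# Sanity check: confirm the modules installed above import cleanly.
import torch
import segment_anything   # installed from Grounded-Segment-Anything
import groundingdino      # installed from GroundingDINO
import mmengine
import mmcv

# CUDA availability matters for the evaluation pipeline's runtime.
print(torch.__version__, "CUDA available:", torch.cuda.is_available())
```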
-
- ## Download checkpoints
- Place the pre-trained checkpoint files in the `.cache` directory.
- You can download our model's checkpoints from our [HuggingFace repository 🤗](https://huggingface.co/GD-ML/VMBench).
-
- ```shell
- # Run from the VMBench repository root
- mkdir .cache
- huggingface-cli download GD-ML/VMBench --local-dir .cache/
- ```
- Please organize the pretrained models in this structure:
- ```shell
- VMBench/.cache
- ├── groundingdino_swinb_cogcoor.pth
- ├── sam2.1_hiera_large.pt
- ├── sam_vit_h_4b8939.pth
- ├── scaled_offline.pth
- └── vit_g_vmbench.pt
- ```
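
If you prefer Python, the same download can be done with `huggingface_hub.snapshot_download` (the library behind this very upload); a minimal sketch, run from the repository root:

```python
# Sketch: download all VMBench checkpoints into .cache/ without the CLI.
from huggingface_hub import snapshot_download

# Fetches every file in the GD-ML/VMBench repo; equivalent to the
# huggingface-cli command above.
snapshot_download(repo_id="GD-ML/VMBench", local_dir=".cache")
```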
-
- # 🔧 Usage
-
- ## Videos Preparation
-
- Generate videos with your model using the 1,050 prompts provided in `prompts/prompts.txt` or `prompts/prompts.json`, and organize them in the following structure:
-
- ```shell
- VMBench/eval_results/videos
- ├── 0001.mp4
- ├── 0002.mp4
- ...
- └── 1050.mp4
- ```
-
- **Note:** Ensure that you maintain the correspondence between prompts and video sequence numbers. The index for each prompt can be found in `prompts/prompts.json`.
-
- You can follow our `sample_video_demo.py` to generate videos, or place videos named by their prompt index into your own folder.
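
For illustration, a minimal sketch of the prompt-to-filename correspondence, assuming `prompts/prompts.json` stores an index and a prompt per entry. The field names here are hypothetical; check the file for the actual schema.

```python
# Sketch of the expected prompt-to-filename correspondence.
import json

with open("prompts/prompts.json") as f:
    prompts = json.load(f)

for item in prompts:
    index = item["index"]    # hypothetical field name
    prompt = item["prompt"]  # hypothetical field name
    # Zero-padded four-digit names match the structure above, e.g. 0001.mp4.
    video_path = f"eval_results/videos/{index:04d}.mp4"
    # generate_video(prompt, video_path)  # your model's generation call
```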
-
-
- ## Evaluation on VMBench
-
- ### Running the Evaluation Pipeline
- To evaluate generated videos with VMBench, run the following command:
-
- ```shell
- bash evaluate.sh your_videos_folder
- ```
-
- The per-video evaluation results will be saved to `./eval_results/${current_time}/results.json`, and the scores for each dimension will be saved to `./eval_results/${current_time}/scores.csv`.
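
To post-process a run, a minimal sketch for loading these outputs; the JSON/CSV layouts and the timestamped directory name are assumptions, so adjust to the actual files.

```python
# Sketch: inspect the outputs of an evaluation run.
import csv
import json

run_dir = "eval_results/2025-01-01_00-00-00"  # hypothetical ${current_time} stamp

with open(f"{run_dir}/results.json") as f:
    results = json.load(f)
print(f"Evaluated {len(results)} videos")

with open(f"{run_dir}/scores.csv") as f:
    for row in csv.reader(f):
        print(row)  # one row per evaluation dimension
```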
-
- ### Evaluation Efficiency
-
- We conducted a test using the following configuration:
-
- - **Model**: CogVideoX-5B
- - **Number of Videos**: 1,050
- - **Frames per Video**: 49
- - **Frame Rate**: 8 FPS
-
- Here are the time measurements for each evaluation metric:
-
- | Metric | Time Taken |
- |--------|------------|
- | PAS (Perceptible Amplitude Score) | 45 minutes |
- | OIS (Object Integrity Score) | 30 minutes |
- | TCS (Temporal Coherence Score) | 2 hours |
- | MSS (Motion Smoothness Score) | 2.5 hours |
- | CAS (Commonsense Adherence Score) | 1 hour |
-
- **Total Evaluation Time**: 6 hours and 45 minutes
-
- # ❤️ Acknowledgement
- We would like to express our gratitude to the following open-source repositories that our work is based on: [GroundedSAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [GroundedSAM2](https://github.com/IDEA-Research/Grounded-SAM-2), [Co-Tracker](https://github.com/facebookresearch/co-tracker), [MMPose](https://github.com/open-mmlab/mmpose), [Q-Align](https://github.com/Q-Future/Q-Align), [VideoMAEv2](https://github.com/OpenGVLab/VideoMAEv2), and [VideoAlign](https://github.com/KwaiVGI/VideoAlign).
- Their contributions have been invaluable to this project.
-
- # 📜 License
- VMBench is licensed under the [Apache-2.0 license](http://www.apache.org/licenses/LICENSE-2.0). You are free to use our code for research purposes.
-
- # ✏️ Citation
- If you find our repo useful for your research, please consider citing our paper:
- ```bibtex
- @misc{ling2025vmbenchbenchmarkperceptionalignedvideo,
-   title={VMBench: A Benchmark for Perception-Aligned Video Motion Generation},
-   author={Xinran Ling and Chen Zhu and Meiqi Wu and Hangyu Li and Xiaokun Feng and Cundian Yang and Aiming Hao and Jiashu Zhu and Jiahong Wu and Xiangxiang Chu},
-   year={2025},
-   eprint={2503.10076},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV},
-   url={https://arxiv.org/abs/2503.10076},
- }
- ```
 
+ ---
+ license: apache-2.0
+ ---
 
asset/eval_result.png DELETED

Git LFS Details

  • SHA256: 89f3536f4dd1aaa8007dde384350ff4af160a80200c1dc01b4f837396f90c7f7
  • Pointer size: 131 Bytes
  • Size of remote file: 632 kB
asset/logo.png DELETED

Git LFS Details

  • SHA256: d26b31e0a99cd0930a85c6516a7a3dc74e84a552555b74af32aa6aa0e9a8facf
  • Pointer size: 132 Bytes
  • Size of remote file: 1.49 MB
asset/overview.png DELETED

Git LFS Details

  • SHA256: 8cc7a346681bf83506df27c8b63eb78c3452121d3950e8731759b7ad175b501e
  • Pointer size: 132 Bytes
  • Size of remote file: 4.75 MB
groundingdino_swinb_cogcoor.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:46270f7a822e6906b655b729c90613e48929d0f2bb8b9b76fd10a856f3ac6ab7
- size 938057991
 
 
 
 
sam2.1_hiera_large.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2647878d5dfa5098f2f8649825738a9345572bae2d4350a2468587ece47dd318
- size 898083611
 
 
 
 
sam_vit_h_4b8939.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:a7bf3b02f3ebf1267aba913ff637d9a2d5c33d3173bb679e46d9f338c26f262e
- size 2564550879
 
 
 
 
scaled_offline.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2670d4562ed69326dda775a26e54883925cd11b6fc9b24cb7aa9f8078bce7834
- size 101890938
 
 
 
 
vit_g_vmbench.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e806faa3b9f46457f902b331319eeb19b75bb9352a430827c381de2615f264e7
- size 14162848906