[WIP] Upload folder using huggingface_hub (multi-commit 9e0d04e7038bc84c0b8aa8995e8fd774e9219bc08ec637f02a0a35ea9d52528b)
#1
by Ran0618 · opened
- .gitattributes +0 -3
- README.md +3 -174
- asset/eval_result.png +0 -3
- asset/logo.png +0 -3
- asset/overview.png +0 -3
- groundingdino_swinb_cogcoor.pth +0 -3
- sam2.1_hiera_large.pt +0 -3
- sam_vit_h_4b8939.pth +0 -3
- scaled_offline.pth +0 -3
- vit_g_vmbench.pt +0 -3
.gitattributes CHANGED
@@ -33,6 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-asset/eval_result.png filter=lfs diff=lfs merge=lfs -text
-asset/logo.png filter=lfs diff=lfs merge=lfs -text
-asset/overview.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,174 +1,3 @@
----
-license: apache-2.0
-language:
-- en
----
-<p align="center">
-  <img src="./asset/logo.png" width="80%"/>
-</p>
-
-# 🔥 Updates
-
-* \[3/2025\] **VMBench** evaluation code & prompt set released!
-
-# 📣 Overview
-
-<p align="center">
-  <img src="./asset/overview.png" width="100%"/>
-</p>
-
-Video generation has advanced rapidly, spurring better evaluation methods, yet assessing the quality of motion in generated videos remains a major challenge. Specifically, there are two key issues: 1) current motion metrics do not fully align with human perception; 2) existing motion prompts are limited. Based on these findings, we introduce **VMBench**, a comprehensive **V**ideo **M**otion **Bench**mark with perception-aligned motion metrics and the most diverse types of motion. VMBench has several appealing properties: (1) **Perception-Driven Motion Evaluation Metrics**: we identify five dimensions of human perception in motion-video assessment and develop fine-grained evaluation metrics along them, providing deeper insight into models' strengths and weaknesses in motion quality. (2) **Meta-Guided Motion Prompt Generation**: a structured method that extracts meta-information, generates diverse motion prompts with LLMs, and refines them through human-AI validation, resulting in a multi-level prompt library covering six key dynamic-scene dimensions. (3) **Human-Aligned Validation Mechanism**: we provide human preference annotations to validate our benchmark, with our metrics achieving an average 35.3% improvement in Spearman's correlation over baseline methods. This is the first time the quality of motion in videos has been evaluated from the perspective of human perception alignment.
-
-# 📊 Evaluation Results
-
-## Quantitative Results
-
-<p align="center">
-  <img src="./asset/eval_result.png" width="80%"/>
-</p>
-
-### VMBench Leaderboard
-
-<div align="center">
-
-| Models               | Avg      | CAS      | MSS      | OIS      | PAS      | TCS      |
-| -------------------- | -------- | -------- | -------- | -------- | -------- | -------- |
-| OpenSora-v1.2        | 51.6     | 31.2     | 61.9     | 73.0     | 3.4      | 88.5     |
-| Mochi 1              | 53.2     | 37.7     | 62.0     | 68.6     | 14.4     | 83.6     |
-| OpenSora-Plan-v1.3.0 | 58.9     | 39.3     | 76.0     | **78.6** | 6.0      | 94.7     |
-| CogVideoX-5B         | 60.6     | 50.6     | 61.6     | 75.4     | 24.6     | 91.0     |
-| HunyuanVideo         | 63.4     | 51.9     | 81.6     | 65.8     | **26.1** | 96.3     |
-| Wan2.1               | **78.4** | **62.8** | **84.2** | 66.0     | 17.9     | **97.8** |
-
-</div>
-
-# 🔨 Installation
-
-## Create Environment
-
-```shell
-git clone https://github.com/Ran0618/VMBench.git
-cd VMBench
-
-# create and activate the conda environment
-conda create -n VMBench python=3.10
-conda activate VMBench
-pip install torch torchvision
-
-# Install Grounded-Segment-Anything module
-cd Grounded-Segment-Anything
-python -m pip install -e segment_anything
-pip install --no-build-isolation -e GroundingDINO
-pip install -r requirements.txt
-
-# Install Grounded-SAM-2 module (module folders are assumed to sit at the repo root)
-cd ../Grounded-SAM-2
-pip install -e .
-
-# Install MMPose toolkit
-pip install -U openmim
-mim install mmengine
-mim install "mmcv==2.1.0"
-
-# Install Q-Align module
-cd ../Q-Align
-pip install -e .
-
-# Install VideoMAEv2 module
-cd ../VideoMAEv2
-pip install -r requirements.txt
-```
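Before moving on, it can be worth sanity-checking that the core dependencies import cleanly in the new environment. A minimal sketch; only `torch` and `mmcv`, both installed above, are assumed:

```python
# Minimal post-install sanity check for the VMBench environment.
# torch and mmcv come from the install steps above; whether CUDA is
# available depends on your machine.
import torch
import mmcv

print("torch", torch.__version__, "cuda:", torch.cuda.is_available())
print("mmcv", mmcv.__version__)
```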
-
-## Download checkpoints
-Place the pre-trained checkpoint files in the `.cache` directory.
-You can download our model checkpoints from our [HuggingFace repository 🤗](https://huggingface.co/GD-ML/VMBench).
-
-```shell
-mkdir .cache
-huggingface-cli download GD-ML/VMBench --local-dir .cache/
-```
-Please organize the pretrained models in this structure:
-```shell
-VMBench/.cache
-├── groundingdino_swinb_cogcoor.pth
-├── sam2.1_hiera_large.pt
-├── sam_vit_h_4b8939.pth
-├── scaled_offline.pth
-└── vit_g_vmbench.pt
-```
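The CLI call above can also be scripted with the `huggingface_hub` Python API; a minimal equivalent sketch, using the same repo id and target directory:

```python
# Download all VMBench checkpoints into .cache/ via the Python API,
# equivalent to the huggingface-cli command above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="GD-ML/VMBench", local_dir=".cache")
```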
-
-# 🔧 Usage
-
-## Videos Preparation
-
-Generate videos with your model using the 1,050 prompts provided in `prompts/prompts.txt` or `prompts/prompts.json`, and organize them in the following structure:
-
-```shell
-VMBench/eval_results/videos
-├── 0001.mp4
-├── 0002.mp4
-...
-└── 1050.mp4
-```
-
-**Note:** Ensure that you maintain the correspondence between prompts and video sequence numbers. The index for each prompt can be found in `prompts/prompts.json`.
-
-You can follow our `sample_video_demo.py` to generate videos, or place your own generated videos, named by prompt index, into the folder; a sketch of the naming scheme follows below.
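A hypothetical sketch of that index-to-filename mapping; the `index` and `prompt` keys are assumptions, so check `prompts/prompts.json` for the actual field names:

```python
# Hypothetical sketch: derive the zero-padded output path for each prompt.
# The "index" and "prompt" keys are assumptions about prompts/prompts.json;
# verify the real schema before relying on this.
import json

with open("prompts/prompts.json") as f:
    prompts = json.load(f)

for item in prompts:
    out_path = f"eval_results/videos/{int(item['index']):04d}.mp4"
    # your_model.generate(item["prompt"], out_path)  # hypothetical sampling call
    print(out_path, "<-", item["prompt"][:60])
```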
-
-
-## Evaluation on VMBench
-
-### Running the Evaluation Pipeline
-To evaluate generated videos with VMBench, run the following command:
-
-```shell
-bash evaluate.sh your_videos_folder
-```
-
-The evaluation results for each video will be saved in `./eval_results/${current_time}/results.json`, and the scores for each dimension in `./eval_results/${current_time}/scores.csv`.
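A small sketch for inspecting those outputs once a run finishes; the timestamped folder name and the CSV layout are assumptions:

```python
# Hypothetical sketch: load per-video results and per-dimension scores
# written by evaluate.sh. Paths and columns are assumptions; adjust to your run.
import csv
import json

run_dir = "eval_results/2025-01-01_00-00-00"  # stands in for ${current_time}

with open(f"{run_dir}/results.json") as f:
    results = json.load(f)
print("videos evaluated:", len(results))

with open(f"{run_dir}/scores.csv") as f:
    for row in csv.DictReader(f):
        print(row)
```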
-
-### Evaluation Efficiency
-
-We conducted a test using the following configuration:
-
-- **Model**: CogVideoX-5B
-- **Number of Videos**: 1,050
-- **Frames per Video**: 49
-- **Frame Rate**: 8 FPS
-
-Here are the time measurements for each evaluation metric:
-
-| Metric | Time Taken |
-|--------|------------|
-| PAS (Perceptible Amplitude Score) | 45 minutes |
-| OIS (Object Integrity Score) | 30 minutes |
-| TCS (Temporal Coherence Score) | 2 hours |
-| MSS (Motion Smoothness Score) | 2.5 hours |
-| CAS (Commonsense Adherence Score) | 1 hour |
-
-**Total Evaluation Time**: 6 hours and 45 minutes
-
-# ❤️ Acknowledgement
-We would like to express our gratitude to the following open-source repositories that our work is based on: [GroundedSAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [GroundedSAM2](https://github.com/IDEA-Research/Grounded-SAM-2), [Co-Tracker](https://github.com/facebookresearch/co-tracker), [MMPose](https://github.com/open-mmlab/mmpose), [Q-Align](https://github.com/Q-Future/Q-Align), [VideoMAEv2](https://github.com/OpenGVLab/VideoMAEv2), [VideoAlign](https://github.com/KwaiVGI/VideoAlign).
-Their contributions have been invaluable to this project.
-
-# 📜 License
-VMBench is licensed under the [Apache-2.0 license](http://www.apache.org/licenses/LICENSE-2.0). You are free to use our code for research purposes.
-
-# ✏️ Citation
-If you find our repo useful for your research, please consider citing our paper:
-```bibtex
-@misc{ling2025vmbenchbenchmarkperceptionalignedvideo,
-      title={VMBench: A Benchmark for Perception-Aligned Video Motion Generation},
-      author={Xinran Ling and Chen Zhu and Meiqi Wu and Hangyu Li and Xiaokun Feng and Cundian Yang and Aiming Hao and Jiashu Zhu and Jiahong Wu and Xiangxiang Chu},
-      year={2025},
-      eprint={2503.10076},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2503.10076},
-}
-```
+---
+license: apache-2.0
+---
asset/eval_result.png DELETED (Git LFS)

asset/logo.png DELETED (Git LFS)

asset/overview.png DELETED (Git LFS)
groundingdino_swinb_cogcoor.pth DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:46270f7a822e6906b655b729c90613e48929d0f2bb8b9b76fd10a856f3ac6ab7
-size 938057991

sam2.1_hiera_large.pt DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:2647878d5dfa5098f2f8649825738a9345572bae2d4350a2468587ece47dd318
-size 898083611

sam_vit_h_4b8939.pth DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:a7bf3b02f3ebf1267aba913ff637d9a2d5c33d3173bb679e46d9f338c26f262e
-size 2564550879

scaled_offline.pth DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:2670d4562ed69326dda775a26e54883925cd11b6fc9b24cb7aa9f8078bce7834
-size 101890938

vit_g_vmbench.pt DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e806faa3b9f46457f902b331319eeb19b75bb9352a430827c381de2615f264e7
-size 14162848906