VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing (ICLR 2025)

[arXiv] [Hugging Face Spaces] [Project Page]

Teaser: multi-grained video editing examples.
- Class level: human class → Spiderman
- Instance level: left → Spiderman, right → Polar Bear
- Part level: Polar Bear + sunglasses; left → teddy bear, right → golden retriever
- left cat → Samoyed, right cat → Tiger; behind → Iron Man, front → Stormtrooper; half-sleeve gray shirt → a black suit

▶️ Setup Environment

Our method is tested with CUDA 12.1, fp16 precision via accelerate, and xformers on a single NVIDIA L40 GPU.

# Step 1: Create and activate Conda environment
conda create -n videograin python=3.10
conda activate videograin

# Step 2: Install PyTorch, CUDA and Xformers
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install --pre -U xformers==0.0.27
# Step 3: Install additional dependencies with pip
pip install -r requirements.txt

xformers is recommended to reduce memory usage and runtime.
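To confirm the pinned versions above installed correctly, here is a minimal sketch. The `check_pin` helper and the `PINS` table are illustrative additions mirroring the install commands, not part of the VideoGrain repo:

```python
# Sketch: verify installed package versions against the pins used above.
# `check_pin` and `PINS` are hypothetical helpers, not part of this repo.

def check_pin(installed: str, pinned: str) -> bool:
    """Return True if the installed version matches the pinned version,
    e.g. installed '2.3.1+cu121' satisfies pin '2.3.1'."""
    return installed.split("+")[0] == pinned

# Version pins taken from the conda/pip commands above.
PINS = {
    "torch": "2.3.1",
    "torchvision": "0.18.1",
    "torchaudio": "2.3.1",
    "xformers": "0.0.27",
}

if __name__ == "__main__":
    from importlib.metadata import version, PackageNotFoundError
    for pkg, pin in PINS.items():
        try:
            ok = check_pin(version(pkg), pin)
        except PackageNotFoundError:
            ok = False
        print(f"{pkg}: {'OK' if ok else 'MISSING/MISMATCH'}")
```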

You can download all the base model checkpoints with the following bash command:

## download sd 1.5, controlnet depth/pose v10/v11
bash download_all.sh

Prepare the ControlNet annotator weights (e.g., DW-Pose, depth_zoe, depth_midas, OpenPose):

mkdir annotator/ckpts

Method 1: Download DW-Pose models

(Note: if you have access to Hugging Face, the other models, such as depth_zoe, are downloaded automatically.)

Download the DW-Pose model dw-ll_ucoco_384.onnx (baidu, google) and the detection model yolox_l.onnx (baidu, google), then put them into ./annotator/ckpts.

Method 2: Download all annotator checkpoints from google or baiduyun (if you cannot access Hugging Face)

If you cannot access Hugging Face, you can download all the annotator checkpoints (DW-Pose, depth_zoe, depth_midas, OpenPose, etc., around 4 GB in total) from baidu or google, then extract them into ./annotator/ckpts.

🔛 Prepare all the data

gdown --fuzzy https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link
tar -zxvf videograin_data.tar.gz

🔥 VideoGrain Editing

You can reproduce the multi-grained editing results from our teaser by running:

bash test.sh 
#or accelerate launch test.py --config config/instance_level/running_two_man/running_3cls_polar_spider_vis_weight.yaml
Results are saved to `./result`, with the following directory structure:

result
├── run_two_man
│   ├── infer_samples
│   └── sample
│       ├── step_0              # result image folder
│       ├── step_0.mp4          # result video
│       └── source_video.mp4    # the input video