---
license: other
pipeline_tag: image-to-video
library_name: diffusers
---
# MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

This repository contains the model weights for **MimicMotion**, a controllable video generation framework proposed in the paper [MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance](https://huggingface.co/papers/2406.19680).

MimicMotion addresses significant challenges in video generation, such as controllability, video length, and richness of detail. Our approach introduces several innovations:

- **Confidence-aware pose guidance:** Ensures high frame quality and temporal smoothness.
- **Regional loss amplification:** Significantly reduces image distortion based on pose confidence.
- **Progressive latent fusion strategy:** Enables generation of arbitrary-length videos with acceptable resource consumption.

With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects.

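To make the progressive latent fusion idea listed above more concrete, the following is a minimal sketch of overlapping-window segmentation with a linear cross-fade. It is a simplified stand-in, not the weighting scheme used in the paper or the repository. The function names, the overlap of 6 frames, and the use of plain floats in place of latent tensors are all illustrative assumptions; only the 72-frame segment length comes from this model card.

```python
from typing import List, Tuple


def segment_windows(total_frames: int, segment_len: int = 72,
                    overlap: int = 6) -> List[Tuple[int, int]]:
    """Split [0, total_frames) into windows of at most segment_len frames.

    Consecutive windows share `overlap` frames. The overlap default is an
    illustrative assumption, not a value confirmed by the model card.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    stride = segment_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + segment_len, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


def crossfade_weights(overlap: int) -> List[float]:
    """Linear ramp of weights given to the incoming segment over the overlap."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]


def fuse(prev_tail: List[float], next_head: List[float]) -> List[float]:
    """Blend the overlapping values of two segments with the linear ramp."""
    weights = crossfade_weights(len(prev_tail))
    return [(1 - w) * a + w * b for w, a, b in zip(weights, prev_tail, next_head)]
```

Under these assumptions, a 150-frame pose sequence is covered by three windows, `(0, 72)`, `(66, 138)`, and `(132, 150)`, and each shared 6-frame region is blended so that the earlier segment fades out while the later one fades in.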

**[📚 Paper](https://huggingface.co/papers/2406.19680)** | **[🌐 Project Page](https://tencent.github.io/MimicMotion)** | **[💻 GitHub Repo](https://github.com/Tencent/MimicMotion)**

<div align="center">
<img src="https://huggingface.co/tencent/MimicMotion/resolve/main/assets/figures/model_structure.png" alt="MimicMotion Model Architecture" width="640"/>
<br/>
<i>An overview of the framework of MimicMotion.</i>
</div>

## Sample Usage

The initially released model checkpoint supports generating videos of up to 72 frames at a 576x1024 resolution. If you run out of memory, reduce the number of frames.

### Environment setup

We recommend Python 3+ with torch 2.x; this setup has been validated on an NVIDIA V100 GPU. Run the following commands to install the Python dependencies:

```bash
conda env create -f environment.yaml
conda activate mimicmotion
```
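After activating the environment, a quick sanity check can confirm that the interpreter and torch match the recommendation above. This is a minimal sketch: the function name and the report keys are hypothetical, and it only inspects versions and CUDA visibility rather than validating the full dependency set.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report the Python version and, if installed, the torch/CUDA status."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info.major >= 3,
        "torch": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the script still runs without torch

        report["torch"] = torch.__version__
        # Versions look like "2.1.0+cu118"; only the major number is checked.
        report["torch_ok"] = int(torch.__version__.split(".")[0]) >= 2
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_ok` or `cuda_available` is `False`, the conda environment is probably not active or the GPU driver is not visible to PyTorch.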
### Download weights

If you experience connection issues with Hugging Face, you can use the mirror endpoint by setting the environment variable: `export HF_ENDPOINT=https://hf-mirror.com`.

Please download the weights manually as follows:

```bash
cd MimicMotion/
mkdir models
```
1. Download the DWPose pretrained model: [dwpose](https://huggingface.co/yzd-v/DWPose/tree/main)
   ```bash
   mkdir -p models/DWPose
   wget "https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx?download=true" -O models/DWPose/yolox_l.onnx
   wget "https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx?download=true" -O models/DWPose/dw-ll_ucoco_384.onnx
   ```

2. Download the pre-trained MimicMotion checkpoint from [Hugging Face](https://huggingface.co/tencent/MimicMotion)
   ```bash
   wget -P models/ https://huggingface.co/tencent/MimicMotion/resolve/main/MimicMotion_1-1.pth
   ```

3. The SVD model [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) will be downloaded automatically.

Finally, all the weights should be organized in `models` as follows:

```
models/
├── DWPose
│   ├── dw-ll_ucoco_384.onnx
│   └── yolox_l.onnx
└── MimicMotion_1-1.pth
```
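The layout above can be checked programmatically before running inference. The sketch below lists the expected files and reports any that are absent. As an optional alternative to `wget`, it can also fetch them with `hf_hub_download`, which assumes the `huggingface_hub` package is installed. The helper names `missing_weights` and `fetch_missing` are hypothetical and not part of the MimicMotion repository.

```python
from pathlib import Path

# (repo_id, filename in the repo, path relative to the models/ directory)
EXPECTED_WEIGHTS = [
    ("yzd-v/DWPose", "yolox_l.onnx", "DWPose/yolox_l.onnx"),
    ("yzd-v/DWPose", "dw-ll_ucoco_384.onnx", "DWPose/dw-ll_ucoco_384.onnx"),
    ("tencent/MimicMotion", "MimicMotion_1-1.pth", "MimicMotion_1-1.pth"),
]


def missing_weights(models_dir="models"):
    """Return the relative paths of expected weight files that are absent."""
    root = Path(models_dir)
    return [rel for _, _, rel in EXPECTED_WEIGHTS if not (root / rel).is_file()]


def fetch_missing(models_dir="models"):
    """Download absent files; assumes huggingface_hub is installed."""
    from huggingface_hub import hf_hub_download  # lazy: only needed here

    root = Path(models_dir)
    for repo_id, filename, rel in EXPECTED_WEIGHTS:
        target = root / rel
        if not target.is_file():
            # local_dir places the file at <local_dir>/<filename>.
            hf_hub_download(repo_id=repo_id, filename=filename,
                            local_dir=str(target.parent))


if __name__ == "__main__":
    absent = missing_weights()
    print("All weights present." if not absent else f"Missing: {absent}")
```

Run it from the repository root so that the relative `models` path resolves correctly. The SVD base model is not listed because it is downloaded automatically.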
### Model inference

A sample configuration for testing is provided as `configs/test.yaml`. You can adjust its settings to suit your needs.

```bash
python inference.py --inference_config configs/test.yaml
```
Tip: if your GPU memory is limited, try setting the environment variable `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256`.
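One way to apply this tip without changing your shell configuration is to launch inference from a small wrapper that sets the variable for the child process only. The sketch below assumes it is run from the repository root. The helper names are hypothetical, while the command and the variable value are the ones given in this section.

```python
import os
import subprocess
import sys


def build_env(max_split_size_mb: int = 256) -> dict:
    """Copy the current environment and add the allocator setting from the tip."""
    env = dict(os.environ)
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env


def build_command(config: str = "configs/test.yaml") -> list:
    """Assemble the inference command for the current Python interpreter."""
    return [sys.executable, "inference.py", "--inference_config", config]


def run_inference(config: str = "configs/test.yaml") -> None:
    """Run inference.py with the adjusted environment; raises on failure."""
    subprocess.run(build_command(config), env=build_env(), check=True)
```

Passing the variable through the child environment means it is already set when PyTorch starts in that process. Calling `run_inference()` then behaves like the shell command with the variable exported.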
## License

These MimicMotion model weights are fine-tuned with the assistance of Stable Video Diffusion (SVD) Powered by Stability AI. For detailed license information, please refer to the [`LICENSE`](https://huggingface.co/tencent/MimicMotion/blob/main/LICENSE) and [`NOTICE`](https://huggingface.co/tencent/MimicMotion/blob/main/NOTICE) files.
## Citation

```bibtex
@inproceedings{zhang2025mimicmotion,
  title={MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance},
  author={Yuang Zhang and Jiaxi Gu and Li-Wen Wang and Han Wang and Junqi Cheng and Yuefeng Zhu and Fangyuan Zou},
  booktitle={International Conference on Machine Learning},
  year={2025}
}
```