---
license: cc-by-nc-sa-4.0
library_name: diffusers
pipeline_tag: image-to-video
tags:
- human-animation
- pose-guided
- DiT
---

# HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions

<a href="https://arxiv.org/abs/2505.22977"><img src='https://img.shields.io/badge/arXiv-2505.22977-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
<a href='https://vivocameraresearch.github.io/hypermotion/'>
  <img src='https://img.shields.io/badge/Project-Page-pink?style=flat&logo=Google%20chrome&logoColor=pink'></a>
<a href="https://github.com/vivoCameraResearch/Hyper-Motion"><img src='https://img.shields.io/badge/Github-Code-blue?style=flat&logo=github&logoColor=white' alt='Github'></a>&nbsp;
<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'></a>&nbsp;

This repository contains the model weights for **HyperMotion**, presented in the paper [HyperMotionX: The Dataset and Benchmark with DiT-Based Pose-Guided Human Image Animation of Complex Motions](https://huggingface.co/papers/2505.22977).

## Introduction

Recent advances in diffusion models have significantly improved conditional video generation, particularly pose-guided human image animation. Existing methods can generate high-fidelity, temporally consistent animation sequences for regular motions and static scenes, but they still show clear limitations on complex human body motions (Hypermotion), which are highly dynamic and non-standard.

To address this challenge, we introduce the **Open-HyperMotionX Dataset** and **HyperMotionX Bench**, which provide high-quality human pose annotations and curated video clips for evaluating and improving pose-guided human image animation models under complex human motion conditions. Furthermore, we propose a simple yet powerful DiT-based video generation baseline that adopts [Wan2.1-I2V-14B](https://github.com/Wan-Video/Wan2.1) as the base model and adds a spatial low-frequency enhanced RoPE.

## Inference

To run the model, use the inference scripts in the official [GitHub repository](https://github.com/vivoCameraResearch/Hyper-Motion). The snippet below only lists the main settings those scripts expect.

```python
import torch

# Config and model path
config_path = "config/wan2.1/wan_civitai.yaml"
model_name = "shuolin/HyperMotion" # model checkpoints

# Use torch.float16 if GPU does not support torch.bfloat16
weight_dtype = torch.bfloat16
control_video = "path/to/pose_video.mp4" # pose video that drives the motion
ref_image = "path/to/image.jpg" # reference image of the person to animate

# For detailed implementation, please refer to scripts/inference.py in the official repo.
```
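The dtype comment in the snippet above can be turned into an automatic choice. The following is a minimal sketch, not part of the official scripts: the helper name `pick_weight_dtype_name` is hypothetical, and the `float32` fallback for machines without CUDA is an assumption that the model card does not state.

```python
import importlib.util


def pick_weight_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of the torch dtype to load the weights with.

    Hypothetical helper: mirrors the comment "use torch.float16 if the GPU
    does not support torch.bfloat16".
    """
    if cuda_available and bf16_supported:
        return "bfloat16"
    if cuda_available:
        return "float16"
    # Assumption: without a CUDA device, fall back to full precision.
    return "float32"


# Only query the hardware when PyTorch is actually installed.
if importlib.util.find_spec("torch") is not None:
    import torch

    cuda_ok = torch.cuda.is_available()
    # is_bf16_supported() is only meaningful when a CUDA device is present.
    bf16_ok = cuda_ok and torch.cuda.is_bf16_supported()
    weight_dtype = getattr(torch, pick_weight_dtype_name(cuda_ok, bf16_ok))
    print(f"Selected weight dtype: {weight_dtype}")
```

The resulting `weight_dtype` can replace the hard-coded `torch.bfloat16` value. Keeping the decision in a plain function makes it easy to check without a GPU.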

## Citation
```bibtex
@article{xu2025hypermotion,
  title={{HyperMotion}: {DiT}-based pose-guided human image animation of complex motions},
  author={Xu, Shuolin and Zheng, Siming and Wang, Ziyi and Yu, HC and Chen, Jinwei and Zhang, Huaqi and Zhou, Daquan and Lee, Tong-Yee and Li, Bo and Jiang, Peng-Tao},
  journal={arXiv preprint arXiv:2505.22977},
  year={2025}
}
```