	---
	license: cc-by-nc-sa-4.0
	library_name: diffusers
	pipeline_tag: image-to-video
	tags:
	- human-animation
	- pose-guided
	- DiT
	---

	# HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions

	<a href="https://arxiv.org/abs/2505.22977"><img src='https://img.shields.io/badge/arXiv-2505.22977-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>
	<a href='https://vivocameraresearch.github.io/hypermotion/'>
	<img src='https://img.shields.io/badge/Project-Page-pink?style=flat&logo=Google%20chrome&logoColor=pink'></a>
	<a href="https://github.com/vivoCameraResearch/Hyper-Motion"><img src='https://img.shields.io/badge/Github-Code-blue?style=flat&logo=github&logoColor=white' alt='Github'></a>
	<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'></a>

	This repository contains the model weights for HyperMotion, presented in the paper [HyperMotionX: The Dataset and Benchmark with DiT-Based Pose-Guided Human Image Animation of Complex Motions](https://huggingface.co/papers/2505.22977).

	## Introduction

	Recent advances in diffusion models have significantly improved conditional video generation, particularly for pose-guided human image animation. Although existing methods can generate high-fidelity, temporally consistent animation sequences for regular motions and static scenes, they still show clear limitations on complex human body motions (Hypermotion) that are highly dynamic and non-standard.

	To address this challenge, we introduce the Open-HyperMotionX Dataset and HyperMotionX Bench, which provide high-quality human pose annotations and curated video clips for evaluating and improving pose-guided human image animation models under complex human motion conditions. Furthermore, we propose a simple yet powerful DiT-based video generation baseline that adopts [Wan2.1-I2V-14B](https://github.com/Wan-Video/Wan2.1) as the base model, and we design a spatial low-frequency enhanced RoPE.
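
For readers unfamiliar with RoPE (rotary position embedding), the sketch below shows the textbook formulation in plain Python: each pair of channels is rotated by an angle proportional to the position, with per-pair frequencies that decay from high to low. This is background only and is not the spatial low-frequency enhanced variant used in HyperMotion; the function names and the `base` value of 10000 are assumptions taken from the common RoPE convention, not from this repository.

```python
import math


def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE angular frequencies, one per pair of channels.

    Index 0 is the highest frequency; later indices are lower frequencies.
    """
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(vec: list[float], position: float, base: float = 10000.0) -> list[float]:
    """Rotate consecutive channel pairs of `vec` by position-dependent angles."""
    out: list[float] = []
    for i, freq in enumerate(rope_frequencies(len(vec), base)):
        angle = position * freq
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out
```

Because each pair is only rotated, the dot product between a rotated query at position `m` and a rotated key at position `n` depends only on the offset `m - n`, which is the property that makes RoPE a relative position encoding.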

	## Inference

	To use the model, refer to the inference scripts in the official [GitHub repository](https://github.com/vivoCameraResearch/Hyper-Motion).

	```python
	import torch

	# Config and model path
	config_path = "config/wan2.1/wan_civitai.yaml"
	model_name = "shuolin/HyperMotion" # model checkpoints

	# Use torch.float16 if GPU does not support torch.bfloat16
	weight_dtype = torch.bfloat16
	control_video = "path/to/pose_video.mp4" # guided pose video
	ref_image = "path/to/image.jpg" # reference image

	# For detailed implementation, please refer to scripts/inference.py in the official repo.
	```
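
The fallback from `torch.bfloat16` to `torch.float16` mentioned in the snippet's comment can be decided at runtime instead of by hand. The following is a minimal sketch, assuming PyTorch is installed; `choose_dtype_name` and `detect_weight_dtype` are hypothetical helper names and are not part of the official repository.

```python
def choose_dtype_name(bf16_supported: bool) -> str:
    """Return the name of the torch dtype to use for the model weights."""
    # Prefer bfloat16; fall back to float16 when the GPU lacks bf16 support.
    return "bfloat16" if bf16_supported else "float16"


def detect_weight_dtype():
    """Resolve the weight dtype for the current machine (requires PyTorch)."""
    import torch  # imported lazily so choose_dtype_name stays dependency-free

    # Only query bf16 support when a CUDA device is actually present.
    bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    return getattr(torch, choose_dtype_name(bf16))
```

Calling `weight_dtype = detect_weight_dtype()` would then stand in for the hard-coded `weight_dtype = torch.bfloat16` line.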

	## Citation
	```bibtex
	@article{xu2025hypermotion,
	title={{HyperMotion}: {DiT}-Based Pose-Guided Human Image Animation of Complex Motions},
	author={Xu, Shuolin and Zheng, Siming and Wang, Ziyi and Yu, HC and Chen, Jinwei and Zhang, Huaqi and Zhou, Daquan and Lee, Tong-Yee and Li, Bo and Jiang, Peng-Tao},
	journal={arXiv preprint arXiv:2505.22977},
	year={2025}
	}
	```