
---
license: mit
language:
- en
pipeline_tag: image-to-video
base_model:
- Wan-AI/Wan2.2-Animate-14B
tags:
- image-to-video
- video-generation
- human-animation
- long-video
- lora
- wan
- diffsynth
- everanimate
---

# EverAnimate

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

EverAnimate is a GPU-friendly post-training method for long-horizon human animation. It applies a lightweight rank-32 LoRA on top of Wan2.2-Animate and improves long-horizon generation through persistent latent propagation and restorative flow matching.

- Project page: [everanimate.github.io/homepage](https://everanimate.github.io/homepage/)
- Paper: [arXiv:2605.15042](https://arxiv.org/abs/2605.15042)
- Code: [github.com/vita-epfl/EverAnimate](https://github.com/vita-epfl/EverAnimate)
- Base model: [Wan-AI/Wan2.2-Animate-14B](https://huggingface.co/Wan-AI/Wan2.2-Animate-14B)

## Highlights

- GPU-friendly training: rank-32 LoRA post-training on Wan2.2-Animate reaches strong results in only thousands of iterations on 4 GPUs.
- Long-horizon animation: supports minute-scale human animation while keeping identity and motion consistent.
- Fully open source: code, training and inference scripts, LoRA checkpoints, demo data, and ablation videos are released for reproducible research.

## Repository Contents

```text
ckpts/
`-- everanimate-v1-lora32/
    |-- stage1_480p.safetensors
    |-- stage2_480p.safetensors
    `-- stage3_720p_beta.safetensors   # Beta, tested only at small scale

data/
|-- train/      # Minimal training sample
|-- test/       # Inference demo
`-- ablation/   # Stage-1 and Stage-2 ablation videos
```

The ablation videos are the outputs of the two stages: `data/ablation/stage1.mp4` is the Stage-1 result, and `data/ablation/stage2.mp4` is the Stage-2 result.
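After downloading, a quick shell check can confirm that the files landed where the tree above expects them. This is a minimal sketch, assuming the layout listed under Repository Contents and a download with `--local-dir .`; the function name `check_everanimate_layout` is hypothetical and not part of the released scripts.

```bash
# Hypothetical helper (not part of the official scripts): verify the
# EverAnimate checkpoint/data layout under a root directory (default: .).
# Prints each missing entry to stderr; returns 0 only if everything exists.
check_everanimate_layout() {
  local root="${1:-.}"
  local missing=0
  local path
  for path in \
    "ckpts/everanimate-v1-lora32/stage1_480p.safetensors" \
    "ckpts/everanimate-v1-lora32/stage2_480p.safetensors" \
    "ckpts/everanimate-v1-lora32/stage3_720p_beta.safetensors" \
    "data/train" \
    "data/test" \
    "data/ablation/stage1.mp4" \
    "data/ablation/stage2.mp4"; do
    # -e matches both regular files and directories
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_everanimate_layout . && echo "layout OK"`. If you downloaded only the checkpoints, the `data/` entries will be reported as missing, which is expected.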

## Download

Download checkpoints and demo data:

```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "ckpts/**" \
  --include "data/**" \
  --local-dir .
```

Download only the LoRA checkpoints:

```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "ckpts/everanimate-v1-lora32/*.safetensors" \
  --local-dir .
```

Download only the data:

```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "data/**" \
  --local-dir .
```
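If you only need one stage, for example the Stage-1 LoRA, you can narrow `--include` to a single file. This is a minimal sketch, assuming the same `hf download` invocation shown above and the file names from the Repository Contents tree; `fetch_everanimate_stage` and its `DRY_RUN` switch are hypothetical, not part of the released scripts.

```bash
# Hypothetical helper (not part of the official scripts): download one LoRA
# stage with the hf CLI, using the file names from the "Repository Contents" tree.
# Set DRY_RUN=1 to print the command instead of executing it.
fetch_everanimate_stage() {
  local stage="$1"
  local file
  case "$stage" in
    1) file="stage1_480p.safetensors" ;;
    2) file="stage2_480p.safetensors" ;;
    3) file="stage3_720p_beta.safetensors" ;;  # 720p beta, tested only at small scale
    *) echo "unknown stage: $stage (expected 1, 2, or 3)" >&2; return 2 ;;
  esac
  local -a cmd
  cmd=(hf download epfl-vita/everanimate
       --repo-type model
       --include "ckpts/everanimate-v1-lora32/$file"
       --local-dir .)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

`DRY_RUN=1 fetch_everanimate_stage 1` prints the command for review; `fetch_everanimate_stage 1` runs it and places the file under `./ckpts/everanimate-v1-lora32/`.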

For a full setup, clone the code repository and run its download script:

```bash
git clone https://github.com/vita-epfl/EverAnimate.git
cd EverAnimate
bash scripts/download_models.sh
```

The script downloads the Wan2.2-Animate base files from `Wan-AI/Wan2.2-Animate-14B` and the EverAnimate files from this repository.

## Usage

From the root of the cloned code repository, run inference on the bundled demo:

```bash
bash test.sh
```

To train on the bundled minimal sample, run Stage 1 and then Stage 2:

```bash
bash train_stage1.sh
bash train_stage2.sh
```

See the GitHub repository for environment setup, scripts, and implementation details.
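To run both training stages unattended, a small wrapper can keep one log per stage and stop before Stage 2 if Stage 1 fails, since Stage 2 builds on Stage 1. This is a minimal sketch, assuming the two training scripts are invoked with no arguments from the code repository root; `run_everanimate_training` and the `logs/` directory are hypothetical choices, not part of the release.

```bash
# Hypothetical wrapper (not part of the official scripts): run the two training
# stages in order, keep one log per stage, and stop at the first failure.
# Assumes train_stage1.sh and train_stage2.sh are in the current directory.
run_everanimate_training() {
  local log_dir="${1:-logs}"
  local stage script status
  mkdir -p "$log_dir" || return 1
  for stage in 1 2; do
    script="train_stage${stage}.sh"
    if [ ! -f "$script" ]; then
      echo "not found: $script (run from the code repository root)" >&2
      return 1
    fi
    echo "== stage $stage ==" >&2
    # tee copies output to the log; PIPESTATUS[0] keeps the script's own exit code
    bash "$script" 2>&1 | tee "$log_dir/stage${stage}.log"
    status=${PIPESTATUS[0]}
    if [ "$status" -ne 0 ]; then
      echo "stage $stage failed with exit code $status" >&2
      return "$status"
    fi
  done
}
```

Call it as `run_everanimate_training logs`; the function returns the failing script's exit code, so it composes with `&&` and `||` in larger job scripts.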

## Model Details

- Base model: Wan2.2-Animate-14B
- Checkpoint type: LoRA
- LoRA rank: 32
- Resolution: stable 480p checkpoints and a beta 720p checkpoint
- Stages: Stage 1 learns persistent latent propagation; Stage 2 adds restorative flow matching; Stage 3 provides a 720p beta LoRA.
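To see why a rank-32 LoRA is light to train, consider the standard LoRA update on a single weight matrix. This is a generic illustration, not EverAnimate's exact configuration: the scaling factor $\alpha$, which layers are adapted, and the square $5120 \times 5120$ projection used in the arithmetic are all assumptions.

$$
\begin{aligned}
W' &= W + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad r = 32 \\
\#\text{LoRA params} &= r\,(d_{\text{in}} + d_{\text{out}}) = 32 \times (5120 + 5120) = 327{,}680 \\
\#\text{full params} &= d_{\text{in}}\, d_{\text{out}} = 5120^2 = 26{,}214{,}400 \\
\frac{327{,}680}{26{,}214{,}400} &= 1.25\%
\end{aligned}
$$

Only $A$ and $B$ are trained while $W$ stays frozen, so gradients and optimizer state scale with the smaller count, which is one reason LoRA post-training can fit on modest hardware.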

The 720p checkpoint is a beta release and has only been tested at small scale. A more thoroughly fine-tuned and evaluated 720p checkpoint is planned for a future update.

## Intended Use

These checkpoints and sample assets are intended for research on controllable human image animation, long-horizon video generation, and reproducible comparison of EverAnimate's two-stage post-training pipeline.

## Limitations

The released LoRA checkpoints inherit the capabilities and limitations of the Wan2.2-Animate backbone. Performance can vary with input image quality, pose accuracy, motion difficulty, and generation length. Users should follow the licenses and usage terms of the base model and any input assets.

## Citation

```bibtex
@misc{li2026everanimate,
  title         = {EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration},
  author        = {Wuyang Li and Yang Gao and Mariam Hassan and Lan Feng and Wentao Pan and Po-Chien Luan and Alexandre Alahi},
  year          = {2026},
  eprint        = {2605.15042},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2605.15042}
}
```

## Acknowledgements

This work builds on the following projects:

- [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
- [Wan-Animate: Unified Character Animation and Replacement with Holistic Replication](https://arxiv.org/abs/2509.14055)
- [Stable Video Infinity: Infinite-Length Video Generation with Error Recycling](https://stable-video-infinity.github.io/homepage/)

This work was also inspired by SVI 2.0 Pro and LongCat Video Avatar.