---
license: mit
language:
- en
pipeline_tag: image-to-video
base_model:
- Wan-AI/Wan2.2-Animate-14B
tags:
- image-to-video
- video-generation
- human-animation
- long-video
- lora
- wan
- diffsynth
- everanimate
---
# EverAnimate
**EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration**
EverAnimate is a GPU-friendly post-training method for long-horizon human animation. It uses lightweight rank-32 LoRA adaptation on top of Wan2.2-Animate and improves long-horizon generation through persistent latent propagation and restorative flow matching.
- **Project page:** [everanimate.github.io/homepage](https://everanimate.github.io/homepage/)
- **Paper:** [arXiv:2605.15042](https://arxiv.org/abs/2605.15042)
- **Code:** [github.com/vita-epfl/EverAnimate](https://github.com/vita-epfl/EverAnimate)
- **Base model:** [Wan-AI/Wan2.2-Animate-14B](https://huggingface.co/Wan-AI/Wan2.2-Animate-14B)
## Highlights
- **GPU-friendly training:** rank-32 LoRA post-training on Wan2.2-Animate reaches strong results with only thousands of iterations on 4 GPUs.
- **Long-horizon animation:** supports minute-scale human animation with controlled identity and motion consistency.
- **Fully open source:** code, training/inference scripts, LoRA checkpoints, demo data, and ablation videos are released for reproducible research.
## Repository Contents
```text
ckpts/
`-- everanimate-v1-lora32/
    |-- stage1_480p.safetensors
    |-- stage2_480p.safetensors
    `-- stage3_720p_beta.safetensors   # Beta, tested only at small scale
data/
|-- train/                             # Minimal training sample
|-- test/                              # Inference demo
`-- ablation/                          # Stage-1 and Stage-2 ablation videos
```
The ablation videos show the output after each of the first two stages: `data/ablation/stage1.mp4` is the Stage-1 result and `data/ablation/stage2.mp4` is the Stage-2 result.
## Download
Download checkpoints and demo data:
```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "ckpts/**" \
  --include "data/**" \
  --local-dir .
```
Download only the LoRA checkpoints:
```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "ckpts/everanimate-v1-lora32/*.safetensors" \
  --local-dir .
```
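To fetch a single stage rather than all three, `hf download` also accepts explicit file paths after the repository ID. The sketch below wraps that in a small helper. `download_stage` and the `EVERANIMATE_RUN` variable are hypothetical names introduced here for illustration and are not part of the released scripts. By default the helper only prints the command, so it can be inspected before anything is downloaded.

```bash
# Hypothetical helper (not part of the EverAnimate repo): build the
# `hf download` command for one LoRA stage. The command is printed by
# default and executed only when EVERANIMATE_RUN=1.
download_stage() {
  local file
  case "${1:-}" in
    1) file="stage1_480p.safetensors" ;;
    2) file="stage2_480p.safetensors" ;;
    3) file="stage3_720p_beta.safetensors" ;;  # beta checkpoint
    *) echo "usage: download_stage 1|2|3" >&2; return 2 ;;
  esac

  # Reuse the function's positional parameters to hold the command.
  # The file path follows the layout listed under Repository Contents.
  set -- hf download epfl-vita/everanimate \
    "ckpts/everanimate-v1-lora32/${file}" \
    --repo-type model --local-dir .

  if [ "${EVERANIMATE_RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}
```

For example, `download_stage 2` prints the command for the Stage-2 checkpoint, and `EVERANIMATE_RUN=1 download_stage 2` runs it.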
Download only the data:
```bash
hf download epfl-vita/everanimate \
  --repo-type model \
  --include "data/**" \
  --local-dir .
```
For full setup, clone the code repo and run:
```bash
git clone https://github.com/vita-epfl/EverAnimate.git
cd EverAnimate
bash scripts/download_models.sh
```
The script downloads the Wan2.2-Animate base files from `Wan-AI/Wan2.2-Animate-14B` and the EverAnimate files from this repository.
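Once a download finishes, it can help to confirm that the EverAnimate files landed where the layout under Repository Contents says they should. The following is a minimal sketch of such a check. `check_everanimate_layout` is a hypothetical helper, not something the repository ships, and it only covers the files named in this README (it does not check the Wan2.2-Animate base files, whose location is set by the download script).

```bash
# Hypothetical helper: report which expected files are absent under ROOT
# (default: current directory). Returns non-zero if anything is missing.
check_everanimate_layout() {
  local root="${1:-.}" missing=0 path
  for path in \
    ckpts/everanimate-v1-lora32/stage1_480p.safetensors \
    ckpts/everanimate-v1-lora32/stage2_480p.safetensors \
    ckpts/everanimate-v1-lora32/stage3_720p_beta.safetensors \
    data/train \
    data/test \
    data/ablation/stage1.mp4 \
    data/ablation/stage2.mp4
  do
    if [ ! -e "${root}/${path}" ]; then
      echo "missing: ${path}" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Calling `check_everanimate_layout .` from the directory passed to `--local-dir` lists any missing paths on stderr. If only the checkpoints or only the data were downloaded, the other entries are expected to be reported.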
## Usage
Inference with the bundled demo:
```bash
bash test.sh
```
Training with the bundled minimal sample:
```bash
bash train_stage1.sh
bash train_stage2.sh
```
See the GitHub repository for environment setup, scripts, and implementation details.
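When the two training commands above are launched unattended, a failed Stage-1 run should probably not be followed by Stage 2. The sketch below runs a list of scripts in order and stops at the first failure. It assumes Stage 2 is meant to start from a completed Stage-1 run, which the stage descriptions suggest but this README does not state explicitly, and `run_in_order` is a hypothetical helper rather than part of the released scripts.

```bash
# Hypothetical helper: run each script in order and stop at the first failure.
run_in_order() {
  local script
  for script in "$@"; do
    echo "==> ${script}"
    if ! bash "${script}"; then
      echo "failed: ${script}" >&2
      return 1
    fi
  done
}

# Example (from the EverAnimate code repository root):
#   run_in_order train_stage1.sh train_stage2.sh
```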
## Model Details
- **Base model:** Wan2.2-Animate
- **Checkpoint type:** LoRA
- **LoRA rank:** 32
- **Resolution:** 480p stable checkpoints and a 720p beta checkpoint
- **Stages:** Stage 1 learns latent propagation; Stage 2 adds restorative flow matching; Stage 3 provides a 720p beta LoRA.
The 720p checkpoint is a beta release and has only been tested at small scale. A more thoroughly fine-tuned and evaluated 720p checkpoint is planned for a future update.
## Intended Use
These checkpoints and sample assets are intended for research on controllable human image animation, long-horizon video generation, and reproducible comparison of EverAnimate's two-stage post-training pipeline.
## Limitations
The released LoRA checkpoints inherit the capabilities and limitations of the Wan2.2-Animate backbone. Performance can vary with input image quality, pose accuracy, motion difficulty, and generation length. Users should follow the licenses and usage terms of the base model and any input assets.
## Citation
```bibtex
@misc{li2026everanimate,
  title         = {EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration},
  author        = {Wuyang Li and Yang Gao and Mariam Hassan and Lan Feng and Wentao Pan and Po-Chien Luan and Alexandre Alahi},
  year          = {2026},
  eprint        = {2605.15042},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2605.15042}
}
```
## Acknowledgements
This work builds on the following projects:
- [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
- [Wan-Animate: Unified Character Animation and Replacement with Holistic Replication](https://arxiv.org/abs/2509.14055)
- [Stable Video Infinity: Infinite-Length Video Generation with Error Recycling](https://stable-video-infinity.github.io/homepage/)
This work was also inspired by SVI 2.0 Pro and LongCat Video Avatar.