|
|
--- |
|
|
pipeline_tag: image-to-video |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- Wan-AI/Wan2.1-I2V-14B-480P |
|
|
--- |
|
|
|
|
|
# LongVie 2: Multimodal Controllable Ultra-Long Video World Model |
|
|
|
|
|
LongVie 2 is a multimodal controllable world model for generating ultra-long videos with depth and pointmap control signals, as presented in the paper [LongVie 2: Multimodal Controllable Ultra-Long Video World Model](https://huggingface.co/papers/2512.13604). It is an end-to-end autoregressive framework trained to enhance controllability, long-term visual quality, and temporal consistency. |
|
|
|
|
|
- [Paper on Hugging Face](https://huggingface.co/papers/2512.13604)
|
|
- [Project Page](https://vchitect.github.io/LongVie2-project/)
|
|
- [GitHub Repository](https://github.com/Vchitect/LongVie)
|
|
|
|
|
|
|
|
## Quick Start
|
|
|
|
|
### Installation |
|
|
To get started with LongVie 2, follow the installation steps from the GitHub repository: |
|
|
|
|
|
```bash |
|
|
conda create -n longvie python=3.10 -y |
|
|
conda activate longvie |
|
|
conda install psutil |
|
|
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121 |
|
|
python -m pip install ninja |
|
|
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2.post1 |
|
|
cd LongVie |
|
|
pip install -e . |
|
|
``` |
|
|
|
|
|
### Download Weights |
|
|
1. Download the base model [`Wan2.1-I2V-14B-480P`](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P):
|
|
```bash |
|
|
python download_wan2.1.py |
|
|
``` |
|
|
|
|
|
2. Download the [LongVie2 weights](https://huggingface.co/Vchitect/LongVie2) and place them in `./model/LongVie/` |
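If you prefer to fetch the weights programmatically instead of downloading them by hand, the step above can be sketched with `huggingface_hub` (the repo id comes from this model card; the `local_dir` default matches the `./model/LongVie/` layout the sampling script expects):

```python
from huggingface_hub import snapshot_download


def download_longvie_weights(local_dir: str = "./model/LongVie/") -> str:
    """Download the LongVie2 weights from the Hugging Face Hub.

    Equivalent to manually fetching the files from
    https://huggingface.co/Vchitect/LongVie2 and placing them in
    ./model/LongVie/.
    Returns the local path containing the downloaded snapshot.
    """
    return snapshot_download(repo_id="Vchitect/LongVie2", local_dir=local_dir)


# Usage (downloads several GB of weights):
# download_longvie_weights()
```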
|
|
|
|
|
### Inference |
|
|
Generate a 5-second video clip (roughly 8-9 minutes on a single A100 GPU) with:
|
|
```bash |
|
|
bash sample_longvideo.sh |
|
|
``` |
|
|
|
|
|
## Citation
|
|
|
|
|
If you find this work useful, please consider citing: |
|
|
```bibtex |
|
|
@misc{gao2025longvie2, |
|
|
title={LongVie 2: Multimodal Controllable Ultra-Long Video World Model}, |
|
|
author={Jianxiong Gao and Zhaoxi Chen and Xian Liu and Junhao Zhuang and Chengming Xu and Jianfeng Feng and Yu Qiao and Yanwei Fu and Chenyang Si and Ziwei Liu}, |
|
|
year={2025}, |
|
|
eprint={2512.13604}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV}, |
|
|
url={https://arxiv.org/abs/2512.13604}, |
|
|
} |
|
|
``` |