Add model card for LongVie 2 (#2), opened by nielsr (HF Staff)

README.md ADDED
---
pipeline_tag: text-to-video
license: unknown
---

# LongVie 2: Multimodal Controllable Ultra-Long Video World Model

LongVie 2 is a multimodal controllable world model for generating ultra-long videos with depth and pointmap control signals, as presented in the paper [LongVie 2: Multimodal Controllable Ultra-Long Video World Model](https://huggingface.co/papers/2512.13604). It is an end-to-end autoregressive framework trained to enhance controllability, long-term visual quality, and temporal consistency.

- 📄 [Paper on Hugging Face](https://huggingface.co/papers/2512.13604)
- 🌐 [Project Page](https://vchitect.github.io/LongVie2-project/)
- 💻 [GitHub Repository](https://github.com/Vchitect/LongVie)
- 🤗 [HF Demo](https://huggingface.co/spaces/Vision-CAIR/LongVU)

<div align="center">
<img src="https://longvu.s3.amazonaws.com/assets/demo.gif" alt="LongVie 2 Demo GIF" style="width: 100%; max-width: 650px;">
</div>

## 🚀 Quick Start

### Installation
To get started with LongVie 2, clone the GitHub repository and follow its installation steps:

```bash
conda create -n longvie python=3.10 -y
conda activate longvie
conda install psutil
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2.post1
git clone https://github.com/Vchitect/LongVie.git
cd LongVie
pip install -e .
```

### Download Weights
1. Download the base model `Wan2.1-I2V-14B-480P`:
```bash
python download_wan2.1.py
```

2. Download the [LongVie2 weights](https://huggingface.co/Vchitect/LongVie2) and place them in `./model/LongVie/`
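
Before launching inference, it can help to confirm both checkpoints landed where the steps above expect. The helper below is a sketch, not part of the official LongVie scripts: the `model/LongVie` path comes from step 2, while the base-model directory name is an assumption based on step 1.

```python
from pathlib import Path

# Hypothetical sanity check, not part of the official LongVie scripts.
# "model/LongVie" is the path named in step 2; the base-model directory
# name is assumed from the model id in step 1.
def missing_checkpoints(root: str = ".") -> list[str]:
    expected = [
        "Wan2.1-I2V-14B-480P",  # base model fetched by download_wan2.1.py
        "model/LongVie",        # LongVie2 weights placed manually
    ]
    return [p for p in expected if not (Path(root) / p).is_dir()]

# An empty list means both checkpoint directories were found.
print(missing_checkpoints())
```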

### Inference
Generate a 5-second video clip (roughly 8-9 minutes on a single A100 GPU) with:
```bash
bash sample_longvideo.sh
```
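
Since ultra-long videos are generated clip by clip, the per-clip figure above gives a rough runtime budget. The 8-9 minute number is the card's; the linear extrapolation and the function itself are a back-of-the-envelope sketch, not a measured benchmark:

```python
# Linear runtime estimate from the ~8-9 min per 5 s clip quoted above.
# Assumes per-clip cost stays constant over the whole video, which is
# an approximation, not a measured result.
def estimated_minutes(video_seconds: float,
                      mins_per_clip: float = 8.5,
                      clip_seconds: float = 5.0) -> float:
    return video_seconds / clip_seconds * mins_per_clip

print(estimated_minutes(60))  # a 1-minute video: about 102 minutes on one A100
```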

## 📖 Citation

If you find this work useful, please consider citing:
```bibtex
@misc{gao2025longvie2,
      title={LongVie 2: Multimodal Controllable Ultra-Long Video World Model},
      author={Jianxiong Gao and Zhaoxi Chen and Xian Liu and Junhao Zhuang and Chengming Xu and Jianfeng Feng and Yu Qiao and Yanwei Fu and Chenyang Si and Ziwei Liu},
      year={2025},
      eprint={2512.13604},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13604},
}
```