Add model card for LongVie 2

#2 by nielsr HF Staff - opened

Files changed (1): README.md (+62, -0)

README.md ADDED
@@ -0,0 +1,62 @@
---
pipeline_tag: text-to-video
license: unknown
---

# LongVie 2: Multimodal Controllable Ultra-Long Video World Model

LongVie 2 is a multimodal controllable world model for generating ultra-long videos with depth and pointmap control signals, as presented in the paper [LongVie 2: Multimodal Controllable Ultra-Long Video World Model](https://huggingface.co/papers/2512.13604). It is an end-to-end autoregressive framework trained to enhance controllability, long-term visual quality, and temporal consistency.

- 📝 [Paper on Hugging Face](https://huggingface.co/papers/2512.13604)
- 🌐 [Project Page](https://vchitect.github.io/LongVie2-project/)
- 💻 [GitHub Repository](https://github.com/Vchitect/LongVie)
- 🚀 [HF Demo](https://huggingface.co/spaces/Vision-CAIR/LongVU)
<div align="center">
  <img src="https://longvu.s3.amazonaws.com/assets/demo.gif" alt="LongVie 2 Demo GIF" style="width: 100%; max-width: 650px;">
</div>

## 🚀 Quick Start

### Installation
To get started with LongVie 2, follow the installation steps from the GitHub repository:

```bash
conda create -n longvie python=3.10 -y
conda activate longvie
conda install psutil
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2.post1
cd LongVie
pip install -e .
```
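Before moving on to the weights, it can help to confirm that the packages installed above are actually visible in the environment. A minimal, stdlib-only sketch (the import names are assumed from the install commands; `flash_attn` is the import name for the flash-attention package):

```python
# Sanity check for the environment created above: report whether each
# package installed by the commands in this section can be located,
# without importing it (so this also runs on a CPU-only machine).
import importlib.util

def check_longvie_env() -> dict:
    # Import names assumed from the install commands above.
    packages = ["torch", "torchvision", "torchaudio", "flash_attn", "ninja", "psutil"]
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, found in check_longvie_env().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any `MISSING` entry means the corresponding install step above did not complete in the active environment.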
### Download Weights
1. Download the base model `Wan2.1-I2V-14B-480P`:
   ```bash
   python download_wan2.1.py
   ```
2. Download the [LongVie2 weights](https://huggingface.co/Vchitect/LongVie2) and place them in `./model/LongVie/`.
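Step 2 can also be done programmatically with `huggingface_hub` (a sketch, not a script from the repository; the repo id and target directory are taken from the step above, and the helper name is hypothetical):

```python
# Hedged convenience for step 2: fetch the LongVie2 weights with
# huggingface_hub instead of downloading them by hand. Repo id and
# target directory come from the instructions above.
WEIGHTS_REPO = "Vchitect/LongVie2"
WEIGHTS_DIR = "./model/LongVie/"

def fetch_longvie2_weights(local_dir: str = WEIGHTS_DIR) -> str:
    """Download the full LongVie2 snapshot into local_dir and return the local path."""
    from huggingface_hub import snapshot_download  # lazy import: pip install huggingface_hub
    return snapshot_download(repo_id=WEIGHTS_REPO, local_dir=local_dir)
```

Calling `fetch_longvie2_weights()` downloads the full snapshot, so expect a large transfer for a 14B-scale model.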
### Inference
Generate a 5-second video clip (roughly 8-9 minutes on a single A100 GPU) with:
```bash
bash sample_longvideo.sh
```

## 📄 Citation

If you find this work useful, please consider citing:
```bibtex
@misc{gao2025longvie2,
  title={LongVie 2: Multimodal Controllable Ultra-Long Video World Model},
  author={Jianxiong Gao and Zhaoxi Chen and Xian Liu and Junhao Zhuang and Chengming Xu and Jianfeng Feng and Yu Qiao and Yanwei Fu and Chenyang Si and Ziwei Liu},
  year={2025},
  eprint={2512.13604},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.13604},
}
```