|
|
--- |
|
|
base_model: |
|
|
- Wan-AI/Wan2.1-T2V-1.3B |
|
|
license: apache-2.0 |
|
|
pipeline_tag: image-to-video |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
<img src="assets/teaser.png"> |
|
|
|
|
|
<a href="https://hyokong.github.io/worldwarp-page/"><h1>🌍 WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion 🌍</h1></a>
|
|
|
|
</div> |
|
|
|
|
|
<h5 align="center"> |
|
|
|
|
|
[🌐 Project Page](https://hyokong.github.io/worldwarp-page/)
|
|
[📄 arXiv](https://arxiv.org/abs/2512.19678)
|
|
[🤗 Model Weights](https://huggingface.co/imsuperkong/worldwarp) [▶️ Video](https://www.youtube.com/watch?v=rfMHxb--cKs)
|
|
|
|
|
|
|
|
[Hanyang Kong](https://hyokong.github.io/), |
|
|
[Xingyi Yang](https://adamdad.github.io/), |
|
|
Xiaoxu Zheng, |
|
|
[Xinchao Wang](https://sites.google.com/site/sitexinchaowang/) |
|
|
</h5> |
|
|
|
|
|
**TL;DR**: 🚀 Single-image long-range view generation via an <u>asynchronous chunk-wise autoregressive diffusion framework</u> that uses <u>explicit camera conditioning</u> and an <u>online 3D cache</u> for geometric consistency.
|
|
|
|
|
This repository contains the weights for **WorldWarp**, presented in [WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion](https://arxiv.org/abs/2512.19678). |
|
|
|
|
|
## 🎬 Demo Video
|
|
|
|
|
▶️ **Click the GIF to watch the full video with sound.**
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://www.youtube.com/watch?v=rfMHxb--cKs"> |
|
|
<img src="assets/web_teaser.gif" alt="WorldWarp Demo" width="100%"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
## 🛠️ Installation
|
|
|
|
|
> ⚠️ **Hardware Note:** The current implementation requires roughly 40 GB of GPU VRAM. We are working on reducing this footprint.
|
|
|
|
|
### 🧬 Cloning the Repository
|
|
This repository contains submodules, so please clone it with:
|
|
```bash |
|
|
git clone https://github.com/HyoKong/WorldWarp.git --recursive |
|
|
cd WorldWarp |
|
|
``` |
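
If you have already cloned the repository without `--recursive`, the submodules can still be fetched afterwards:

```shell
# Fetch any submodules that were skipped during the initial clone
git submodule update --init --recursive
```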
|
|
|
|
|
### 🐍 Create environment
|
|
|
|
|
Create a conda environment and install dependencies: |
|
|
```bash
|
|
conda create -n worldwarp python=3.12 -y |
|
|
conda activate worldwarp |
|
|
``` |
|
|
|
|
|
### 🔥 Install PyTorch
|
|
Install PyTorch with CUDA 12.6 support (or visit [PyTorch Previous Versions](https://pytorch.org/get-started/previous-versions/) for other CUDA configurations): |
|
|
```bash |
|
|
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126 |
|
|
``` |
|
|
|
|
|
### 📦 Install Dependencies & Compile Extensions
|
|
These packages require compilation against the specific PyTorch version installed above. |
|
|
|
|
|
```bash |
|
|
# Core compiled dependencies |
|
|
pip install flash-attn --no-build-isolation |
|
|
pip install "git+https://github.com/facebookresearch/pytorch3d.git" --no-build-isolation |
|
|
|
|
|
# Local modules |
|
|
pip install src/fused-ssim/ --no-build-isolation |
|
|
pip install src/simple-knn/ --no-build-isolation |
|
|
|
|
|
# Remaining python dependencies |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
### 🏗️ Build Other Extensions
|
|
```bash |
|
|
cd src/ttt3r/croco/models/curope/ |
|
|
python setup.py build_ext --inplace |
|
|
cd - # Returns to the project root |
|
|
``` |
|
|
|
|
|
|
|
|
## ⬇️ Download checkpoints
|
|
|
|
|
```bash
|
|
mkdir ckpt |
|
|
hf download Wan-AI/Wan2.1-T2V-1.3B-Diffusers --local-dir ckpt/Wan-AI/Wan2.1-T2V-1.3B-Diffusers |
|
|
hf download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ckpt/Qwen/Qwen2.5-VL-7B-Instruct |
|
|
hf download imsuperkong/worldwarp --local-dir ckpt/ |
|
|
|
|
|
cd src/ttt3r/ |
|
|
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link |
|
|
cd ../.. |
|
|
``` |
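
After these commands, the checkpoint directory should look roughly as follows (a sketch only: the exact files inside `ckpt/` depend on the release):

```
ckpt/
├── Wan-AI/
│   └── Wan2.1-T2V-1.3B-Diffusers/
├── Qwen/
│   └── Qwen2.5-VL-7B-Instruct/
└── ...   # WorldWarp weights from imsuperkong/worldwarp
```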
|
|
|
|
|
## 🎨 GUI Demo
|
|
|
|
|
```bash |
|
|
python gradio_demo.py |
|
|
``` |
|
|
|
|
|
The web interface will open at `http://localhost:7890`. |
|
|
|
|
|
--- |
|
|
|
|
|
### 🚀 Quick start
|
|
|
|
|
**1️⃣ Choose Starting Image**
|
|
|
|
|
- **📂 Examples Tab**: Click a pre-made example image (the prompt auto-fills)


- **🎨 Generate Tab**: Click "Generate First Frame" to create a starting image from your prompt


- **📤 Upload Tab**: Upload your own image
|
|
|
|
|
**2️⃣ Select Camera Movement** (Recommended: 📹 From Video)
|
|
|
|
|
- **From Video** (Easiest and most reliable)


  - Click **"📹 From Video"** mode


  - Select an example video from the gallery OR upload your own


  - Click **"🎯 Load Poses"** to extract the camera trajectory


  - Poses are automatically cached for reuse
|
|
|
|
|
- **Preset Movements**


  - Select **"🎯 Preset"** mode


  - Choose movements: `DOLLY_IN`, `PAN_LEFT`, `PAN_RIGHT`, etc.


  - Presets can be combined, e.g. `DOLLY_IN + PAN_RIGHT`
|
|
|
|
|
- **Custom** (Advanced)


  - Select **"🔧 Custom"** mode


  - Manually control the rotation and translation parameters
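
To make the preset combination above concrete, here is a hypothetical sketch (not WorldWarp's actual code): if each preset movement is modeled as a small per-frame rigid transform (a 4x4 matrix), then combining presets such as `DOLLY_IN + PAN_RIGHT` is just a matrix product of their per-frame deltas, accumulated over a chunk of frames.

```python
import math

# Hypothetical illustration only: each preset is a per-frame 4x4 rigid
# transform stored as nested lists; the real pose conventions are internal.

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def dolly_in(step=0.05):
    """Move the camera forward along -z by `step` units per frame."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, -step],
            [0, 0, 0, 1]]

def pan_right(deg=1.0):
    """Rotate the camera about the y axis by `deg` degrees per frame."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return [[c, 0, s, 0],
            [0, 1, 0, 0],
            [-s, 0, c, 0],
            [0, 0, 0, 1]]

# DOLLY_IN + PAN_RIGHT: multiply the two per-frame deltas.
delta = matmul4(pan_right(), dolly_in())

# Accumulate the delta over a chunk of frames to get the trajectory.
pose = [[float(i == j) for j in range(4)] for i in range(4)]  # identity
trajectory = []
for _ in range(16):
    pose = matmul4(pose, delta)
    trajectory.append(pose)
```

The exact pose format and axis conventions used by the demo may differ; the point is only that preset movements compose multiplicatively before being fed to the model as explicit camera conditioning.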
|
|
|
|
|
**3️⃣ Configure & Generate**
|
|
|
|
|
**Essential Parameters:** |
|
|
|
|
|
- 💪 **Strength (0.5 - 0.8)**


  - **Higher (0.7-0.8)**: More generated details, richer content


    - ⚠️ May introduce content changes due to the higher creative freedom


  - **Lower (0.5-0.6)**: More accurate camera control, closer to the input


    - ⚠️ May produce blurry results because the diffusion model has less freedom


  - **Trade-off**: Higher strength = more detail but less control; lower strength = better control but potentially blurry
|
|
|
|
|
- ⚡ **Speed Multiplier**


  - **Purpose**: Adjust the camera movement velocity to match your scene scale


  - **Why needed**: The reference video's movement scale may not match your scene (e.g., a drone video covering 10 meters is too fast for a small room)


  - **< 1.0**: Slower camera movement (e.g., 0.5 = half speed)


  - **= 1.0**: Original speed from the reference


  - **> 1.0**: Faster camera movement (e.g., 2.0 = double speed)


  - **Tip**: Start with 1.0, then adjust based on whether the motion feels too fast or too slow
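
The two knobs above can be sketched in code. This is a hedged illustration, assuming a common img2img-style convention for strength (the fraction of the diffusion schedule that is re-generated) and a simple per-frame scaling for the speed multiplier; WorldWarp's internals may differ.

```python
# Hypothetical sketch of the two GUI parameters; illustration only.

def denoise_start_step(strength, num_steps=50):
    """Common img2img convention: `strength` is the fraction of the diffusion
    schedule that is re-noised and re-generated, so higher strength gives the
    model more freedom to invent detail (and to drift from the input)."""
    return round(num_steps * (1.0 - strength))

def scale_trajectory(translations, speed=1.0):
    """Scale per-frame camera translations by the speed multiplier so a
    reference trajectory can be slowed down (< 1.0) or sped up (> 1.0)
    to match the scene's scale."""
    return [[speed * t for t in xyz] for xyz in translations]

# Strength 0.8 skips only 10 of 50 steps: most of the schedule is generated.
print(denoise_start_step(0.8))   # 10

# Halve a reference trajectory that moves 0.1 units per frame along -z.
ref = [[0.0, 0.0, -0.1], [0.0, 0.0, -0.2], [0.0, 0.0, -0.3]]
half_speed = scale_trajectory(ref, speed=0.5)
```

In this picture, the strength trade-off is literal: fewer skipped steps means more generation freedom, and scaling the translations is all the speed multiplier needs to do, since the rotations are unaffected by scene scale.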
|
|
|
|
|
--- |
|
|
|
|
|
#### 💡 Best Practices
|
|
|
|
|
- 🎞️ **Generate one chunk at a time**


  - Lets you preview each chunk's quality before continuing


  - Makes it easier to identify issues early
|
|
|
|
|
- ↩️ **Use Rollback for iteration**


  - If a chunk is unsatisfactory, enter its number in **"Rollback to #"**


  - Click **"✂️ Rollback"** to remove it


  - Adjust the parameters and regenerate
|
|
|
|
|
- 🎚️ **Adjust the Speed Multiplier per scene**


  - If the camera moves too fast → decrease the value (e.g., 0.5-0.7)


  - If the camera moves too slow → increase the value (e.g., 1.5-2.0)
|
|
|
|
|
|
## 🙏 Acknowledgements
|
|
Our code is based on the following awesome repositories: |
|
|
|
|
|
- [DFoT](https://github.com/kwsong0113/diffusion-forcing-transformer) |
|
|
- [TTT3R](https://github.com/Inception3D/TTT3R) |
|
|
|
|
|
We thank the authors for releasing their code! |
|
|
|
|
|
## 📝 Citation
|
|
|
|
|
If you find our work useful, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{kong2025worldwarp, |
|
|
title={WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion}, |
|
|
author={Hanyang Kong and Xingyi Yang and Xiaoxu Zheng and Xinchao Wang}, |
|
|
year={2025}, |
|
|
eprint={2512.19678}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV} |
|
|
} |
|
|
``` |