---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
license: apache-2.0
pipeline_tag: image-to-video
---
<div align="center">
<img src="assets/teaser.png">
<a href="https://hyokong.github.io/worldwarp-page/"><h1>WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion</h1></a>
</div>
<h5 align="center">
[Project Page](https://hyokong.github.io/worldwarp-page/) | [Paper](https://arxiv.org/abs/2512.19678) | [Model](https://huggingface.co/imsuperkong/worldwarp) | [Video](https://www.youtube.com/watch?v=rfMHxb--cKs)
[Hanyang Kong](https://hyokong.github.io/),
[Xingyi Yang](https://adamdad.github.io/),
Xiaoxu Zheng,
[Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
</h5>
**TL;DR**: Single-image long-range view generation via an <u>asynchronous chunk-wise autoregressive diffusion framework</u> that uses <u>explicit camera conditioning</u> and an <u>online 3D cache</u> for geometric consistency.
This repository contains the weights for **WorldWarp**, presented in [WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion](https://arxiv.org/abs/2512.19678).
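At a high level, each chunk is generated by warping the current 3D cache into the new camera views and letting the diffusion model refine the result, which then updates the cache for the next chunk. A minimal, purely illustrative sketch of that loop (every function here is a stub, not the WorldWarp API):

```python
# Illustrative sketch of chunk-wise autoregressive generation with an online
# 3D cache. All functions below are stubs standing in for the real pipeline
# (a Wan2.1-based video diffusion model plus a 3D point cache).

def warp_cache_to_views(cache, poses):
    # Stub: project cached 3D geometry into each target camera view.
    return [("warped", len(cache), tuple(p)) for p in poses]

def denoise_chunk(warped, strength=0.7):
    # Stub: diffusion refinement of the warped frames.
    return [("refined", w, strength) for w in warped]

def update_cache(cache, frames):
    # Stub: lift the newest refined frames back into the 3D cache.
    cache.append(frames[-1])
    return cache

def generate(first_frame, trajectory, chunk_size=8):
    cache, video = [first_frame], [first_frame]
    for start in range(0, len(trajectory), chunk_size):
        poses = trajectory[start:start + chunk_size]
        warped = warp_cache_to_views(cache, poses)
        frames = denoise_chunk(warped)
        video.extend(frames)
        cache = update_cache(cache, frames)
    return video
```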
## Demo Video
**Click the GIF to watch the full video with sound.**
<p align="center">
<a href="https://www.youtube.com/watch?v=rfMHxb--cKs">
<img src="assets/web_teaser.gif" alt="WorldWarp Demo" width="100%">
</a>
</p>
## Installation
> **Hardware Note:** The current implementation requires a large amount of GPU memory (~40 GB VRAM). We are optimizing the code to reduce this footprint.
### Cloning the Repository
The repository contains submodules, so check it out recursively:
```bash
git clone https://github.com/HyoKong/WorldWarp.git --recursive
cd WorldWarp
```
### Create the environment
Create and activate a conda environment:
```bash
conda create -n worldwarp python=3.12 -y
conda activate worldwarp
```
### Install PyTorch
Install PyTorch with CUDA 12.6 support (or visit [PyTorch Previous Versions](https://pytorch.org/get-started/previous-versions/) for other CUDA configurations):
```bash
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
```
### Install Dependencies & Compile Extensions
These packages require compilation against the specific PyTorch version installed above.
```bash
# Core compiled dependencies
pip install flash-attn --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git" --no-build-isolation
# Local modules
pip install src/fused-ssim/ --no-build-isolation
pip install src/simple-knn/ --no-build-isolation
# Remaining python dependencies
pip install -r requirements.txt
```
### Build Other Extensions
```bash
cd src/ttt3r/croco/models/curope/
python setup.py build_ext --inplace
cd - # Returns to the project root
```
## Download checkpoints
```bash
mkdir ckpt
hf download Wan-AI/Wan2.1-T2V-1.3B-Diffusers --local-dir ckpt/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
hf download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ckpt/Qwen/Qwen2.5-VL-7B-Instruct
hf download imsuperkong/worldwarp --local-dir ckpt/
cd src/ttt3r/
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link
cd ../..
```
## GUI Demo
```bash
python gradio_demo.py
```
The web interface will open at `http://localhost:7890`.
---
### Quick start
**1. Choose a Starting Image**
- **Examples Tab**: Click a pre-made example image (the prompt auto-fills)
- **Generate Tab**: Click "Generate First Frame" to create one from your prompt
- **Upload Tab**: Upload your own image
**2. Select a Camera Movement** (Recommended: From Video)
- **From Video** (easiest and most reliable)
  - Click the **"From Video"** mode
  - Select an example video from the gallery, or upload your own
  - Click **"Load Poses"** to extract the camera trajectory
  - Poses are automatically cached for reuse
- **Preset Movements**
  - Select the **"Preset"** mode
  - Choose movements: `DOLLY_IN`, `PAN_LEFT`, `PAN_RIGHT`, etc.
  - Movements can be combined, e.g. `DOLLY_IN + PAN_RIGHT`
- **Custom** (advanced)
  - Select the **"Custom"** mode
  - Manually control the rotation and translation parameters
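Conceptually, a preset movement is just a small per-frame camera transform, and combining presets composes those transforms. A sketch of how `DOLLY_IN + PAN_RIGHT` could be built as camera-to-world matrices (the preset names match the GUI; the step sizes and composition order are assumptions, not the repository's actual values):

```python
import numpy as np

def dolly_in(step=0.1):
    # Per-frame translation forward along the camera's +z axis.
    T = np.eye(4)
    T[2, 3] = step
    return T

def pan_right(angle=np.deg2rad(2)):
    # Per-frame rotation about the y (up) axis.
    T = np.eye(4)
    c, s = np.cos(angle), np.sin(angle)
    T[0, 0], T[0, 2] = c, s
    T[2, 0], T[2, 2] = -s, c
    return T

def trajectory(moves, num_frames=16):
    # Combine movements (e.g. DOLLY_IN + PAN_RIGHT) by composing their
    # per-frame deltas, then accumulate the delta over the trajectory.
    delta = np.eye(4)
    for m in moves:
        delta = delta @ m
    poses, cur = [], np.eye(4)
    for _ in range(num_frames):
        cur = cur @ delta
        poses.append(cur.copy())
    return poses

poses = trajectory([dolly_in(), pan_right()])
```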
**3. Configure & Generate**
**Essential Parameters:**
- **Strength (0.5 - 0.8)**
  - **Higher (0.7-0.8)**: More generated detail, richer content; may introduce content changes due to the higher creative freedom
  - **Lower (0.5-0.6)**: More accurate camera control, closer to the input; may produce blurry results because the diffusion model has less freedom
  - **Trade-off**: Higher strength gives more detail but less control; lower strength gives better control but potentially blurry output
- **Speed Multiplier**
  - **Purpose**: Adjust the camera movement velocity to match your scene scale
  - **Why it's needed**: The reference video's movement scale may not match your scene (e.g., a drone video covering 10 meters is too fast for a small room)
  - **< 1.0**: Slower camera movement (e.g., 0.5 = half speed)
  - **= 1.0**: Original speed from the reference
  - **> 1.0**: Faster camera movement (e.g., 2.0 = double speed)
  - **Tip**: Start at 1.0, then adjust based on whether the motion feels too fast or too slow
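One way to picture the speed multiplier: it rescales how far the camera travels along the trajectory while leaving the rotations alone. A minimal sketch under that assumption (the GUI's exact implementation may differ):

```python
import numpy as np

def apply_speed_multiplier(poses, multiplier):
    # Scale each pose's translation relative to the first pose; rotations
    # are left unchanged, so only the distance traveled is affected.
    origin = poses[0][:3, 3].copy()
    scaled = []
    for P in poses:
        Q = P.copy()
        Q[:3, 3] = origin + multiplier * (P[:3, 3] - origin)
        scaled.append(Q)
    return scaled

# Example: a straight dolly covering 2 units, slowed to half speed.
poses = []
for z in (0.0, 1.0, 2.0):
    P = np.eye(4)
    P[2, 3] = z
    poses.append(P)
slow = apply_speed_multiplier(poses, 0.5)
```

With a multiplier of 0.5 the camera ends up covering 1 unit instead of 2, which is the "too fast for a small room" fix described above.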
---
#### Best Practices
- **Generate one chunk at a time**
  - Lets you preview each chunk's quality before continuing
  - Makes issues easier to catch early
- **Use Rollback for iteration**
  - If a chunk is unsatisfactory, enter its number in **"Rollback to #"**
  - Click **"Rollback"** to remove it
  - Adjust the parameters and regenerate
- **Adjust the Speed Multiplier per scene**
  - If the camera moves too fast, decrease the value (e.g., 0.5-0.7)
  - If the camera moves too slow, increase the value (e.g., 1.5-2.0)
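The rollback workflow above amounts to keeping the generated chunks in an ordered list and truncating it at the unsatisfactory chunk before regenerating. A tiny sketch of that bookkeeping (this is an illustration of the workflow, not the repository's code; chunk numbers are 1-based, matching the GUI field):

```python
def rollback(chunks, to_chunk):
    # Drop chunk `to_chunk` and everything after it, so generation can
    # resume from the last chunk that was kept.
    if not 1 <= to_chunk <= len(chunks):
        raise ValueError(f"chunk {to_chunk} is out of range")
    return chunks[:to_chunk - 1]
```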
## Acknowledgements
Our code is based on the following awesome repositories:
- [DFoT](https://github.com/kwsong0113/diffusion-forcing-transformer)
- [TTT3R](https://github.com/Inception3D/TTT3R)
We thank the authors for releasing their code!
## Citation
If you find our work useful, please cite:
```bibtex
@misc{kong2025worldwarp,
    title={WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion},
    author={Hanyang Kong and Xingyi Yang and Xiaoxu Zheng and Xinchao Wang},
    year={2025},
    eprint={2512.19678},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```