---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
license: apache-2.0
pipeline_tag: image-to-video
---

<div align="center">
  <img src="assets/teaser.png">

<a href="https://hyokong.github.io/worldwarp-page/"><h1>🌏 WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion πŸŒ€</h1></a>
</h2>
</div>

<h5 align="center">

[![Home Page](https://img.shields.io/badge/Project-Website-33728E.svg)](https://hyokong.github.io/worldwarp-page/) 
[![arXiv](https://img.shields.io/badge/Arxiv-2512.19678-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2512.19678) 
[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/imsuperkong/worldwarp) [![Watch on YouTube](https://img.shields.io/badge/YouTube-Demo_Video-red?style=flat&logo=youtube)](https://www.youtube.com/watch?v=rfMHxb--cKs)


[Hanyang Kong](https://hyokong.github.io/),
[Xingyi Yang](https://adamdad.github.io/),
Xiaoxu Zheng,
[Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
</h5>

**TL;DR**: 🔭 Single-image long-range view generation via an <u>asynchronous chunk-wise autoregressive diffusion framework</u> that uses <u>explicit camera conditioning</u> and an <u>online 3D cache</u> for geometric consistency.

This repository contains the weights for **WorldWarp**, presented in [WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion](https://arxiv.org/abs/2512.19678).

## 🎬 Demo Video

▶️ **Click the GIF to watch the full video with sound.**

<p align="center">
  <a href="https://www.youtube.com/watch?v=rfMHxb--cKs">
    <img src="assets/web_teaser.gif" alt="WorldWarp Demo" width="100%">
  </a>
</p>

## 🛠️ Installation

> ⚠️ **Hardware Note:** The current implementation requires a large amount of GPU memory (~40 GB of VRAM). We are optimizing the code to reduce this footprint.

### 🧬 Cloning the Repository
The repository contains submodules, so please clone it recursively:
```bash
git clone https://github.com/HyoKong/WorldWarp.git --recursive
cd WorldWarp
```

### 🐍 Create environment

Create and activate a conda environment:
```bash
conda create -n worldwarp python=3.12 -y
conda activate worldwarp
```

### 🔥 Install PyTorch
Install PyTorch with CUDA 12.6 support (or visit [PyTorch Previous Versions](https://pytorch.org/get-started/previous-versions/) for other CUDA configurations):
```bash
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
```
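
To confirm that the CUDA-enabled build was picked up, an optional sanity check such as the following can be run inside the `worldwarp` environment:

```python
import torch

print(torch.__version__)                  # typically "2.7.1+cu126" with the command above
print(torch.cuda.is_available())          # should print True on a machine with a working CUDA driver
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU that will be used
```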

### 📦 Install Dependencies & Compile Extensions
These packages require compilation against the specific PyTorch version installed above.

```bash
# Core compiled dependencies
pip install flash-attn --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git" --no-build-isolation

# Local modules
pip install src/fused-ssim/ --no-build-isolation
pip install src/simple-knn/ --no-build-isolation

# Remaining python dependencies
pip install -r requirements.txt
```



### 🏗️ Build Other Extensions
```bash
cd src/ttt3r/croco/models/curope/
python setup.py build_ext --inplace
cd -  # Returns to the project root
```


## ☁️ Download checkpoints

```bash
mkdir ckpt
hf download Wan-AI/Wan2.1-T2V-1.3B-Diffusers --local-dir ckpt/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
hf download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ckpt/Qwen/Qwen2.5-VL-7B-Instruct
hf download imsuperkong/worldwarp --local-dir ckpt/

cd src/ttt3r/
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link
cd ../..
```

## 🎨 GUI Demo

```bash
python gradio_demo.py
```

The web interface will open at `http://localhost:7890`. 

---

### 🚀 Quick Start

**1️⃣ Choose Starting Image**

- **📚 Examples Tab**: Click a pre-made example image (the prompt auto-fills)
- **🎨 Generate Tab**: Click "Generate First Frame" to create one from your prompt
- **📤 Upload Tab**: Upload your own image

**2️⃣ Select Camera Movement** (Recommended: 📹 From Video)

- **From Video** (Easiest and most reliable)
  - Click **"📹 From Video"** mode
  - Select an example video from the gallery OR upload your own
  - Click **"🎯 Load Poses"** to extract the camera trajectory
  - Poses are automatically cached for reuse

- **Preset Movements**
  - Select **"🎯 Preset"** mode
  - Choose movements: `DOLLY_IN`, `PAN_LEFT`, `PAN_RIGHT`, etc. (see the sketch after this list)
  - Movements can be combined, e.g. `DOLLY_IN + PAN_RIGHT`

- **Custom** (Advanced)
  - Select **"🔧 Custom"** mode
  - Manually control the rotation and translation parameters
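
To make the preset idea concrete, below is a minimal, self-contained sketch of how a combined `DOLLY_IN + PAN_RIGHT` trajectory could be expressed as camera-to-world matrices. The function name, frame count, and pose convention are illustrative assumptions for this README, not the demo's actual API:

```python
import numpy as np

def preset_trajectory(num_frames=49, dolly_per_frame=0.02, pan_deg_per_frame=0.3):
    """Illustrative DOLLY_IN + PAN_RIGHT trajectory as 4x4 camera-to-world poses."""
    poses = []
    for i in range(num_frames):
        yaw = np.deg2rad(pan_deg_per_frame * i)          # pan right: rotate about the up axis
        rot = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                        [ 0.0,         1.0, 0.0        ],
                        [-np.sin(yaw), 0.0, np.cos(yaw)]])
        pose = np.eye(4)
        pose[:3, :3] = rot
        pose[:3, 3] = [0.0, 0.0, dolly_per_frame * i]    # dolly in: move along the viewing axis
        poses.append(pose)
    return np.stack(poses)                               # shape: (num_frames, 4, 4)

trajectory = preset_trajectory()
print(trajectory.shape)  # (49, 4, 4)
```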

**3️⃣ Configure & Generate**

**Essential Parameters:**

- 💪 **Strength (0.5 - 0.8)**
  - **Higher (0.7-0.8)**: More generated details, richer content
    - ⚠️ May introduce content changes due to the higher creative freedom
  - **Lower (0.5-0.6)**: More accurate camera control, closer to the input
    - ⚠️ May produce blurry results because the diffusion model has less freedom
  - **Trade-off**: higher strength yields more detail but less control; lower strength yields better control but potentially blurry output

- ⚡ **Speed Multiplier**
  - **Purpose**: Adjust the camera movement velocity to match your scene scale (see the sketch after this list)
  - **Why needed**: The reference video's camera movement scale may not match your scene (e.g., a drone video moving 10 meters may be far too fast for a small room)
  - **< 1.0**: Slower camera movement (e.g., 0.5 = half speed)
  - **= 1.0**: Original speed from the reference
  - **> 1.0**: Faster camera movement (e.g., 2.0 = double speed)
  - **Tip**: Start with 1.0, then adjust based on whether the motion feels too fast or too slow
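
One way to picture the speed multiplier is as a rescaling of how far the camera translates between consecutive frames, with the rotations left unchanged. The snippet below sketches this idea, assuming the same illustrative 4x4 camera-to-world poses as above; it is not taken from the demo code:

```python
import numpy as np

def apply_speed_multiplier(poses, multiplier):
    """Rescale the inter-frame translation of an (N, 4, 4) pose trajectory."""
    scaled = [poses[0].copy()]
    for prev, curr in zip(poses[:-1], poses[1:]):
        step = curr[:3, 3] - prev[:3, 3]                  # original translation between frames
        new_pose = curr.copy()                            # keep the rotation as-is
        new_pose[:3, 3] = scaled[-1][:3, 3] + multiplier * step
        scaled.append(new_pose)
    return np.stack(scaled)

# Example: halve the camera speed of a loaded or preset trajectory
# slowed = apply_speed_multiplier(trajectory, 0.5)
```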

---

#### 🌟 Best Practices

- 👁️ **Generate one chunk at a time**
  - Lets you preview each chunk's quality before continuing
  - Easier to identify issues early

- ↩️ **Use Rollback for iteration**
  - If a chunk is unsatisfactory, enter its number in **"Rollback to #"**
  - Click **"✂️ Rollback"** to remove it
  - Adjust parameters and regenerate

- 🏎️ **Adjust Speed Multiplier per scene**
  - If the camera moves too fast → decrease the value (e.g., 0.5-0.7)
  - If the camera moves too slow → increase the value (e.g., 1.5-2.0)






## 🙌 Acknowledgements
Our code is based on the following awesome repositories:

- [DFoT](https://github.com/kwsong0113/diffusion-forcing-transformer)
- [TTT3R](https://github.com/Inception3D/TTT3R)

We thank the authors for releasing their code!

## 📖 Citation

If you find our work useful, please cite:

```bibtex
@misc{kong2025worldwarp,
  title={WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion}, 
  author={Hanyang Kong and Xingyi Yang and Xiaoxu Zheng and Xinchao Wang},
  year={2025},
  eprint={2512.19678},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```