# StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

<div align="center" style="margin-top: 0px; margin-bottom: 0px;">
<img src="asset/StereoPilot_logo.png" width="30%"/>
</div>

<div align="center">

### [[Project Page]](https://hit-perfect.github.io/StereoPilot/) [arXiv] [Dataset]

_**[Guibao Shen](https://a-bigbao.github.io)<sup>1,3*†</sup>, [Yihua Du](https://hit-perfect.github.io)<sup>1*</sup>, [Wenhang Ge](https://g3956.github.io/wenhangge.github.io/)<sup>1,3*†</sup>, [Jing He](https://jingheya.github.io)<sup>1</sup>, [Chirui Chang](https://hit-perfect.github.io/StereoPilot/)<sup>3</sup>, [Donghao Zhou](https://correr-zhou.github.io/)<sup>4</sup>, [Zhen Yang](https://zhenyangcs.github.io/)<sup>1</sup>, [Luozhou Wang](https://wileewang.github.io)<sup>1</sup>, [Xin Tao](https://www.xtao.website)<sup>3</sup>, [Ying-Cong Chen](https://www.yingcong.me)<sup>1,2‡</sup>**_

<sup>1</sup>HKUST(GZ), <sup>2</sup>HKUST, <sup>3</sup>Kling Team, Kuaishou Technology, <sup>4</sup>CUHK

(*Equal contribution, †Work conducted during the authors' internships at Kling, ‡Corresponding author)

</div>

## 📖 Introduction

**TL;DR:** We propose **StereoPilot**, an efficient feed-forward architecture that leverages pretrained video diffusion transformers to synthesize novel views directly, overcoming the limitations of *Depth-Warp-Inpaint* pipelines without iterative denoising. With a domain switcher and a cycle consistency loss, it enables robust multi-format stereo conversion. We also introduce **UniStereo**, the first large-scale unified dataset featuring both parallel and converged stereo formats.

<div align="center">

[![Showcase Video](https://img.youtube.com/vi/P14q02ajKT0/maxresdefault.jpg)](https://www.youtube.com/watch?v=P14q02ajKT0)

**🎬 Click the image to view our showcase video**

</div>

## 🔥 Updates

- __[2025.12.16]__: Released the inference code and the [Project Page](https://hit-perfect.github.io/StereoPilot/) (hope you like it!).

## ⚙️ Requirements

Our inference environment:
- Python 3.12
- CUDA 12.1
- PyTorch 2.4.1
- GPU: NVIDIA A800 (only ~23GB VRAM required)

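To confirm your setup roughly matches the versions above before installing anything else, a quick check along these lines can help (a minimal sketch; the `env_report` helper is ours for illustration, not part of the repository):

```python
import sys

def env_report():
    """Collect Python and, when installed, PyTorch/CUDA version info."""
    lines = [f"python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"]
    try:
        import torch  # only available after dependencies are installed
        lines.append(f"torch {torch.__version__}")
        lines.append(f"cuda available: {torch.cuda.is_available()}")
    except ImportError:
        lines.append("torch: not installed yet")
    return lines

if __name__ == "__main__":
    print("\n".join(env_report()))
```
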
## 🛠️ Installation

**Step 1:** Clone the repository

```bash
git clone <repository-url>
cd StereoPilot
```

**Step 2:** Create a conda environment

```bash
conda create -n StereoPilot python=3.12
conda activate StereoPilot
```

**Step 3:** Install dependencies

```bash
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1 --no-build-isolation
```

**Step 4:** Download model checkpoints

Place the following files in the `ckpt/` directory:

| File | Description |
|------|-------------|
| [`StereoPilot.safetensors`](https://huggingface.co/KlingTeam/StereoPilot) | StereoPilot model weights |
| [`Wan2.1-T2V-1.3B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) | Base Wan2.1 model directory |

Download the `StereoPilot.safetensors` weights and the Wan2.1-T2V-1.3B base model:

```bash
pip install "huggingface_hub[cli]"

huggingface-cli download KlingTeam/StereoPilot --local-dir ./ckpt

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./ckpt/Wan2.1-T2V-1.3B
```

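After downloading, it is worth verifying that both checkpoints landed where the table above expects them. A minimal sketch (the `missing_checkpoints` helper is ours, not shipped with the repo):

```python
from pathlib import Path

def missing_checkpoints(ckpt_dir="ckpt"):
    """Return the expected checkpoint paths that are not present on disk."""
    root = Path(ckpt_dir)
    expected = [
        root / "StereoPilot.safetensors",  # StereoPilot model weights
        root / "Wan2.1-T2V-1.3B",          # base Wan2.1 model directory
    ]
    return [str(p) for p in expected if not p.exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints found." if not missing else f"Missing: {missing}")
```
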
## 🚀 Inference

### Input Requirements

For each input video, you need:
1. **Video file** (`.mp4`): a monocular video with 81 frames at 832×480 resolution and 16fps
2. **Prompt file** (`.txt`): a text description of the video content, with the same base name as the video

Example (you can try the cases in the `sample/` folder):
```
sample/
├── my_video.mp4
└── my_video.txt
```

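Videos that do not already match the 81-frame, 832×480, 16fps spec can usually be conformed with ffmpeg first. A sketch of the command (assumes ffmpeg is installed; `conform_cmd` is our illustrative helper, and the `scale` filter as written does not preserve aspect ratio):

```python
import subprocess

def conform_cmd(src, dst):
    """Build an ffmpeg command that rescales to 832x480, resamples to 16 fps,
    keeps the first 81 frames, and drops the audio track."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", "scale=832:480,fps=16",
        "-frames:v", "81",
        "-an", dst,
    ]

if __name__ == "__main__":
    print(" ".join(conform_cmd("my_video_raw.mp4", "sample/my_video.mp4")))
    # to execute: subprocess.run(conform_cmd(...), check=True)
```
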
### Running Inference

**Basic usage:**

```bash
# Edit toml/infer.toml to customize model paths; if you followed the steps above, no changes are needed.
python sample.py \
    --config toml/infer.toml \
    --input /path/to/input_video.mp4 \
    --output_folder /path/to/output \
    --device cuda:0
```

**Using the example script:**

```bash
bash sample.sh
```

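To process every case in a folder rather than one file at a time, the CLI above can be looped over matched video/prompt pairs. A sketch (the `inference_cmds` helper is ours and assumes the `sample.py` interface shown above):

```python
from pathlib import Path
import subprocess

def inference_cmds(input_dir="sample", output_folder="output", device="cuda:0"):
    """One sample.py invocation per .mp4 that has a matching .txt prompt file."""
    cmds = []
    for video in sorted(Path(input_dir).glob("*.mp4")):
        if video.with_suffix(".txt").exists():  # prompt must sit next to the video
            cmds.append([
                "python", "sample.py",
                "--config", "toml/infer.toml",
                "--input", str(video),
                "--output_folder", output_folder,
                "--device", device,
            ])
    return cmds

if __name__ == "__main__":
    for cmd in inference_cmds():
        subprocess.run(cmd, check=True)
```
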
### Generate Stereo Visualization

After inference, you can generate Side-by-Side (SBS) and red-cyan anaglyph stereo videos for viewing:

```bash
python utils/stereo_video.py \
    --left /path/to/left_eye.mp4 \
    --right /path/to/right_eye.mp4
```

**Output files:**
| Output | Description | Viewing Device |
|--------|-------------|----------------|
| `{name}_sbs.mp4` | Side-by-Side stereo video | VR headset <img src="asset/VR_Glass.png" width="24" height="24"> |
| `{name}_anaglyph.mp4` | Red-cyan anaglyph stereo video | 3D glasses <img src="asset/Red_Blue_Glass.png" width="24" height="24"> |

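For intuition about the anaglyph output: a red-cyan anaglyph takes the red channel from the left view and the green and blue channels from the right view, so each lens of the glasses filters out the opposite eye's image. A toy per-pixel sketch (illustration only, not the implementation in `utils/stereo_video.py`):

```python
def anaglyph_pixel(left_rgb, right_rgb):
    """Red channel from the left view, green and blue from the right view."""
    return (left_rgb[0], right_rgb[1], right_rgb[2])

def anaglyph_frame(left, right):
    """Apply the per-pixel rule to two frames given as nested [row][col] lists."""
    return [
        [anaglyph_pixel(l, r) for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]

if __name__ == "__main__":
    left = [[(255, 0, 0), (10, 20, 30)]]
    right = [[(0, 255, 255), (40, 50, 60)]]
    print(anaglyph_frame(left, right))  # [[(255, 255, 255), (10, 50, 60)]]
```
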
## 📊 Dataset

We introduce **UniStereo**, the first large-scale unified stereo video dataset featuring both parallel and converged stereo formats.

<div align="center">
<img src="asset/parallel_vs_converged.png" width="80%">
</div>

UniStereo consists of two parts:
- **3DMovie** - converged-format stereo clips sourced from 3D movies
- **Stereo4D** - parallel-format stereo clips *(coming soon)*

For detailed data processing instructions, please refer to [StereoPilot_Dataprocess](./StereoPilot_Dataprocess/).

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [Wan2.1](https://github.com/Wan-Video/Wan2.1) - the base video generation model

## 🌟 Citation

If you find our work helpful, please consider citing:

```bibtex
@article{shen2025stereopilot,
  title={StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors},
  author={Shen, Guibao and Du, Yihua and Ge, Wenhang and He, Jing and Chang, Chirui and Zhou, Donghao and Yang, Zhen and Wang, Luozhou and Tao, Xin and Chen, Ying-Cong},
  journal={arXiv preprint},
  year={2025}
}
```