# 🛹 RollingDepth: Video Depth without Video Models

[Website](https://rollingdepth.github.io)
[Hugging Face Model](https://huggingface.co/prs-eth/rollingdepth-v1-0)
<!-- []() -->

This repository contains the official implementation of the paper titled "Video Depth without Video Models".

[Bingxin Ke](http://www.kebingxin.com/)<sup>1</sup>,
[Dominik Narnhofer](https://scholar.google.com/citations?user=tFx8AhkAAAAJ&hl=en)<sup>1</sup>,
[Shengyu Huang](https://shengyuh.github.io/)<sup>1</sup>,
[Lei Ke](https://www.kelei.site/)<sup>2</sup>,
[Torben Peters](https://scholar.google.com/citations?user=F2C3I9EAAAAJ&hl=de)<sup>1</sup>,
[Katerina Fragkiadaki](https://www.cs.cmu.edu/~katef/)<sup>2</sup>,
[Anton Obukhov](https://www.obukhov.ai/)<sup>1</sup>,
[Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en)<sup>1</sup>

<sup>1</sup>ETH Zurich,
<sup>2</sup>Carnegie Mellon University

## 📢 News

2024-11-28: Inference code is released.<br>

## 🛠️ Setup

The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090.

### 📦 Repository

```bash
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
```

### 🐍 Python environment

Create a Python environment:

```bash
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
```

### 💻 Dependencies

Install dependencies:

```bash
pip install -r requirements.txt

# Install modified diffusers with cross-frame self-attention
bash script/install_diffusers_dev.sh
```

We use [pyav](https://github.com/PyAV-Org/PyAV) for video I/O, which relies on [ffmpeg](https://www.ffmpeg.org/).
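
To verify the setup, a few quick checks can help (a minimal sketch; it assumes PyTorch and diffusers are pulled in by the steps above, and the last line only applies if a system-wide `ffmpeg` binary is on your `PATH`):

```bash
# Optional sanity checks for the freshly created environment
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import diffusers; print(diffusers.__version__)"
python -c "import av; print(av.__version__)"
ffmpeg -version | head -n 1  # only if a system-wide ffmpeg is installed
```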

## 🏃 Test on your videos

All scripts are designed to run from the project root directory.

### 📷 Prepare input videos

1. Use sample videos:

    ```bash
    bash script/download_sample_data.sh
    ```

1. Or place your videos in a directory, for example, under `data/samples`.

### 🚀 Run with presets

```bash
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --save-npy true \
    --verbose
```
- `-p` or `--preset`: preset options
    - `fast` for **fast inference**, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
    - `fast1024` for **fast inference at resolution 1024**.
    - `full` for **better details**, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
    - `paper` for **reproducing paper numbers**, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to the input data; it can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
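
As a concrete example, a higher-quality pass over a single clip could look like the sketch below (the file names are illustrative, and the text-file variant assumes one video path per line):

```bash
# Better details on a single video (file name is illustrative)
python run_video.py \
    -i data/samples/my_clip.mp4 \
    -o output/my_clip_full \
    -p full \
    --verbose

# Batch several clips via a text file (assumed format: one path per line)
printf '%s\n' data/samples/clip_a.mp4 data/samples/clip_b.mp4 > my_videos.txt
python run_video.py -i my_videos.txt -o output/batch_full -p full
```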

Passing other arguments (listed below) may override the preset settings:

- Coming soon

<!-- TODO: explain all arguments in detail -->

## ⬇ Checkpoint cache

By default, the [checkpoint](https://huggingface.co/prs-eth/rollingdepth-v1-0) is stored in the Hugging Face cache. The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```

Alternatively, use the following script to download the checkpoint weights locally, and then specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:

```bash
bash script/download_weight.sh
```
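
Combining the two steps, a run against the locally downloaded weights might look like this (a sketch reusing the sample data from above; only documented flags are used):

```bash
# Download the weights into checkpoint/ and point the script at the local copy
bash script/download_weight.sh
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    -c checkpoint/rollingdepth-v1-0
```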

## 🦿 Evaluation on test datasets

Coming soon

<!-- ## 🎓 Citation
TODO -->

## 🙏 Acknowledgments

We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.

We are grateful to [redmond.ai](https://redmond.ai/) (robin@redmond.ai) for providing GPU resources.

## 🎫 License

The code of this work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](LICENSE.txt)).

The model is licensed under the RAIL++-M License (as defined in the [LICENSE-MODEL](LICENSE-MODEL.txt)).

By downloading and using the code and model, you agree to the terms in the [LICENSE](LICENSE.txt) and [LICENSE-MODEL](LICENSE-MODEL.txt), respectively.