kungchuking committed
Commit 23c8f66 · 1 Parent(s): f0ba1d7

Update README

Files changed (1)
  1. README.md +27 -96
README.md CHANGED
@@ -1,29 +1,11 @@
- # [CVPR 2023] DynamicStereo: Consistent Dynamic Depth from Stereo Videos.
 
- **[Meta AI Research, FAIR](https://ai.facebook.com/research/)**; **[University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)**
 
- [Nikita Karaev](https://nikitakaraevv.github.io/), [Ignacio Rocco](https://www.irocco.info/), [Benjamin Graham](https://ai.facebook.com/people/benjamin-graham/), [Natalia Neverova](https://nneverova.github.io/), [Andrea Vedaldi](https://www.robots.ox.ac.uk/~vedaldi/), [Christian Rupprecht](https://chrirupp.github.io/)
-
- [[`Paper`](https://research.facebook.com/publications/dynamicstereo-consistent-dynamic-depth-from-stereo-videos/)] [[`Project`](https://dynamic-stereo.github.io/)] [[`BibTeX`](#citing-dynamicstereo)]
-
- ![nikita-reading](https://user-images.githubusercontent.com/37815420/236242052-e72d5605-1ab2-426c-ae8d-5c8a86d5252c.gif)
-
- **DynamicStereo** is a transformer-based architecture for temporally consistent depth estimation from stereo videos. It has been trained on a combination of two datasets: [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html) and **Dynamic Replica** that we present below.
 
  ## Dataset
-
- https://user-images.githubusercontent.com/37815420/236239579-7877623c-716b-4074-a14e-944d095f1419.mp4
-
- The dataset consists of 145200 *stereo* frames (524 videos) with humans and animals in motion.
-
- We provide annotations for both *left and right* views, see [this notebook](https://github.com/facebookresearch/dynamic_stereo/blob/main/notebooks/Dynamic_Replica_demo.ipynb):
- - camera intrinsics and extrinsics
- - image depth (can be converted to disparity with intrinsics)
- - instance segmentation masks
- - binary foreground / background segmentation masks
- - optical flow (released!)
- - long-range pixel trajectories (released!)
-
 
  ### Download the Dynamic Replica dataset
  Due to the enormous size of the original dataset, we created the `links_lite.json` file to enable quick testing by downloading just a small portion of the dataset.
@@ -35,14 +17,12 @@ python ./scripts/download_dynamic_replica.py --link_list_file links_lite.json --
  To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
 
  ## Installation
-
- Describes installation of DynamicStereo with the latest PyTorch3D, PyTorch 1.12.1 & cuda 11.3
 
  ### Setup the root for all source files:
  ```
- git clone https://github.com/facebookresearch/dynamic_stereo
  cd dynamic_stereo
- export PYTHONPATH=`(cd ../ && pwd)`:`pwd`:$PYTHONPATH
  ```
  ### Create a conda env:
  ```
@@ -51,89 +31,40 @@ conda activate dynamicstereo
  ```
  ### Install requirements
  ```
- conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
- # It will require some time to install PyTorch3D. In the meantime, you may want to take a break and enjoy a cup of coffee.
  pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
  pip install -r requirements.txt
  ```
 
- ### (Optional) Install RAFT-Stereo
- ```
- mkdir third_party
- cd third_party
- git clone https://github.com/princeton-vl/RAFT-Stereo
- cd RAFT-Stereo
- bash download_models.sh
- cd ../..
- ```
-
-
-
  ## Evaluation
- To download the checkpoints, you can follow the below instructions:
- ```
- mkdir checkpoints
- cd checkpoints
- wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth
- wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth
- cd ..
- ```
- You can also download the checkpoints manually by clicking the links below. Copy the checkpoints to `./dynamic_stereo/checkpoints`.
 
- - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth) trained on SceneFlow
- - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth) trained on SceneFlow and *Dynamic Replica*
-
- To evaluate DynamicStereo:
- ```
- python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
- MODEL.model_name=DynamicStereoModel exp_dir=./outputs/test_dynamic_replica_ds \
- MODEL.DynamicStereoModel.model_weights=./checkpoints/dynamic_stereo_sf.pth
- ```
- Due to the high image resolution, evaluation on *Dynamic Replica* requires a 32GB GPU. If you don't have enough GPU memory, you can decrease `kernel_size` from 20 to 10 by adding `MODEL.DynamicStereoModel.kernel_size=10` to the above python command. Another option is to decrease the dataset resolution.
-
- As a result, you should see the numbers from *Table 5* in the [paper](https://arxiv.org/pdf/2305.02296.pdf). (for this, you need `kernel_size=20`)
 
- Reconstructions of all the *Dynamic Replica* splits (including *real*) will be visualized and saved to `exp_dir`.
-
- If you installed [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo), you can run:
- ```
- python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
- MODEL.model_name=RAFTStereoModel exp_dir=./outputs/test_dynamic_replica_raft
- ```
-
- Other public datasets we use:
- - [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
- - [Sintel](http://sintel.is.tue.mpg.de/stereo)
- - [Middlebury](https://vision.middlebury.edu/stereo/data/)
- - [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-training-data)
- - [KITTI 2015](http://www.cvlibs.net/datasets/kitti/eval_stereo.php)
 
  ## Training
- Training requires a 32GB GPU. You can decrease `image_size` and / or `sample_len` if you don't have enough GPU memory.
- You need to download SceneFlow before training. Alternatively, you can only train on *Dynamic Replica*.
  ```
- python train.py --batch_size 1 \
- --spatial_scale -0.2 0.4 --image_size 384 512 --saturation_range 0 1.4 --num_steps 200000 \
- --ckpt_path dynamicstereo_sf_dr \
- --sample_len 5 --lr 0.0003 --train_iters 10 --valid_iters 20 \
- --num_workers 28 --save_freq 100 --update_block_3d --different_update_blocks \
- --attention_type self_stereo_temporal_update_time_update_space --train_datasets dynamic_replica things monkaa driving
  ```
- If you want to train on SceneFlow only, remove the flag `dynamic_replica` from `train_datasets`.
-
-
 
  ## License
- The majority of dynamic_stereo is licensed under CC-BY-NC, however portions of the project are available under separate license terms: [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.
 
-
- ## Citing DynamicStereo
- If you use DynamicStereo or Dynamic Replica in your research, please use the following BibTeX entry.
- ```
- @article{karaev2023dynamicstereo,
- title={DynamicStereo: Consistent Dynamic Depth from Stereo Videos},
- author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
- journal={CVPR},
- year={2023}
- }
- ```

+ # [ECE1508 Final Project] Joint Learning of Exposure Patterns and Stereo Depth from Coded Snapshots
 
+ ![TeddyBear](https://github.com/kungchuking/E2E_SCSI/blob/master/images/overview.gif)
 
+ This project introduces a novel end-to-end learning approach that jointly addresses two traditionally separate computer vision challenges: Snapshot Compressive Imaging (SCI) decoding and dynamic stereo depth estimation. The framework is an adaptation of the [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) repository and was trained on the [Dynamic Replica](https://github.com/facebookresearch/dynamic_stereo) dataset.
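For readers new to SCI, the coded-snapshot forward model that the decoder inverts can be sketched as follows (an illustrative sketch only, not code from this repository; the array sizes and the function name `sci_snapshot` are hypothetical): each snapshot is the sum of several consecutive frames, each gated element-wise by a binary exposure mask.

```python
import numpy as np

def sci_snapshot(frames, masks):
    """Collapse T frames into one coded snapshot: y = sum_t (C_t * x_t).

    frames: (T, H, W) video clip; masks: (T, H, W) binary exposure patterns.
    """
    assert frames.shape == masks.shape
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 5, 4, 6                              # toy sizes: 5 frames -> 1 snapshot
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W)).astype(float)

snapshot = sci_snapshot(frames, masks)
print(snapshot.shape)                          # (4, 6): one 2-D measurement per clip
```

In the end-to-end setting suggested by the project title, the exposure patterns `masks` would be learnable parameters optimized jointly with the decoder and the stereo network.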
 
 
 
 
 
 
  ## Dataset
+ The [Dynamic Replica](https://github.com/facebookresearch/dynamic_stereo) dataset consists of 145200 *stereo* frames (524 videos) with humans and animals in motion.
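The dataset ships depth annotations, which relate to stereo disparity through the usual pinhole relation d = f · B / z (a generic sketch; the focal length and baseline below are made-up toy values, not Dynamic Replica's calibration):

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert a metric depth map to disparity in pixels: d = f * B / z."""
    return focal_px * baseline_m / np.maximum(depth, 1e-6)  # guard against z == 0

depth = np.array([[1.0, 2.0],
                  [4.0, 8.0]])                 # metres (toy values)
disp = depth_to_disparity(depth, focal_px=320.0, baseline_m=0.1)
print(disp)                                    # halving depth doubles disparity
```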
 
 
 
 
 
 
 
 
 
 
 
 
  ### Download the Dynamic Replica dataset
  Due to the enormous size of the original dataset, we created the `links_lite.json` file to enable quick testing by downloading just a small portion of the dataset.
  To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
 
  ## Installation
+ To set up and run the project, please follow these steps.
 
  ### Setup the root for all source files:
  ```
+ git clone https://github.com/kungchuking/E2E_SCSI.git
  cd dynamic_stereo
  ```
  ### Create a conda env:
  ```
  ```
  ### Install requirements
  ```
+ pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
  pip install -r requirements.txt
  ```
 
  ## Evaluation
+ To download the pre-trained model weights (checkpoints), please follow the instructions below.
 
 
 
 
 
 
 
 
+ ### Command Line Download
+ You can use the following commands to create the required directory and download the primary checkpoint directly from the Hugging Face repository:
  ```
+ mkdir dynamicstereo_sf_dr
+ wget -O dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth "https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth"
  ```
+ ### Manual Download
+ Alternatively, you can download the checkpoint manually by clicking this [link](https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth). Ensure the downloaded file is placed in the required path: `./dynamicstereo_sf_dr/`.
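Whichever route you take, a quick sanity check that the file landed where the evaluation code expects it can look like this (a sketch; it only checks the path used by the commands above, it does not validate the weights themselves):

```python
import os

# Path assembled from the download instructions in this README.
ckpt = os.path.join("dynamicstereo_sf_dr", "model_dynamic-stereo_030537.pth")
status = "found" if os.path.isfile(ckpt) else "missing"
print(f"checkpoint {status}: {ckpt}")
```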
 
+ ### Evaluation Notebook
+ For detailed instructions on how to evaluate the model, please refer to the dedicated [evaluation notebook](https://github.com/kungchuking/E2E_SCSI/blob/master/notebooks/evaluate.ipynb).
 
 
 
 
 
 
 
 
 
 
 
 
  ## Training
+ ### Hardware and Memory Requirements
+ Training the model requires a GPU with at least 50GB of memory.
+ * **Memory Adjustment**: If your GPU memory is limited, you may decrease the `image_size` and/or the `sample_len` parameters.
+ * **Resolution Note**: The chosen `image_size` of 480x640 corresponds to the native resolution of the custom-designed coded-exposure camera used in our research.
+ * **Compression Impact**: Reducing `sample_len` will inherently decrease the effective compression ratio of the Snapshot Compressive Imaging (SCI) process.
+ Before starting training, you must download the Dynamic Replica dataset.
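To make the compression trade-off concrete (a sketch under the common SCI reading in which the compression ratio is the number of frames collapsed into one snapshot, i.e. `sample_len`; the helper function below is hypothetical, not part of the repository):

```python
def sci_stats(sample_len: int, height: int = 480, width: int = 640):
    """Raw pixels in a clip vs. pixels in its single coded snapshot."""
    raw = sample_len * height * width      # T frames captured
    snap = height * width                  # one 2-D measurement stored
    return raw, snap, raw // snap          # ratio equals sample_len

raw, snap, ratio = sci_stats(sample_len=5)
print(ratio)                               # 5x compression at sample_len=5
```

Reducing `sample_len` therefore lowers the compression ratio one-for-one, which is the trade-off the bullet above describes.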
+ ### Execution
+ If you are running on a Linux machine, use the provided shell script for training:
  ```
+ ./train.csh
  ```
+ On other operating systems, open the `./train.csh` file and manually copy and run the command it contains.
 
 
 
  ## License
+ Portions of the project are available under separate license terms: [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) is licensed under CC-BY-NC, [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, and [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.