E2E_SCSI / README.md
kungchuking's picture
Update README
4ba91b5
---
language:
- en
pipeline_tag: depth-estimation
---
# [ECE1508 Final Project] Joint Learning of Exposure Patterns and Stereo Depth from Coded Snapshots
![Overview](https://github.com/kungchuking/E2E_SCSI/raw/master/images/overview.gif)
This project introduces a novel, end-to-end learning approach that jointly addresses two traditionally separate computer vision challenges: Snapshot Compressed Image (SCI) decoding and dynamic stereo depth estimation. The framework is an adaptation of the [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) repository and was trained using the [DynamicReplica](https://github.com/facebookresearch/dynamic_stereo) dataset.
## Dataset
The [DynamicReplica](https://github.com/facebookresearch/dynamic_stereo) dataset consists of 145200 *stereo* frames (524 videos) with humans and animals in motion.
### Download the Dynamic Replica dataset
Due to the enormous size of the original dataset, we created the `links_lite.json` file to enable quick testing by downloading just a small portion of the dataset.
```
python ./scripts/download_dynamic_replica.py --link_list_file links_lite.json --download_folder ./dynamic_replica_data --download_splits test train valid real
```
To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
## Installation
To set up and run the project, please follow these steps.
### Setup the root for all source files:
```
git clone https://github.com/kungchuking/E2E_SCSI.git
cd dynamic_stereo
```
### Create a conda env:
```
conda create -n dynamicstereo python=3.8
conda activate dynamicstereo
```
### Install requirements
```
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install -r requirements.txt
```
## Evaluation
To download the pre-trained model weights (checkpoints), please follow the instructions below.
### Command Line Download
You can use the following commands to create the required directory and download the primary checkpoint directly from the Hugging Face repository:
```
mkdir dynamicstereo_sf_dr
wget -O dynamicstereo_sf_dr/model_dynamic-stereo_050895.pth "https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_050895.pth"
```
### Manual Download
Alternatively, you can manually download the checkpoints by clicking the [link](https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_050895.pth). Ensure the downloaded file is placed in the required path: `./dynamicstereo_sf_dr/`.
### Evaluation Notebook
For detailed instructions on how to evaluate the model, please refer to the dedicated [evaluation notebook](https://huggingface.co/kungchuking/E2E_SCSI/blob/main/notebooks/evaluate.ipynb).
### Evaluation and Validation
To execute the final evaluation on the DynamicReplica test set, navigate to the `evaluation`directory and run the following Python script:
```
cd evaluation
python evaluate.py
```
## Training
### Hardware and Memory Requirements
Training the model requires a minimum of a 50GB GPU.
* **Memory Adjustment**: If your GPU memory is limited, you may decrease the `image_size` and/or the `sample_len` parameters.
* **Resolution Note**: The chosen `image_size` of 480x640 corresponds to the native resolution of the custom-designed coded-exposure camera used for our research.
* **Compression Impact**: Reducing the `sample_length` will inherently decrease the effective compression ratio for the Snapshot Compressed Imaging (SCI) process.
Before starting training, you must download the Dynamic Replica dataset.
### Execution
If you are running on a Linux machine, use the provided shell script for training:
```
./train.csh
```
For other operating systems, you can open the `./train.csh` file and manually copy and execute the instruction.
## License
Portions of the project are available under separate license terms: [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) is licensed under CC-BY-NC, [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.