kungchuking committed
Commit 23c8f66 · 1 Parent(s): f0ba1d7

Update README

Files changed (1)
  1. README.md +27 -96
README.md CHANGED
@@ -1,29 +1,11 @@
- # [CVPR 2023] DynamicStereo: Consistent Dynamic Depth from Stereo Videos.
 
- **[Meta AI Research, FAIR](https://ai.facebook.com/research/)**; **[University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)**
 
- [Nikita Karaev](https://nikitakaraevv.github.io/), [Ignacio Rocco](https://www.irocco.info/), [Benjamin Graham](https://ai.facebook.com/people/benjamin-graham/), [Natalia Neverova](https://nneverova.github.io/), [Andrea Vedaldi](https://www.robots.ox.ac.uk/~vedaldi/), [Christian Rupprecht](https://chrirupp.github.io/)
-
- [[`Paper`](https://research.facebook.com/publications/dynamicstereo-consistent-dynamic-depth-from-stereo-videos/)] [[`Project`](https://dynamic-stereo.github.io/)] [[`BibTeX`](#citing-dynamicstereo)]
-
- ![nikita-reading](https://user-images.githubusercontent.com/37815420/236242052-e72d5605-1ab2-426c-ae8d-5c8a86d5252c.gif)
-
- **DynamicStereo** is a transformer-based architecture for temporally consistent depth estimation from stereo videos. It has been trained on a combination of two datasets: [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html) and **Dynamic Replica** that we present below.
 
  ## Dataset
-
- https://user-images.githubusercontent.com/37815420/236239579-7877623c-716b-4074-a14e-944d095f1419.mp4
-
- The dataset consists of 145200 *stereo* frames (524 videos) with humans and animals in motion.
-
- We provide annotations for both *left and right* views, see [this notebook](https://github.com/facebookresearch/dynamic_stereo/blob/main/notebooks/Dynamic_Replica_demo.ipynb):
- - camera intrinsics and extrinsics
- - image depth (can be converted to disparity with intrinsics)
- - instance segmentation masks
- - binary foreground / background segmentation masks
- - optical flow (released!)
- - long-range pixel trajectories (released!)
-
 
  ### Download the Dynamic Replica dataset
  Due to the enormous size of the original dataset, we created the `links_lite.json` file to enable quick testing by downloading just a small portion of the dataset.
@@ -35,14 +17,12 @@ python ./scripts/download_dynamic_replica.py --link_list_file links_lite.json --
  To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
 
  ## Installation
-
- Describes installation of DynamicStereo with the latest PyTorch3D, PyTorch 1.12.1 & cuda 11.3
 
  ### Setup the root for all source files:
  ```
- git clone https://github.com/facebookresearch/dynamic_stereo
  cd dynamic_stereo
- export PYTHONPATH=`(cd ../ && pwd)`:`pwd`:$PYTHONPATH
  ```
  ### Create a conda env:
  ```
@@ -51,89 +31,40 @@ conda activate dynamicstereo
  ```
  ### Install requirements
  ```
- conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
- # It will require some time to install PyTorch3D. In the meantime, you may want to take a break and enjoy a cup of coffee.
  pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
  pip install -r requirements.txt
  ```
 
- ### (Optional) Install RAFT-Stereo
- ```
- mkdir third_party
- cd third_party
- git clone https://github.com/princeton-vl/RAFT-Stereo
- cd RAFT-Stereo
- bash download_models.sh
- cd ../..
- ```
-
-
-
  ## Evaluation
- To download the checkpoints, you can follow the below instructions:
- ```
- mkdir checkpoints
- cd checkpoints
- wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth
- wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth
- cd ..
- ```
- You can also download the checkpoints manually by clicking the links below. Copy the checkpoints to `./dynamic_stereo/checkpoints`.
 
- - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth) trained on SceneFlow
- - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth) trained on SceneFlow and *Dynamic Replica*
-
- To evaluate DynamicStereo:
- ```
- python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
- MODEL.model_name=DynamicStereoModel exp_dir=./outputs/test_dynamic_replica_ds \
- MODEL.DynamicStereoModel.model_weights=./checkpoints/dynamic_stereo_sf.pth
- ```
- Due to the high image resolution, evaluation on *Dynamic Replica* requires a 32GB GPU. If you don't have enough GPU memory, you can decrease `kernel_size` from 20 to 10 by adding `MODEL.DynamicStereoModel.kernel_size=10` to the above python command. Another option is to decrease the dataset resolution.
-
- As a result, you should see the numbers from *Table 5* in the [paper](https://arxiv.org/pdf/2305.02296.pdf). (for this, you need `kernel_size=20`)
 
- Reconstructions of all the *Dynamic Replica* splits (including *real*) will be visualized and saved to `exp_dir`.
-
- If you installed [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo), you can run:
- ```
- python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
- MODEL.model_name=RAFTStereoModel exp_dir=./outputs/test_dynamic_replica_raft
- ```
-
- Other public datasets we use:
- - [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
- - [Sintel](http://sintel.is.tue.mpg.de/stereo)
- - [Middlebury](https://vision.middlebury.edu/stereo/data/)
- - [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-training-data)
- - [KITTI 2015](http://www.cvlibs.net/datasets/kitti/eval_stereo.php)
 
  ## Training
- Training requires a 32GB GPU. You can decrease `image_size` and / or `sample_len` if you don't have enough GPU memory.
- You need to download SceneFlow before training. Alternatively, you can only train on *Dynamic Replica*.
  ```
- python train.py --batch_size 1 \
- --spatial_scale -0.2 0.4 --image_size 384 512 --saturation_range 0 1.4 --num_steps 200000 \
- --ckpt_path dynamicstereo_sf_dr \
- --sample_len 5 --lr 0.0003 --train_iters 10 --valid_iters 20 \
- --num_workers 28 --save_freq 100 --update_block_3d --different_update_blocks \
- --attention_type self_stereo_temporal_update_time_update_space --train_datasets dynamic_replica things monkaa driving
  ```
- If you want to train on SceneFlow only, remove the flag `dynamic_replica` from `train_datasets`.
-
-
 
  ## License
- The majority of dynamic_stereo is licensed under CC-BY-NC, however portions of the project are available under separate license terms: [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.
 
-
- ## Citing DynamicStereo
- If you use DynamicStereo or Dynamic Replica in your research, please use the following BibTeX entry.
- ```
- @article{karaev2023dynamicstereo,
- title={DynamicStereo: Consistent Dynamic Depth from Stereo Videos},
- author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
- journal={CVPR},
- year={2023}
- }
- ```

+ # [ECE1508 Final Project] Joint Learning of Exposure Patterns and Stereo Depth from Coded Snapshots
 
+ ![TeddyBear](https://github.com/kungchuking/E2E_SCSI/blob/master/images/overview.gif)
 
+ This project introduces a novel end-to-end learning approach that jointly addresses two traditionally separate computer vision challenges: Snapshot Compressive Imaging (SCI) decoding and dynamic stereo depth estimation. The framework is an adaptation of the [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) repository and was trained on the [Dynamic Replica](https://github.com/facebookresearch/dynamic_stereo) dataset.
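For readers new to SCI, the coded-snapshot forward model that the decoder inverts can be sketched as follows (an illustrative sketch only, not code from this repository; the array sizes and the function name `sci_snapshot` are hypothetical): each snapshot is the sum of several consecutive frames, each gated element-wise by a binary exposure mask.

```python
import numpy as np

def sci_snapshot(frames, masks):
    """Collapse T frames into one coded snapshot: y = sum_t (C_t * x_t).

    frames: (T, H, W) video clip; masks: (T, H, W) binary exposure patterns.
    """
    assert frames.shape == masks.shape
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 5, 4, 6                              # toy sizes: 5 frames -> 1 snapshot
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W)).astype(float)

snapshot = sci_snapshot(frames, masks)
print(snapshot.shape)                          # (4, 6): one 2-D measurement per clip
```

In the end-to-end setting suggested by the project title, the exposure patterns `masks` would be learnable parameters optimized jointly with the decoder and the stereo network.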
 
 
 
 
 
 
  ## Dataset
+ The [Dynamic Replica](https://github.com/facebookresearch/dynamic_stereo) dataset consists of 145200 *stereo* frames (524 videos) with humans and animals in motion.
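The dataset ships depth annotations, which relate to stereo disparity through the usual pinhole relation d = f · B / z (a generic sketch; the focal length and baseline below are made-up toy values, not Dynamic Replica's calibration):

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert a metric depth map to disparity in pixels: d = f * B / z."""
    return focal_px * baseline_m / np.maximum(depth, 1e-6)  # guard against z == 0

depth = np.array([[1.0, 2.0],
                  [4.0, 8.0]])                 # metres (toy values)
disp = depth_to_disparity(depth, focal_px=320.0, baseline_m=0.1)
print(disp)                                    # halving depth doubles disparity
```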
 
 
 
 
 
 
 
 
 
 
 
 
  ### Download the Dynamic Replica dataset
  Due to the enormous size of the original dataset, we created the `links_lite.json` file to enable quick testing by downloading just a small portion of the dataset.
  To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
 
  ## Installation
+ To set up and run the project, please follow these steps.
 
  ### Setup the root for all source files:
  ```
+ git clone https://github.com/kungchuking/E2E_SCSI.git
  cd dynamic_stereo
  ```
  ### Create a conda env:
  ```
  ```
  ### Install requirements
  ```
+ pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
  pip install -r requirements.txt
  ```
 
  ## Evaluation
+ To download the pre-trained model weights (checkpoints), please follow the instructions below.
 
 
 
 
 
 
 
 
+ ### Command Line Download
+ You can use the following commands to create the required directory and download the primary checkpoint directly from the Hugging Face repository:
  ```
+ mkdir dynamicstereo_sf_dr
+ wget -O dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth "https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth"
  ```
+ ### Manual Download
+ Alternatively, you can download the checkpoint manually by clicking this [link](https://huggingface.co/kungchuking/E2E_SCSI/resolve/main/dynamicstereo_sf_dr/model_dynamic-stereo_030537.pth). Ensure the downloaded file is placed in the required path: `./dynamicstereo_sf_dr/`.
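Whichever route you take, a quick sanity check that the file landed where the evaluation code expects it can look like this (a sketch; it only checks the path used by the commands above, it does not validate the weights themselves):

```python
import os

# Path assembled from the download instructions in this README.
ckpt = os.path.join("dynamicstereo_sf_dr", "model_dynamic-stereo_030537.pth")
status = "found" if os.path.isfile(ckpt) else "missing"
print(f"checkpoint {status}: {ckpt}")
```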
 
+ ### Evaluation Notebook
+ For detailed instructions on how to evaluate the model, please refer to the dedicated [evaluation notebook](https://github.com/kungchuking/E2E_SCSI/blob/master/notebooks/evaluate.ipynb).
 
 
 
 
 
 
 
 
 
 
 
 
  ## Training
+ ### Hardware and Memory Requirements
+ Training the model requires a GPU with at least 50GB of memory.
+ * **Memory Adjustment**: If your GPU memory is limited, you may decrease the `image_size` and/or the `sample_len` parameters.
+ * **Resolution Note**: The chosen `image_size` of 480x640 corresponds to the native resolution of the custom-designed coded-exposure camera used in our research.
+ * **Compression Impact**: Reducing `sample_len` will inherently decrease the effective compression ratio of the Snapshot Compressive Imaging (SCI) process.
+ Before starting training, you must download the Dynamic Replica dataset.
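To make the compression trade-off concrete (a sketch under the common SCI reading in which the compression ratio is the number of frames collapsed into one snapshot, i.e. `sample_len`; the helper function below is hypothetical, not part of the repository):

```python
def sci_stats(sample_len: int, height: int = 480, width: int = 640):
    """Raw pixels in a clip vs. pixels in its single coded snapshot."""
    raw = sample_len * height * width      # T frames captured
    snap = height * width                  # one 2-D measurement stored
    return raw, snap, raw // snap          # ratio equals sample_len

raw, snap, ratio = sci_stats(sample_len=5)
print(ratio)                               # 5x compression at sample_len=5
```

Reducing `sample_len` therefore lowers the compression ratio one-for-one, which is the trade-off the bullet above describes.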
+ ### Execution
+ If you are running on a Linux machine, use the provided shell script for training:
  ```
+ ./train.csh
  ```
+ On other operating systems, open the `./train.csh` file and manually copy and run the command it contains.
 
 
 
  ## License
+ Portions of the project are available under separate license terms: [DynamicStereo](https://github.com/facebookresearch/dynamic_stereo) is licensed under CC-BY-NC, [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, and [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.