The official implementation is available on GitHub.
Zero-Shot Depth from Defocus
Yiming Zuo* · Hongyu Wen* · Venkat Subramanian* · Patrick Chen · Karhan Kayan · Mario Bijelic · Felix Heide · Jia Deng
(*Equal Contribution)
Princeton Vision & Learning Lab (PVL)
Paper · Project
Roadmap
- ⏳ Release FOSSA training code (Coming April 2026)
- ✅ Release FOSSA evaluation code
- ✅ Release ZEDD dataset and test server
Installation & Setup
Step 1: Create and activate conda environment
conda create -n fossa python=3.8
conda activate fossa
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Build PowerExpPSF CUDA Extension
This is required for training and evaluation with synthetic defocus effects.
Build steps
cd power_exp_psf
# Build and install the extension
python setup.py build_ext --inplace
# Verify successful installation
python - <<'PY'
import os

import torch  # ensure CUDA libraries are initialized before loading the extension

try:
    import power_exp_psf_cuda
    path = power_exp_psf_cuda.__file__
    if os.path.exists(path):
        print(f"SUCCESS: power_exp_psf_cuda loaded from {path}")
    else:
        print(f"ERROR: module loaded but file does not exist at {path}")
except Exception as e:
    print(f"IMPORT FAILED: {e}")
PY
cd ..
# Add power_exp_psf as a search directory for imports
export PYTHONPATH=$PWD/power_exp_psf:$PYTHONPATH
Step 4: Load datasets into dataset/datasets
Datasets download instructions
📦 HAMMER
Download: HAMMER Dataset prepared by MoGe2.
cd dataset/datasets
wget https://huggingface.co/datasets/Ruicheng/monocular-geometry-evaluation/resolve/main/HAMMER.zip
unzip HAMMER.zip
rm -f HAMMER.zip
cd ../..
📦 DDFF-12
Data split
cd dataset/datasets
mkdir ddff12_val_generation
cd ddff12_val_generation
mkdir third_part
Then, in your browser, navigate to the DFV Split (MS SharePoint) prepared by DFF-DFV and click the download button. Copy the downloaded "my_ddff_trainVal.h5" file into dataset/datasets/ddff12_val_generation and rename it to "dfv_trainVal.h5".
Intrinsics matrix:
The intrinsics matrix is also provided by DFV as a .mat file.
Download the "raw file" in the GitHub UI and place the downloaded IntParamLF.mat at "dataset/datasets/ddff12_val_generation/third_part/".
At this point, the "dataset" directory should look like this (you only need to create ddff12_val_generation and HAMMER; the remaining files ship with the repository).
Expected format:
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
Datasets that are loaded from HuggingFace (no user downloading necessary)
Note: the first time you evaluate on one of these datasets, the corresponding zip file is downloaded and unpacked, which takes some time. If you download a zip file manually, delete the outer folder created by unzipping so that the files match the structure above (the provided code removes this outer folder automatically).
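The manual flattening step can be sketched as follows. This is an illustration only: `flatten_outer_folder` is a hypothetical helper, not the repository's actual code, and it assumes the unzipped archive produced exactly one wrapper directory.

```python
import os
import shutil


def flatten_outer_folder(extracted_dir):
    """If extracted_dir contains a single wrapper subdirectory, move its
    contents up one level and remove the now-empty wrapper folder."""
    entries = os.listdir(extracted_dir)
    if len(entries) != 1:
        return  # already flat, nothing to do
    outer = os.path.join(extracted_dir, entries[0])
    if not os.path.isdir(outer):
        return  # single file, not a wrapper directory
    for name in os.listdir(outer):
        shutil.move(os.path.join(outer, name), os.path.join(extracted_dir, name))
    os.rmdir(outer)
```

Running this on, e.g., `dataset/datasets/HAMMER` after a manual unzip would leave the scene folders directly under `HAMMER/`, matching the trees shown in this section.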
Final expected format:
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── defocus_uniformat/
│   │   ├── diode/
│   │   │   ├── diode_indoor_v2/
│   │   │   │   ├── 000000.npy
│   │   │   │   ├── 000001.npy
│   │   │   │   └── ...
│   │   │   └── diode_outdoor_v2/
│   │   │       ├── 000000.npy
│   │   │       ├── 000001.npy
│   │   │       └── ...
│   │   └── ibims/
│   │       ├── 000000.npy
│   │       ├── 000001.npy
│   │       └── ...
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   ├── infinigen_defocus/
│   │   ├── 1a4897de_1/
│   │   │   ├── cam_all_in_focus.npz
│   │   │   ├── cam_ap_1.40_fd_0.80.npz
│   │   │   ├── ...
│   │   │   ├── depth.npy
│   │   │   ├── image_all_in_focus.png
│   │   │   └── image_ap_1.40_fd_0.80.png
│   │   └── ...
│   ├── ZEDD/
│   │   ├── test/
│   │   │   ├── test_0001/
│   │   │   │   ├── focus_stack/
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │   │   │   │   └── ...
│   │   │   │   └── gt/
│   │   │   │       └── K.txt
│   │   │   └── ...
│   │   └── val/
│   │       ├── val_0001/
│   │       │   ├── focus_stack/
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │       │   │   └── ...
│   │       │   └── gt/
│   │       │       ├── depth_vis.jpg
│   │       │       ├── depth.npy
│   │       │       ├── K.txt
│   │       │       └── overlay.jpg
│   │       └── ...
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
📦 ZEDD
Dataset: ZEDD on Hugging Face
📦 Infinigen Defocus
Dataset: Infinigen Defocus on Hugging Face
📦 iBims-1 and DIODE
Dataset: Preprocessed (depth holes filled) on Hugging Face
Validation Quickstart
Running Validation
The easiest way to validate is using the distributed validation script:
bash dist_val.sh --encoder [VITS/VITB] --resumed_from [NAME OF PARAMETERS] --val_loader_config_choice [VAL_CONFIG_CHOICE]
Available Validation Configurations
See config/validation_configs.py for all predefined validation setups.
Model Loading Options
Option 1: Load from HuggingFace Hub (recommended)
resumed_from='model_name' # automatically pull from venkatsubra/model_name
Option 2: Load from local path
resumed_from='/path/to/model.pth'
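The dispatch between the two options can be sketched as follows. This is an illustration only: `resolve_checkpoint` is a hypothetical helper (the actual loading logic lives in the repository's code); only the `venkatsubra` Hub namespace comes from the text above.

```python
import os


HUB_NAMESPACE = "venkatsubra"  # HuggingFace namespace hosting the released models


def resolve_checkpoint(resumed_from):
    """Return ("local", path) if resumed_from is an existing file,
    otherwise ("hub", repo_id) for a HuggingFace Hub model name."""
    if os.path.isfile(resumed_from):
        return ("local", resumed_from)
    return ("hub", f"{HUB_NAMESPACE}/{resumed_from}")
```

So `resumed_from='fossa-vits'` resolves to the Hub repo `venkatsubra/fossa-vits`, while any existing local `.pth` path is loaded directly.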
Reproducing Numbers in the Paper
🔹 ViT-S
Table 2
ZEDD
Note: the results below are on the validation split, so they do not match the test-split numbers reported in Table 2.
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4450 | 0.7866 | 0.8858 | 0.0985 |
Infinigen
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5201 | 0.8635 | 0.9400 | 0.0847 |
Table 3
iBims-1
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice ibims_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5193 | 0.8502 | 0.9540 | 0.0745 |
DIODE
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice diode_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4105 | 0.6649 | 0.7661 | 0.1778 |
HAMMER
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice hammer_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.6006 | 0.9889 | 0.9987 | 0.0440 |
Table 4
DDFF12 (Base Model)
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
--val_loader_config_choice ddff12_val
| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0015 | 0.0352 | 0.2676 | 0.0119 | 0.3462 | 0.8119 | 0.9544 |
DDFF12 (Finetuned)
bash dist_val.sh --encoder vits --resumed_from fossa-vits-ddff-finetuned \
--val_loader_config_choice ddff12_val
| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0004 | 0.0183 | 0.1076 | 0.0045 | 0.9363 | 0.9829 | 0.9908 |
🔹 ViT-B
Table 2
ZEDD
Note: the results below are on the validation split, so they do not match the test-split numbers reported in Table 2.
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4317 | 0.8101 | 0.9194 | 0.0957 |
Infinigen
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4199 | 0.8199 | 0.9355 | 0.0908 |
Table 3
iBims-1
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice ibims_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5548 | 0.8719 | 0.9633 | 0.0701 |
DIODE
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice diode_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4127 | 0.6692 | 0.7786 | 0.1601 |
HAMMER
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice hammer_F1_4_adaptive_fd
| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.9377 | 0.9974 | 0.9993 | 0.0172 |
Table 4
DDFF12 (Base Model)
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
--val_loader_config_choice ddff12_val
| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0013 | 0.0324 | 0.2105 | 0.0107 | 0.6075 | 0.9206 | 0.9679 |
DDFF12 (Finetuned)
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb-ddff-finetuned \
--val_loader_config_choice ddff12_val
| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0003 | 0.0148 | 0.1088 | 0.0025 | 0.9322 | 0.9866 | 0.9939 |
Submitting to ZEDD Test Server
For the ZEDD test set, save model outputs in the following format:
- A single .zip file containing exactly 50 .npy files at the root level (no subdirectories)
- Files must be named zedd_output_0001.npy through zedd_output_0050.npy
- Each .npy file must be a 2-D float array of shape (H=1216, W=1824), with no channel dimension
- All values must be finite (no NaN or Inf)
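A submission zip in this format can be assembled with a short script like the one below. This is a sketch, not the repository's submission tooling: `predictions` stands for your model's 50 depth maps and `write_zedd_zip` is a hypothetical helper; the filenames, shape, and finiteness checks mirror the requirements listed above.

```python
import io
import zipfile

import numpy as np


def write_zedd_zip(predictions, zip_path):
    """Write 50 float depth maps of shape (1216, 1824) to a flat zip
    using the required zedd_output_XXXX.npy naming scheme."""
    assert len(predictions) == 50, "exactly 50 predictions are required"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, depth in enumerate(predictions, start=1):
            depth = np.asarray(depth, dtype=np.float32)
            assert depth.shape == (1216, 1824), f"bad shape {depth.shape}"
            assert np.isfinite(depth).all(), "NaN/Inf values are not allowed"
            buf = io.BytesIO()
            np.save(buf, depth)  # serialize to .npy bytes in memory
            zf.writestr(f"zedd_output_{i:04d}.npy", buf.getvalue())
```

Writing each array into the archive directly (rather than saving 50 files to disk first) keeps the zip root flat, which is what the format check expects.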
Please run the following command to check the file format before submitting to the server:
python zedd_test/zedd_check_format.py --zip [YOUR_ZIP_FILE]
Here is an example to compile the zip file for FOSSA ViT-S:
bash dist_test.sh --encoder=vits --resumed_from fossa-vits \
    --val_loader_config_choice zedd_test_F2_8_fixed_fd_0_2_4_6_8 \
    --experiment_name=FOSSA --zedd_test_output_dir=zedd_outputs
Troubleshooting
PowerExpPSF building
❌ Error: nvcc not found / CUDA extension build fails
If you see an error like "error: [Errno 2] No such file or directory: '/usr/local/cuda-12.1/bin/nvcc'" or "nvcc not found", your environment does not have a CUDA toolkit with nvcc available.
✅ Fix: Load a valid CUDA toolkit and set environment variables
On cluster environments, load an available CUDA module:
module avail cuda
module load cudatoolkit/12.6 # or closest version to your PyTorch CUDA
export CUDA_HOME=/usr/local/cuda-12.6
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
Then verify:
which nvcc
nvcc --version
Then retry:
python setup.py build_ext --inplace
❌ Error: ModuleNotFoundError: No module named 'power_exp_psf_cuda'
If you see an error like "ModuleNotFoundError: No module named 'power_exp_psf_cuda'", your environment does not know where to search for the power_exp_psf_cuda module.
✅ Fix: Add the module to PYTHONPATH
From your project root, run:
export PYTHONPATH=$PWD/power_exp_psf:$PYTHONPATH
Then retry your script.
Acknowledgments
This codebase is partially based on Depth Anything v2, Video Depth Anything, DFF-DFV, and Unsupervised Depth from Focus.