The official implementation is available on GitHub.

Zero-Shot Depth from Defocus

Yiming Zuo* · Hongyu Wen* · Venkat Subramanian* · Patrick Chen · Karhan Kayan · Mario Bijelic · Felix Heide · Jia Deng

(*Equal Contribution)

Princeton Vision & Learning Lab (PVL)

Paper · Project

FOSSA Teaser


Roadmap

  • ⏳ Release FOSSA training code (Coming April 2026)
  • ✅ Release FOSSA evaluation code
  • ✅ Release ZEDD dataset and test server

Installation & Setup

Step 1: Create and activate conda environment

conda create -n fossa python=3.8
conda activate fossa

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Build PowerExpPSF CUDA Extension

This is required for training and evaluation with synthetic defocus effects.

Build steps
cd power_exp_psf

# Build and install the extension
python setup.py build_ext --inplace

# Verify successful installation
python - <<'PY'
import torch
try:
    import power_exp_psf_cuda
    import os
    path = power_exp_psf_cuda.__file__
    if os.path.exists(path):
        print(f"SUCCESS: power_exp_psf_cuda loaded from {path}")
    else:
        print(f"ERROR: module loaded but file does not exist at {path}")
except Exception as e:
    print(f"IMPORT FAILED: {e}")
PY

cd ..

# Add power_exp_psf as a search directory for imports
export PYTHONPATH=$PWD/power_exp_psf:$PYTHONPATH

Step 4: Load datasets into dataset/datasets


Datasets download instructions
📦 HAMMER

Download: HAMMER Dataset prepared by MoGe2.

cd dataset/datasets
wget https://huggingface.co/datasets/Ruicheng/monocular-geometry-evaluation/resolve/main/HAMMER.zip
unzip HAMMER.zip
rm -f HAMMER.zip
cd ../..
📦 DDFF-12
Data split
cd dataset/datasets
mkdir ddff12_val_generation
cd ddff12_val_generation
mkdir third_part

Then, in your browser, navigate to the DFV Split (MS Sharepoint) prepared by DFF-DFV.

Click the download button. Then, copy the downloaded "my_ddff_trainVal.h5" file into dataset/datasets/ddff12_val_generation and rename it to "dfv_trainVal.h5".

Intrinsics matrix:

The intrinsics matrix is also provided by DFV (as a .mat file).

Download the "raw file" in the GitHub UI and place the downloaded IntParamLF.mat in "dataset/datasets/ddff12_val_generation/third_part/".

At the end, the "dataset" directory should look like this (you only need to create ddff12_val_generation and HAMMER yourself).

Expected format:
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
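As a quick sanity check of the manual download steps, the following is a minimal sketch that reports which of the files named in the layout above are still missing. The helper name `missing_dataset_files` and the choice of which files to check are assumptions made for illustration; this helper is not part of the repository.

```python
from pathlib import Path

# Files named in the expected layout; paths are relative to dataset/datasets.
# Which files to check is an illustrative choice, not a list from the repository.
REQUIRED_FILES = [
    "ddff12_val_generation/dfv_trainVal.h5",
    "ddff12_val_generation/third_part/IntParamLF.mat",
    "HAMMER/.index.txt",
]


def missing_dataset_files(datasets_root):
    """Return the required files that are absent under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files("dataset/datasets")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All manually downloaded files are in place.")
```

Run it from the project root so that the relative path dataset/datasets resolves correctly.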

Datasets that are loaded from Hugging Face (no manual download necessary)

Note: the first evaluation on these datasets takes some time because the zip file has to be downloaded and unpacked. If you download the zip file manually, delete the outer folder created by unzipping so that the contents match the final file structure shown below (the provided code removes this outer folder automatically).
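For manual downloads, removing the outer folder can be scripted. The sketch below assumes the unzipped archive leaves exactly one wrapper directory inside the target folder; `flatten_single_outer_dir` is a hypothetical helper and may differ from what the provided code does internally.

```python
import shutil
from pathlib import Path


def flatten_single_outer_dir(target_dir):
    """Move the contents of a lone wrapper directory up one level.

    Returns True if target_dir held exactly one subdirectory (and nothing
    else) and its contents were moved up; returns False otherwise.
    """
    target = Path(target_dir)
    entries = list(target.iterdir())
    if len(entries) != 1 or not entries[0].is_dir():
        return False
    outer = entries[0]
    # Rename the wrapper first so a child with the same name cannot collide.
    staging = target / (outer.name + ".__flatten_tmp__")
    outer.rename(staging)
    for child in list(staging.iterdir()):
        shutil.move(str(child), str(target / child.name))
    staging.rmdir()
    return True
```

Calling it a second time on an already flattened folder is a no-op as long as the folder holds more than one entry.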

Final expected format:
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── defocus_uniformat/
│   │   ├── diode/
│   │   │   ├── diode_indoor_v2/
│   │   │   │   ├── 000000.npy
│   │   │   │   ├── 000001.npy
│   │   │   │   └── ...
│   │   │   └── diode_outdoor_v2/
│   │   │       ├── 000000.npy
│   │   │       ├── 000001.npy
│   │   │       └── ...
│   │   └── ibims/
│   │       ├── 000000.npy
│   │       ├── 000001.npy
│   │       └── ...
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   ├── infinigen_defocus/
│   │   ├── 1a4897de_1/
│   │   │   ├── cam_all_in_focus.npz
│   │   │   ├── cam_ap_1.40_fd_0.80.npz
│   │   │   ├── ...
│   │   │   ├── depth.npy
│   │   │   ├── image_all_in_focus.png
│   │   │   └── image_ap_1.40_fd_0.80.png
│   │   └── ...
│   ├── ZEDD/
│   │   ├── test/
│   │   │   ├── test_0001/
│   │   │   │   ├── focus_stack/
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │   │   │   │   └── ...
│   │   │   │   └── gt/
│   │   │   │       └── K.txt
│   │   │   └── ...
│   │   └── val/
│   │       ├── val_0001/
│   │       │   ├── focus_stack/
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │       │   │   └── ...
│   │       │   └── gt/
│   │       │       ├── depth_vis.jpg
│   │       │       ├── depth.npy
│   │       │       ├── K.txt
│   │       │       └── overlay.jpg
│   │       └── ...
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py

📦 ZEDD

Dataset: ZEDD on Hugging Face


📦 Infinigen Defocus

Dataset: Infinigen Defocus on Hugging Face


📦 iBims-1 and DIODE

Dataset: Preprocessed (depth holes filled) on Hugging Face


Validation Quickstart

Running Validation

The easiest way to validate is using the distributed validation script:

bash dist_val.sh --encoder [VITS/VITB] --resumed_from [NAME OF PARAMETERS] --val_loader_config_choice [VAL_CONFIG_CHOICE]
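When several configurations need to be evaluated in a row, the command above can be assembled programmatically. The sketch below only builds the argument list from the three flags shown in this README; `build_val_command` is a hypothetical helper, and restricting the encoder to the lowercase values `vits` and `vitb` is an assumption based on the example commands further down.

```python
import shlex


def build_val_command(encoder, resumed_from, val_config):
    """Assemble the dist_val.sh invocation as an argument list."""
    # Assumption: only the two encoders used in this README are accepted.
    if encoder not in ("vits", "vitb"):
        raise ValueError(f"unsupported encoder: {encoder!r}")
    return [
        "bash", "dist_val.sh",
        "--encoder", encoder,
        "--resumed_from", resumed_from,
        "--val_loader_config_choice", val_config,
    ]


if __name__ == "__main__":
    # Configuration names taken from the ViT-S Table 3 commands in this README.
    configs = (
        "ibims_F1_4_adaptive_fd",
        "diode_F1_4_adaptive_fd",
        "hammer_F1_4_adaptive_fd",
    )
    for config in configs:
        cmd = build_val_command("vits", "fossa-vits", config)
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```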

Available Validation Configurations

See config/validation_configs.py for all predefined validation setups.

Model Loading Options

Option 1: Load from HuggingFace Hub (recommended)

resumed_from='model_name'  # automatically pull from venkatsubra/model_name

Option 2: Load from local path

resumed_from='/path/to/model.pth'
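The two options can be told apart from the value of resumed_from alone. The following sketch shows one plausible rule; the function name `resolve_checkpoint_source` and the heuristic (a ".pth" suffix or a path separator means a local file) are assumptions for illustration, and the repository's own logic may differ.

```python
import os

# Namespace stated in the comment under Option 1.
HUB_NAMESPACE = "venkatsubra"


def resolve_checkpoint_source(resumed_from):
    """Classify resumed_from as a local file or a Hub repository id.

    Assumed heuristic: a value that ends in .pth or contains a path
    separator is a local path; anything else is a model name on the Hub.
    """
    looks_like_path = (
        resumed_from.endswith(".pth")
        or "/" in resumed_from
        or os.sep in resumed_from
    )
    if looks_like_path:
        return ("local", resumed_from)
    return ("hub", f"{HUB_NAMESPACE}/{resumed_from}")
```

For example, "fossa-vits" would map to the repository id "venkatsubra/fossa-vits", while "/path/to/model.pth" would be kept as a local path.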

Reproducing Numbers in the Paper


🔹 ViT-S

Table 2

ZEDD

Note: The results below are on the validation split, so they do not match the numbers in Table 2, which are reported on the test split.

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8
D1.05 D1.15 D1.25 abs_rel
0.4450 0.7866 0.8858 0.0985
Infinigen
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0
D1.05 D1.15 D1.25 abs_rel
0.5201 0.8635 0.9400 0.0847

Table 3

iBims-1
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice ibims_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.5193 0.8502 0.9540 0.0745
DIODE
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice diode_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.4105 0.6649 0.7661 0.1778
HAMMER
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice hammer_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.6006 0.9889 0.9987 0.0440

Table 4

DDFF12 (Base Model)
bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice ddff12_val
MSE RMSE AbsRel SqRel D1 D2 D3
0.0015 0.0352 0.2676 0.0119 0.3462 0.8119 0.9544
DDFF12 (Finetuned)
bash dist_val.sh --encoder vits --resumed_from fossa-vits-ddff-finetuned \
  --val_loader_config_choice ddff12_val
MSE RMSE AbsRel SqRel D1 D2 D3
0.0004 0.0183 0.1076 0.0045 0.9363 0.9829 0.9908

🔹 ViT-B

Table 2

ZEDD

Note: The results below are on the validation split, so they do not match the numbers in Table 2, which are reported on the test split.

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8
D1.05 D1.15 D1.25 abs_rel
0.4317 0.8101 0.9194 0.0957
Infinigen
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0
D1.05 D1.15 D1.25 abs_rel
0.4199 0.8199 0.9355 0.0908

Table 3

iBims-1
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice ibims_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.5548 0.8719 0.9633 0.0701
DIODE
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice diode_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.4127 0.6692 0.7786 0.1601
HAMMER
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice hammer_F1_4_adaptive_fd
D1.05 D1.15 D1.25 abs_rel
0.9377 0.9974 0.9993 0.0172

Table 4

DDFF12 (Base Model)
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice ddff12_val
MSE RMSE AbsRel SqRel D1 D2 D3
0.0013 0.0324 0.2105 0.0107 0.6075 0.9206 0.9679
DDFF12 (Finetuned)
bash dist_val.sh --encoder vitb --resumed_from fossa-vitb-ddff-finetuned \
  --val_loader_config_choice ddff12_val
MSE RMSE AbsRel SqRel D1 D2 D3
0.0003 0.0148 0.1088 0.0025 0.9322 0.9866 0.9939
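The column names in the result tables are not defined in this README. Assuming D1.05, D1.15 and D1.25 follow the usual threshold-accuracy convention (the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below the threshold) and abs_rel is the mean absolute relative error, a minimal pure-Python sketch looks like this. The repository's evaluation code may mask, scale or align predictions differently, so treat this as an illustration of the definitions only.

```python
def threshold_accuracy(pred, gt, threshold):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with positive ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy depth values in arbitrary units, flattened to 1-D lists.
    pred = [1.0, 2.2, 3.0, 5.0]
    gt = [1.0, 2.0, 4.0, 4.0]
    for t in (1.05, 1.15, 1.25):
        print(f"D{t}: {threshold_accuracy(pred, gt, t):.4f}")
    print(f"abs_rel: {abs_rel(pred, gt):.4f}")
```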

Submitting to ZEDD Test Server

For the ZEDD test set, save model outputs in the following format:

  • A single .zip file containing exactly 50 .npy files at the root level (no subdirectories)
  • Files must be named zedd_output_0001.npy through zedd_output_0050.npy
  • Each .npy file must be a 2-D float array of shape (H=1216, W=1824), with no channel dimension
  • All values must be finite (no NaN or Inf)

Please run the following command to check the file format before submitting to the server:

python zedd_test/zedd_check_format.py --zip [YOUR_ZIP_FILE]
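For a quick look at the archive layout alone, the naming rules listed above can be sketched with the standard library. This is not a substitute for zedd_check_format.py: it does not open the arrays, so shape, dtype and finiteness are not verified. The function name `zip_layout_problems` is hypothetical.

```python
import zipfile

# The 50 file names required at the root of the archive.
EXPECTED_NAMES = [f"zedd_output_{i:04d}.npy" for i in range(1, 51)]


def zip_layout_problems(zip_path):
    """Return a list of human-readable problems with the archive's file names."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    problems = []
    # Any name containing a slash lives inside a subdirectory.
    nested = sorted(n for n in names if "/" in n)
    if nested:
        problems.append(f"files inside subdirectories: {nested}")
    missing = sorted(set(EXPECTED_NAMES) - set(names))
    if missing:
        problems.append(f"missing files: {missing}")
    unexpected = sorted(
        n for n in set(names) - set(EXPECTED_NAMES) if "/" not in n
    )
    if unexpected:
        problems.append(f"unexpected files: {unexpected}")
    return problems
```

An empty list means the file names and nesting match the rules; any other result lists what to fix before running the official check.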

Here is an example to compile the zip file for FOSSA ViT-S:

bash dist_test.sh --encoder=vits --resumed_from fossa-vits --val_loader_config_choice zedd_test_F2_8_fixed_fd_0_2_4_6_8 --experiment_name=FOSSA --zedd_test_output_dir=zedd_outputs

Troubleshooting

PowerExpPSF building

❌ Error: nvcc not found / CUDA extension build fails

If you see an error like: "error: [Errno 2] No such file or directory: '/usr/local/cuda-12.1/bin/nvcc'" or "nvcc not found", this means your environment does not have a CUDA toolkit with nvcc available.

✅ Fix: Load a valid CUDA toolkit and set environment variables

On cluster environments, load an available CUDA module:

module avail cuda
module load cudatoolkit/12.6   # or closest version to your PyTorch CUDA
export CUDA_HOME=/usr/local/cuda-12.6
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

Then verify:

which nvcc
nvcc --version
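The same check can be scripted from Python, which is convenient inside job scripts. This sketch looks for nvcc under $CUDA_HOME/bin first and then on PATH, mirroring the environment variables set above; `locate_nvcc` is a hypothetical helper, and the lookup order is an assumption rather than a description of how the build script searches.

```python
import os
import shutil


def locate_nvcc(environ=None, which=shutil.which):
    """Return a likely nvcc path, or None if none is found.

    Checks $CUDA_HOME/bin/nvcc first, then falls back to searching PATH.
    The environ and which parameters exist so the lookup can be tested.
    """
    environ = os.environ if environ is None else environ
    cuda_home = environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return which("nvcc")


if __name__ == "__main__":
    path = locate_nvcc()
    print(f"nvcc found at {path}" if path else "nvcc not found")
```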

Then retry:

python setup.py build_ext --inplace

❌ Error: ModuleNotFoundError: No module named 'power_exp_psf_cuda'

If you see an error like: "ModuleNotFoundError: No module named 'power_exp_psf_cuda'", this means your environment does not know where to search for the power_exp_psf_cuda module.

✅ Fix: Add the module to PYTHONPATH

From your project root, run:

export PYTHONPATH=$PWD/power_exp_psf:$PYTHONPATH

Then retry your script.
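To confirm that the PYTHONPATH change took effect, the import system can be asked whether it can locate the module without actually loading it. This is a minimal sketch using importlib; the helper name `module_is_discoverable` is an assumption, and a positive result only shows that the file was found, not that the extension loads successfully.

```python
import importlib.util


def module_is_discoverable(name):
    """True if the import system can locate `name` (the module is not executed)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    name = "power_exp_psf_cuda"
    if module_is_discoverable(name):
        print(f"{name} is on the import path")
    else:
        print(f"{name} not found; check that PYTHONPATH includes power_exp_psf")
```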

Acknowledgments

This codebase is partially based on Depth Anything v2, Video Depth Anything, DFF-DFV, and Unsupervised Depth from Focus.
