# Finetuning Sapiens: Depth Estimation
This guide outlines the process to finetune the pretrained Sapiens model for relative depth estimation on custom data.
## 1. Data Preparation
Set $DATA_ROOT as your training data root directory.
We provide a toy dataset at sapiens_toy_dataset to help you get started.
Download and unzip it into `$DATA_ROOT`.
The training data directory structure is as follows:

```
$DATA_ROOT/
├── images/
│   ├── 00000000.png
│   ├── 00000001.png
│   └── 00000002.png
├── masks/
│   ├── 00000000.png
│   ├── 00000001.png
│   └── 00000002.png
└── depths/
    ├── 00000000.npy
    ├── 00000001.npy
    └── 00000002.npy
```
The folders contain:
- `$DATA_ROOT/images`: RGB images (`.png`, `.jpg`, or `.jpeg`).
- `$DATA_ROOT/masks`: Boolean masks of human pixels (`.png`, `.jpg`, or `.jpeg`).
- `$DATA_ROOT/depths`: Ground-truth depth maps (`.npy`).
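To illustrate how these three folders relate, here is a minimal sketch (all names, shapes, and values are hypothetical): a boolean human mask is aligned pixel-wise with a float depth map, so depth supervision can be restricted to human pixels.

```python
import numpy as np

# Hypothetical stand-ins for one sample: a boolean mask (masks/*.png, once
# decoded to bool) and a float32 depth map (depths/*.npy), pixel-aligned.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                       # mark a 2x2 region as "human" pixels
depth = np.ones((4, 4), dtype=np.float32)   # stand-in for a loaded .npy depth map

# Relative-depth supervision would only consider the masked (human) pixels.
valid = depth[mask]
print(valid.shape)  # (4,)
```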
## 2. Configuration Update
Edit $SAPIENS_ROOT/seg/configs/sapiens_depth/depth_general/sapiens_1b_depth_general-1024x768.py:
- Set `pretrained_checkpoint` to your checkpoint path.
- Update `dataset_train.data_root` to your `$DATA_ROOT`.
- (Optional) Adjust hyperparameters such as `num_epochs` and `optim_wrapper.optimizer.lr`.
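Inside the config file, the edits above might look like the following sketch (paths and values are placeholders; the exact nesting in your config may differ):

```python
# Sketch of the edits in sapiens_1b_depth_general-1024x768.py.
# Field names follow the steps above; all paths/values are placeholders.
pretrained_checkpoint = '/path/to/sapiens_1b_checkpoint.pth'

dataset_train = dict(
    data_root='/path/to/DATA_ROOT',  # your $DATA_ROOT
)

# Optional hyperparameter adjustments (hypothetical values)
num_epochs = 100
optim_wrapper = dict(optimizer=dict(lr=1e-4))
```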
## 3. Finetuning
The following instructions are for Sapiens-1B; to use another backbone, simply choose the corresponding config file from here.
The training scripts are under: $SAPIENS_ROOT/seg/scripts/finetune/depth_general/sapiens_1b
Make sure you have activated the sapiens python conda environment.
### A. Single-node Training
Use $SAPIENS_ROOT/seg/scripts/finetune/depth_general/sapiens_1b/node.sh.
Key variables:
- `DEVICES`: GPU IDs (e.g., `"0,1,2,3,4,5,6,7"`).
- `TRAIN_BATCH_SIZE_PER_GPU`: Batch size per GPU (default: 2).
- `OUTPUT_DIR`: Directory for checkpoints and logs.
- `RESUME_FROM`: Checkpoint to resume training from; continues from the previous epoch. Defaults to an empty string.
- `LOAD_FROM`: Checkpoint to load weights from; starts training from epoch 0. Defaults to an empty string.
- `mode=multi-gpu`: Launches multi-GPU training with multiple dataloader workers.
- `mode=debug`: (Optional) Launches a single-GPU dry run with a single dataloader worker. Supports interactive debugging with pdb/ipdb.
Note: if you wish to finetune from an existing depth-estimation checkpoint, set the `LOAD_FROM` variable.
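For example, the variables at the top of `node.sh` might be set like this (variable names come from the list above; all paths are placeholders):

```shell
# Hypothetical variable settings inside node.sh; paths are placeholders.
DEVICES="0,1,2,3"
TRAIN_BATCH_SIZE_PER_GPU=2
OUTPUT_DIR="/path/to/output"
LOAD_FROM="/path/to/existing_depth_checkpoint.pth"  # finetune from an existing checkpoint
RESUME_FROM=""                                      # empty: do not resume; start from epoch 0
```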
Launch:
```shell
cd $SAPIENS_ROOT/seg/scripts/finetune/depth_general/sapiens_1b
./node.sh
```
### B. Multi-node Training (Slurm)
Use $SAPIENS_ROOT/seg/scripts/finetune/depth_general/sapiens_1b/slurm.sh
Additional variables:
- `CONDA_ENV`: Path to the conda environment.
- `NUM_NODES`: Number of nodes (default: 4, with 8 GPUs per node).
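As a sanity check, the effective global batch size under the stated defaults (4 nodes, 8 GPUs per node, batch size 2 per GPU) works out as:

```python
# Effective global batch size with the defaults stated in this guide.
num_nodes = 4            # NUM_NODES default
gpus_per_node = 8        # GPUs per node
batch_per_gpu = 2        # TRAIN_BATCH_SIZE_PER_GPU default
global_batch = num_nodes * gpus_per_node * batch_per_gpu
print(global_batch)  # 64
```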
Launch:
```shell
cd $SAPIENS_ROOT/seg/scripts/finetune/depth_general/sapiens_1b
./slurm.sh
```