OccluSynth Completer
Part of the OccluSynth pipeline for occlusion-aware 3D scene reconstruction (Aditya Agarwal, NIT Rourkela).
What it does
Given a partial, visibility-aware voxel grid produced by scanning a scene with a depth camera, the completer predicts the complete signed distance field (SDF) — including the OCCLUDED region behind observed surfaces.
Occluded voxels are those that are inside a camera frustum but permanently behind a measured surface: they never appear in any depth image, so 2D depth-inpainting networks cannot recover them at all. The completer operates in 3D on 96³ voxel crops and explicitly targets this region.
Architecture
Encoder-decoder 3D U-Net with skip connections.
| Component | Detail |
|---|---|
| Parameters | 14.7 M |
| Input | (B, 3, D, H, W) — channels: sdf, weight, p_observed |
| Output | (B, 1, D, H, W) — completed SDF in metres (unbounded) |
| Encoder | 4 blocks, channels [32, 64, 128, 256], stride-2 Conv3d |
| Decoder | 4 blocks, trilinear upsample + skip concat |
| Norms / activation | GroupNorm(8) + GELU throughout |
| Head | 1×1×1 Conv3d, no activation |
| Voxel pitch | 5 cm |
| Crop size | 96³ |
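As a reading aid, the table rows for the encoder and decoder could be sketched in PyTorch roughly as below. This is not the repository's implementation: the class names `EncoderBlock` and `DecoderBlock`, the 3×3×3 kernel size, and the padding are assumptions made for illustration. Only the stride-2 Conv3d, trilinear upsampling, skip concatenation, GroupNorm(8) and GELU come from the table.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderBlock(nn.Module):
    """Hypothetical encoder stage: stride-2 Conv3d -> GroupNorm(8) -> GELU."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Kernel size 3 and padding 1 are assumptions; stride 2 halves D, H, W.
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.norm = nn.GroupNorm(8, out_ch)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.conv(x)))


class DecoderBlock(nn.Module):
    """Hypothetical decoder stage: trilinear upsample -> skip concat -> conv."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.norm = nn.GroupNorm(8, out_ch)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Upsample to the skip tensor's spatial size, then concatenate channels.
        x = F.interpolate(x, size=skip.shape[2:], mode="trilinear", align_corners=False)
        return self.act(self.norm(self.conv(torch.cat([x, skip], dim=1))))
```

With these assumed kernel settings, a 96³ input would shrink to 48³, 24³, 12³ and 6³ across the four encoder stages.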
Loss: masked L1 on SURFACE ∪ OCCLUDED voxels only.
UNOBSERVABLE voxels (never inside any camera frustum) are excluded from
the loss entirely, so the network is never penalised for what it cannot know.
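A minimal sketch of such a masked L1 loss is shown below. The function name `masked_l1` and the boolean `loss_mask` argument are illustrative assumptions, not the repository's API. The mask is expected to be the union of the SURFACE and OCCLUDED voxel masks, so UNOBSERVABLE voxels contribute neither to the loss value nor to the gradient.

```python
import torch


def masked_l1(pred: torch.Tensor, target: torch.Tensor,
              loss_mask: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean absolute error over the voxels where loss_mask is True.

    pred, target: (B, 1, D, H, W) SDF values.
    loss_mask:    (B, 1, D, H, W) bool, e.g. surface_mask | occluded_mask.
    """
    mask = loss_mask.to(pred.dtype)
    abs_err = (pred - target).abs() * mask       # zero outside the mask
    return abs_err.sum() / mask.sum().clamp_min(eps)  # guard against empty masks
```

Dividing by the number of masked voxels rather than the crop volume keeps the loss scale comparable between crops with different occluded fractions.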
Training data
| Detail | Value |
|---|---|
| Dataset | ScanNet v2 (_vh_clean_2.ply meshes + GT poses + GT depth) |
| Scenes | 40 train / 10 val (deterministic md5 hash split) |
| Crops | 418 train / 90 val × 96³ at 5 cm |
| Crop rejection | rejected if occluded fraction < 10%, or if > 50% of GT SDF values are exactly zero |
| Augmentation | 4 yaw rotations × 2 horizontal flips (z-axis never touched — preserves gravity-aligned priors) |
| GT SDF | mesh_to_tsdf() via open3d RaycastingScene, sampled on the same origin/dims as the fused grid |
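The augmentation row can be illustrated with the NumPy sketch below. It assumes a channel-first grid laid out as (C, X, Y, Z) with Z as the gravity axis and mirrors along X; the actual axis order in the repository is not stated here, and the function name `yaw_flip_variants` is hypothetical. Because all three channels are scalar fields, rotating and flipping the grid needs no per-voxel value changes.

```python
import numpy as np


def yaw_flip_variants(grid: np.ndarray) -> list[np.ndarray]:
    """Return the 8 variants (4 yaw rotations x 2 flips) of a (C, X, Y, Z) grid.

    Assumes a cubic crop in X and Y so that 90-degree rotations keep the shape.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(grid, k=k, axes=(1, 2))  # rotate in the X-Y plane only
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))   # mirror along X; Z is untouched
    return variants
```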
Grid-alignment contract. The GT SDF is sampled at voxel centres
origin + (idx + 0.5) * 0.05 m, using the identical world-space origin
and grid dimensions as the partial grid produced by fuse_visibility().
Before any training data was generated, alignment was verified on
scene0000_00: the median |GT SDF| at surface voxels was 2.51 cm
(threshold 7.5 cm, i.e. 1.5 voxels), and 93.1% of surface voxels lay within
1.5 voxels of the GT zero-crossing. A misaligned grid would not raise an
error; the network would silently train on garbage.
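The contract and the check can be sketched as follows. Both helper names are hypothetical, and `alignment_stats` assumes that "within 1.5 voxels of the GT zero-crossing" is measured as |GT SDF| ≤ 1.5 × pitch, which is one plausible reading of the numbers above rather than the repository's confirmed definition.

```python
import numpy as np

VOXEL_PITCH_M = 0.05  # 5 cm voxel pitch


def voxel_centres(origin, dims, pitch=VOXEL_PITCH_M):
    """World-space voxel centres, shape (*dims, 3): origin + (idx + 0.5) * pitch."""
    axes = [np.arange(n) for n in dims]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return np.asarray(origin, dtype=np.float64) + (idx + 0.5) * pitch


def alignment_stats(gt_sdf, surface_mask, pitch=VOXEL_PITCH_M, max_voxels=1.5):
    """Median |GT SDF| at surface voxels, and the fraction within max_voxels * pitch."""
    dist = np.abs(gt_sdf[surface_mask])
    return float(np.median(dist)), float(np.mean(dist <= max_voxels * pitch))
```

Sampling the GT SDF at `voxel_centres(origin, dims)` with the same `origin` and `dims` as the fused grid is what makes the two grids comparable voxel for voxel.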
Evaluation (64³ interim checkpoint, 35 epochs + augmentation)
Metrics split by voxel class — the split is the contribution. Surface voxels were observed; occluded voxels were not.
| Method | Class | MAE (cm) ↓ | Sign acc ↑ | Compl < 5 cm ↑ |
|---|---|---|---|---|
| no_completion (SDF = 0) | surface | 7.65 | 0.432 | 0.509 |
| no_completion | occluded | 45.27 | 0.299 | 0.061 |
| occluded_as_free (SDF = +0.1) | surface | 7.65 | 0.432 | 0.509 |
| occluded_as_free | occluded | 42.00 | 0.701 | 0.121 |
| OccluSynth Completer | surface | 4.86 | 0.585 | 0.768 |
| OccluSynth Completer | occluded | 27.14 | 0.722 | 0.349 |
The completer reduces occluded-voxel MAE by 35% relative to the best baseline (occluded_as_free, 42.00 cm → 27.14 cm) and by 40% relative to no_completion (45.27 cm → 27.14 cm). It achieves nearly 3× the completion ratio of the best baseline (34.9% vs 12.1% within 5 cm of GT). This is an interim checkpoint at 64³ crop size; a full 96³ A100 run is expected to improve these numbers further.
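For readers who want to reproduce the per-class split, the three columns could be computed along the lines below. The metric definitions are assumptions: sign accuracy compares `pred > 0` with `gt > 0` (how exact zeros are treated is not stated in this document), and the completion column is taken as the fraction of voxels whose absolute error is under 5 cm. The function name `class_metrics` is hypothetical; scripts/eval_completer.py remains the reference.

```python
import numpy as np


def class_metrics(pred_m, gt_m, class_mask, tol_m=0.05):
    """MAE (cm), sign accuracy and fraction within tol_m for one voxel class.

    pred_m, gt_m: SDF arrays in metres; class_mask: bool array of the same shape
    selecting either surface or occluded voxels.
    """
    p, g = pred_m[class_mask], gt_m[class_mask]
    err = np.abs(p - g)
    return {
        "mae_cm": float(err.mean() * 100.0),
        "sign_acc": float(np.mean((p > 0) == (g > 0))),
        "compl": float(np.mean(err < tol_m)),
    }
```

Calling it once with the surface mask and once with the occluded mask yields the two rows reported per method.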
Usage
import torch
from occlusynth.models import OccluSynthCompleter
model = OccluSynthCompleter()
ckpt = torch.load("completer_best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model"])
model.eval()
# inp: (B, 3, D, H, W) float32 — (sdf, weight, p_observed) from fuse_visibility_grid()
# All three channels should match the normalisation used during training:
# sdf normalised to [-1, 1] over the 10 cm surface band
# weight raw fusion weight (unnormalised)
# p_observed in [0, 1]
with torch.no_grad():
    completed_sdf = model(inp)  # (B, 1, D, H, W) metres
See src/occlusynth/fusion/scene_grid.py for how to produce the input from
raw RGB-D frames, and scripts/eval_completer.py for the full evaluation
pipeline.
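If the three channels are available as separate (D, H, W) tensors, they might be packed into the expected input roughly as follows. This is a sketch under assumptions: `pack_input` is a hypothetical helper, and "normalised to [-1, 1] over the 10 cm surface band" is interpreted as dividing the SDF in metres by 0.10 and clamping. If fuse_visibility_grid() already returns normalised channels, this step is unnecessary.

```python
import torch

SURFACE_BAND_M = 0.10  # 10 cm band named in the usage comments


def pack_input(sdf_m: torch.Tensor, weight: torch.Tensor,
               p_observed: torch.Tensor) -> torch.Tensor:
    """Stack three (D, H, W) grids into a (1, 3, D, H, W) float32 tensor."""
    sdf_norm = (sdf_m / SURFACE_BAND_M).clamp(-1.0, 1.0)  # assumed normalisation
    p_obs = p_observed.clamp(0.0, 1.0)                    # keep probabilities in [0, 1]
    # Channel order (sdf, weight, p_observed) follows the architecture table.
    return torch.stack([sdf_norm, weight, p_obs], dim=0).unsqueeze(0).float()
```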
ScanNet terms of use
The weights in this repository were trained on data derived from the ScanNet dataset. ScanNet is released for non-commercial research use only. Users of this checkpoint must comply with the ScanNet Terms of Use. In particular:
- Do not use these weights or the derived reconstructions for commercial purposes without explicit permission from the ScanNet authors.
- Do not redistribute the raw ScanNet data.
- Cite ScanNet in any publication that uses this model.
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017.