OccluSynth Completer

Part of the OccluSynth pipeline for occlusion-aware 3D scene reconstruction (Aditya Agarwal, NIT Rourkela).

What it does

Given a partial, visibility-aware voxel grid produced by scanning a scene with a depth camera, the completer predicts the complete signed distance field (SDF) — including the OCCLUDED region behind observed surfaces.

Occluded voxels are those that are inside a camera frustum but permanently behind a measured surface: they never appear in any depth image, so 2D depth-inpainting networks cannot recover them at all. The completer operates in 3D on 96³ voxel crops and explicitly targets this region.
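
To make the class definitions concrete, here is a minimal single-view sketch of how voxel centres could be labelled against one depth image. It is not the pipeline's implementation: the function name, the label encoding, the extra FREE class for observed empty space, the pinhole intrinsics and the 10 cm tolerance are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical label encoding. The model card names SURFACE, OCCLUDED and
# UNOBSERVABLE; FREE (observed empty space) is added here as an assumption.
UNOBSERVABLE, FREE, SURFACE, OCCLUDED = 0, 1, 2, 3


def classify_single_view(pts_cam, depth, fx, fy, cx, cy, band=0.10):
    """Label points (N, 3), given in the camera frame in metres, against one depth image (H, W)."""
    h, w = depth.shape
    n = len(pts_cam)
    z = pts_cam[:, 2]
    labels = np.full(n, UNOBSERVABLE, dtype=np.int64)

    # Project only points in front of the camera to avoid dividing by z <= 0.
    front = z > 0
    u = np.zeros(n, dtype=np.int64)
    v = np.zeros(n, dtype=np.int64)
    u[front] = np.round(fx * pts_cam[front, 0] / z[front] + cx).astype(np.int64)
    v[front] = np.round(fy * pts_cam[front, 1] / z[front] + cy).astype(np.int64)

    # Inside the frustum: in front of the camera and projecting onto the image.
    in_frustum = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.nonzero(in_frustum)[0]
    d = depth[v[idx], u[idx]]

    # Zero depth is treated as a missing measurement; those points stay UNOBSERVABLE.
    valid = d > 0
    idx, d = idx[valid], d[valid]

    diff = z[idx] - d  # positive: the point lies behind the measured surface
    labels[idx[diff < -band]] = FREE
    labels[idx[np.abs(diff) <= band]] = SURFACE
    labels[idx[diff > band]] = OCCLUDED
    return labels
```

Across many frames, a voxel would presumably stay OCCLUDED only if no view ever labels it FREE or SURFACE, which is what "permanently behind a measured surface" suggests; that merge step is omitted here.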

Architecture

Encoder-decoder 3D U-Net with skip connections.

Component	Detail
Parameters	14.7 M
Input	`(B, 3, D, H, W)` — channels: sdf, weight, p_observed
Output	`(B, 1, D, H, W)` — completed SDF in metres (unbounded)
Encoder	4 blocks, channels [32, 64, 128, 256], stride-2 Conv3d
Decoder	4 blocks, trilinear upsample + skip concat
Norms / activation	GroupNorm(8) + GELU throughout
Head	1×1×1 Conv3d, no activation
Voxel pitch	5 cm
Crop size	96³
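
The table rows translate roughly into the two building blocks below. This is a shape-level sketch only, not the released `OccluSynthCompleter`: the class names, the 3×3×3 kernel size, the reading of `GroupNorm(8)` as 8 groups, and the single convolution per block are assumptions, so the parameter count will not match 14.7 M.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderBlock(nn.Module):
    """Stride-2 Conv3d -> GroupNorm(8 groups) -> GELU; halves each spatial dimension."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv3d(c_in, c_out, kernel_size=3, stride=2, padding=1)
        self.norm = nn.GroupNorm(8, c_out)
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))


class DecoderBlock(nn.Module):
    """Trilinear upsample to the skip's size, concatenate with it, then Conv3d -> GroupNorm -> GELU."""

    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.conv = nn.Conv3d(c_in + c_skip, c_out, kernel_size=3, padding=1)
        self.norm = nn.GroupNorm(8, c_out)
        self.act = nn.GELU()

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="trilinear", align_corners=False)
        return self.act(self.norm(self.conv(torch.cat([x, skip], dim=1))))


# Quick shape check on a small 16^3 crop with the first two encoder widths.
x = torch.randn(1, 3, 16, 16, 16)  # (B, 3, D, H, W): sdf, weight, p_observed
enc1, enc2 = EncoderBlock(3, 32), EncoderBlock(32, 64)
dec = DecoderBlock(c_in=64, c_skip=32, c_out=32)
with torch.no_grad():
    f1 = enc1(x)      # spatial size 16 -> 8
    f2 = enc2(f1)     # spatial size 8 -> 4
    up = dec(f2, f1)  # upsampled back to 8 and fused with the skip
```

With four stride-2 stages, a 96³ crop would shrink to 6³ at the bottleneck (96, 48, 24, 12, 6), which is one reason a crop size divisible by 16 is convenient.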

Loss: masked L1 on SURFACE ∪ OCCLUDED voxels only. UNOBSERVABLE voxels (never inside any camera frustum) are excluded entirely from loss — the network is never penalised for what it cannot know.
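
A masked L1 of this kind can be sketched in a few lines of PyTorch. The function name and the boolean `surface` / `occluded` masks are hypothetical; the training code may build its mask differently.

```python
import torch


def masked_l1(pred, target, mask):
    """Mean absolute error over voxels where mask is True; 0 if the mask is empty."""
    mask = mask.to(pred.dtype)
    abs_err = torch.abs(pred - target) * mask
    # clamp_min avoids a division by zero when a crop has no supervised voxels.
    return abs_err.sum() / mask.sum().clamp_min(1.0)


# Hypothetical usage: supervise SURFACE and OCCLUDED voxels, ignore everything else.
# loss = masked_l1(pred_sdf, gt_sdf, surface | occluded)
```

Because UNOBSERVABLE voxels carry zero weight, no gradient flows from them, which matches the intent that the network is never penalised for what it cannot know.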

Training data

Detail	Value
Dataset	ScanNet v2 (`_vh_clean_2.ply` meshes + GT poses + GT depth)
Scenes	40 train / 10 val (deterministic md5 hash split)
Crops	418 train / 90 val × 96³ at 5 cm
Crop rejection	a crop is discarded if its occluded fraction is < 10%, or if > 50% of its GT SDF values are exactly zero
Augmentation	4 yaw rotations × 2 horizontal flips (z-axis never touched — preserves gravity-aligned priors)
GT SDF	`mesh_to_tsdf()` via open3d RaycastingScene, sampled on the same origin/dims as the fused grid
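
The split and augmentation rows can be illustrated as follows. Both functions are sketches under assumptions: the card only says the split is a deterministic md5 hash split, so ranking scenes by digest is one possible scheme, and the `(C, Z, Y, X)` array layout with X as the flipped horizontal axis is assumed rather than taken from the repository.

```python
import hashlib

import numpy as np


def md5_split(scene_ids, n_val):
    """Rank scenes by the md5 digest of their id; the first n_val go to validation."""
    ranked = sorted(scene_ids, key=lambda s: hashlib.md5(s.encode("utf-8")).hexdigest())
    return ranked[n_val:], ranked[:n_val]  # (train, val)


def yaw_flip_variants(grid):
    """Return 8 variants of a (C, Z, Y, X) grid: 4 yaw rotations, each with and without an X flip."""
    variants = []
    for k in range(4):
        rotated = np.rot90(grid, k=k, axes=(2, 3))  # rotate in the Y-X plane; Z is untouched
        variants.append(np.ascontiguousarray(rotated))
        variants.append(np.ascontiguousarray(np.flip(rotated, axis=3)))
    return variants


# Placeholder scene ids, used only to exercise the split.
scenes = [f"scene{i:04d}_00" for i in range(50)]
train, val = md5_split(scenes, n_val=10)

grid = np.random.default_rng(0).normal(size=(3, 4, 6, 6)).astype(np.float32)
variants = yaw_flip_variants(grid)
```

Since the split depends only on the scene ids, it is stable across machines and runs, and the Z axis is never permuted or mirrored, so floors stay below ceilings in every variant.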

Grid-alignment contract. The GT SDF is sampled at voxel centres `origin + (idx + 0.5) * 0.05` m, using the identical world-space origin and grid dimensions as the partial grid produced by `fuse_visibility()`. Before any training data was generated, alignment was verified on `scene0000_00`: median |GT SDF| at surface voxels = 2.51 cm (threshold 7.5 cm, i.e. 1.5 voxels), and 93.1% of surface voxels lie within 1.5 voxels of the GT zero-crossing. A mismatch here would not raise an error; the network would silently train on misaligned targets.
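
The voxel-centre formula and the two alignment statistics can be reproduced with a short NumPy sketch. The function names are hypothetical, index axis i is assumed to map to world axis i, and "within 1.5 voxels of the GT zero-crossing" is read here as |GT SDF| ≤ 7.5 cm, which is an interpretation rather than the verified definition.

```python
import numpy as np

VOXEL_M = 0.05  # voxel pitch in metres


def voxel_centres(origin, dims, voxel=VOXEL_M):
    """World-space centres origin + (idx + 0.5) * voxel, returned with shape (*dims, 3)."""
    axes = [np.arange(n) for n in dims]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return np.asarray(origin, dtype=np.float64) + (idx + 0.5) * voxel


def alignment_stats(gt_sdf, surface_mask, voxel=VOXEL_M):
    """Median |GT SDF| at surface voxels, and the fraction of them within 1.5 voxels of zero."""
    abs_sdf = np.abs(gt_sdf[surface_mask])
    return float(np.median(abs_sdf)), float(np.mean(abs_sdf <= 1.5 * voxel))
```

If the GT grid were offset by even one voxel from the fused grid, the median would be expected to rise towards the voxel pitch, which is what makes this a cheap check to run before generating crops.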

Evaluation (64³ interim checkpoint, 35 epochs + augmentation)

Metrics are reported separately per voxel class, and that split is the contribution: surface voxels were observed by the depth camera, occluded voxels were not.

Method	Class	MAE (cm) ↓	Sign acc ↑	Compl < 5 cm ↑
no_completion (SDF = 0)	surface	7.65	0.432	0.509
no_completion	occluded	45.27	0.299	0.061
occluded_as_free (SDF = +0.1)	surface	7.65	0.432	0.509
occluded_as_free	occluded	42.00	0.701	0.121
OccluSynth Completer	surface	4.86	0.585	0.768
OccluSynth Completer	occluded	27.14	0.722	0.349

The completer reduces occluded-voxel MAE by 35% relative to the best baseline (27.14 cm vs 42.00 cm for occluded_as_free) and by 40% relative to no_completion (45.27 cm). It also achieves nearly 3× the completion ratio of the best baseline (34.9% vs 12.1% within 5 cm of GT). This is an interim checkpoint at 64³ crop size; a full 96³ A100 run is expected to improve these numbers further.
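
For reference, the three per-class columns could be computed along these lines. This is a sketch, not `scripts/eval_completer.py`: the function name is hypothetical, and sign agreement is assumed to mean `(pred > 0) == (gt > 0)`. That assumption is at least consistent with the table, where the two constant baselines' occluded sign accuracies (0.299 and 0.701) sum to 1.

```python
import numpy as np


def class_metrics(pred, gt, mask, tol_m=0.05):
    """MAE (cm), sign accuracy and completion ratio over the voxels selected by mask."""
    p, g = pred[mask], gt[mask]
    err = np.abs(p - g)
    return {
        "mae_cm": float(100.0 * err.mean()),
        "sign_acc": float(np.mean((p > 0) == (g > 0))),  # assumed definition
        "compl_lt_5cm": float(np.mean(err < tol_m)),     # fraction within tol_m of GT
    }
```

Calling it once with the surface mask and once with the occluded mask gives the two rows reported per method.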

Usage

```python
import torch
from occlusynth.models import OccluSynthCompleter

model = OccluSynthCompleter()
ckpt = torch.load("completer_best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model"])
model.eval()

# inp: (B, 3, D, H, W) float32, channels (sdf, weight, p_observed) from fuse_visibility_grid()
# All three channels should match the normalisation used during training:
#   sdf      normalised to [-1, 1] over the 10 cm surface band
#   weight   raw fusion weight (unnormalised)
#   p_observed in [0, 1]
with torch.no_grad():
    completed_sdf = model(inp)  # (B, 1, D, H, W) metres
```

See `src/occlusynth/fusion/scene_grid.py` for how to produce the input from raw RGB-D frames, and `scripts/eval_completer.py` for the full evaluation pipeline.
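
If the three channels are already available as separate `(D, H, W)` tensors, assembling the input might look like the sketch below. The helper name is hypothetical, and reading "normalised to [-1, 1] over the 10 cm surface band" as dividing the metric SDF by 0.10 m and clamping is an assumption; check `scene_grid.py` for the normalisation actually used in training.

```python
import torch

SDF_BAND_M = 0.10  # assumed truncation band in metres


def build_input(sdf_m, weight, p_observed, band=SDF_BAND_M):
    """Stack (sdf, weight, p_observed) grids of shape (D, H, W) into a (1, 3, D, H, W) float32 tensor."""
    sdf_norm = torch.clamp(sdf_m / band, -1.0, 1.0)  # metric SDF -> [-1, 1]
    p_obs = torch.clamp(p_observed, 0.0, 1.0)        # keep probabilities in [0, 1]
    inp = torch.stack([sdf_norm, weight, p_obs], dim=0)  # weight stays unnormalised
    return inp.unsqueeze(0).to(torch.float32)
```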

ScanNet terms of use

The weights in this repository were trained on data derived from the ScanNet dataset. ScanNet is released for non-commercial research use only. Users of this checkpoint must comply with the ScanNet Terms of Use. In particular:

Do not use these weights or the derived reconstructions for commercial purposes without explicit permission from the ScanNet authors.
Do not redistribute the raw ScanNet data.
Cite ScanNet in any publication that uses this model.

Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017.
