rayv9_learnt_baseline_snap

S23DR 2026 competition submission. Predicts building wireframes (vertices + edges) from multi-view indoor panoramas.

Pipeline

Per scene:
  1. v1d heatmap model  →  per-view vertex heatmaps  →  spatial NMS  →  ~80 rays
  2. rays + SfM points  →  6-channel 64×32×64 voxel volume
                        →  RayVoxelTransformerV7a  →  hybrid ray/dense NMS  →  world-space vertices (v9)
  3. COLMAP + depth     →  point fusion  →  EdgeDepthSegmentsModel  →  wireframe (baseline)
  4. snap               →  baseline verts snapped 80% toward nearest v9 vert within 2 m
                           + unmatched v9 verts appended
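Step 2 packs rays and SfM points into a 6-channel volume whose channels are listed under Model details as [log1p(ray_count), dir_x/y/z, mean_score, log1p(sfm)]. The pure-Python sketch below shows what one voxel's channel vector might look like. The accumulator names (`ray_count`, `dir_sum`, `score_sum`, `sfm_count`) and the choice to renormalise the summed ray direction are assumptions for illustration; this is not the code of `assemble_volume` in dataset.py.

```python
import math

def voxel_channels(ray_count, dir_sum, score_sum, sfm_count):
    """Hypothetical per-voxel channel vector:
    [log1p(ray_count), dir_x, dir_y, dir_z, mean_score, log1p(sfm_count)].

    ray_count: number of rays that marched through this voxel
    dir_sum:   (x, y, z) sum of those rays' unit directions (assumption)
    score_sum: sum of the heatmap scores of those rays
    sfm_count: number of SfM points splatted into this voxel
    """
    if ray_count > 0:
        # Average ray direction, renormalised to unit length (an assumption).
        dx, dy, dz = dir_sum
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        direction = (dx / norm, dy / norm, dz / norm) if norm > 0 else (0.0, 0.0, 0.0)
        mean_score = score_sum / ray_count
    else:
        # Empty voxel: no direction and no score information.
        direction = (0.0, 0.0, 0.0)
        mean_score = 0.0
    # log1p compresses large counts while mapping an empty voxel to 0.
    return [math.log1p(ray_count), *direction, mean_score, math.log1p(sfm_count)]
```

Evaluated at every cell of a 64×32×64 grid, this yields a 6×64×32×64 input tensor. The log1p on the two count channels keeps voxels crossed by many rays or many SfM points in a small numeric range.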

Validation results (100 scenes, hoho22k_2026_trainval)

| Method           | Mean HSS | Median HSS |
|------------------|----------|------------|
| Learned baseline | 0.352    | 0.369      |
| + Snap to v9     | 0.411    | 0.453      |

Vertex F1@0.5 = 0.494, F1@1.0 = 0.685. The snapped output scores higher than the learned baseline on 85 of the 100 validation scenes.
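The card does not define how F1@0.5 and F1@1.0 are computed. A common reading is vertex F1 with a matching radius of 0.5 and 1.0 (presumably metres). The sketch below assumes greedy one-to-one matching by ascending distance; the official S23DR evaluator may match differently (for example with Hungarian assignment), so treat this as an approximation for local checks only.

```python
import math

def vertex_f1(pred, gt, thresh):
    """F1 between predicted and ground-truth 3D vertices, counting a prediction as
    correct if it is matched one-to-one to a GT vertex within `thresh`.
    Matching is greedy by ascending distance (an assumption, not the official rule)."""
    if not pred or not gt:
        return 0.0
    # All candidate pairs within the threshold, closest first.
    pairs = []
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            d = math.dist(p, g)
            if d <= thresh:
                pairs.append((d, i, j))
    pairs.sort()
    used_p, used_g, matches = set(), set(), 0
    for _, i, j in pairs:
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            matches += 1
    if matches == 0:
        return 0.0
    precision = matches / len(pred)
    recall = matches / len(gt)
    return 2 * precision * recall / (precision + recall)
```

Because a larger radius can only add matches, F1@1.0 is never lower than F1@0.5 under this definition, which is consistent with the two reported numbers.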

Files

| File | Purpose |
|------|---------|
| script.py | Entry point: loads all models, iterates over the dataset, writes submission.json |
| v9_inference.py | Full v9 vertex pipeline: COLMAP parsing, v1d heatmap model, spatial NMS, voxel volume construction, RayVoxelTransformer inference, hybrid NMS |
| baseline_inference.py | Learned baseline pipeline: point fusion, EdgeDepthSegmentsModel forward pass, postprocessing (merge vertices, snap to point cloud, snap horizontal) |
| snap.py | snap_midpoint_plus_unmatched: snaps baseline verts toward v9 verts and appends unmatched v9 verts |
| model.py | RayVoxelTransformer (V7a): dense image-unprojection pathway + sparse ray/SfM transformer pathway, merged via a learned gate |
| dataset.py | Voxel volume utilities for inference: build_grid, splat_sfm, march_rays, assemble_volume |
| voxel_grid.py | VoxelGrid class with world↔voxel transforms, ray marching, SfM splatting, GT target construction |
| s23dr_2026_example/ | Learned baseline package: point fusion, tokenizer, EdgeDepthSegmentsModel, postprocessing, varifold loss |
| v1d_checkpoint.pt | v1d heatmap model (GroupNorm CNN, 7-channel input), 12 M params |
| v9_checkpoint.pt | RayVoxelTransformerV7a, 22 M params |
| baseline_checkpoint.pt | EdgeDepthSegmentsModel, 102 M params |
| params.json | Competition metadata (competition ID, dataset, time limit, output paths) |
| submission.json | Output: list of {order_id, wf_vertices, wf_edges} |
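voxel_grid.py provides world↔voxel transforms for the 64×32×64 grid, but the card does not state the grid origin, voxel size, or axis order. The sketch below is a hypothetical minimal version (`SimpleVoxelGrid` is not the repo's `VoxelGrid`), assuming an axis-aligned grid of uniform cubic voxels with indices counted from the grid's minimum corner.

```python
from dataclasses import dataclass

@dataclass
class SimpleVoxelGrid:
    """Hypothetical axis-aligned voxel grid (not the repo's VoxelGrid class)."""
    origin: tuple              # world coordinates of the grid's minimum corner (metres)
    voxel_size: float          # edge length of one cubic voxel (metres)
    dims: tuple = (64, 32, 64)

    def world_to_voxel(self, p):
        """Integer voxel index containing world point p, or None if p is outside."""
        idx = tuple(int((p[k] - self.origin[k]) // self.voxel_size) for k in range(3))
        if all(0 <= idx[k] < self.dims[k] for k in range(3)):
            return idx
        return None

    def voxel_to_world(self, idx):
        """World coordinates of the centre of voxel idx."""
        return tuple(self.origin[k] + (idx[k] + 0.5) * self.voxel_size for k in range(3))
```

Mapping an index to its voxel centre and back returns the same index, which is the property a ray-marching or splatting routine relies on when it moves between world space and grid space.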

Setup

The baseline checkpoint (baseline_checkpoint.pt, 102 MB) is not included in this repo. Download it separately and place it in the root directory of this project:

# From Hugging Face
wget https://huggingface.co/kc92/rayv9_learnt_baseline_snap/resolve/main/baseline_checkpoint.pt

Usage

Competition submission (reads params.json, writes submission.json):

python script.py

Local smoke test on the training split:

python script.py --mode local --n_scenes 10
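After either run, the output file can be sanity-checked before upload. The helpers below are hypothetical (they are not part of script.py) and assume each entry's `wf_vertices` is a list of [x, y, z] coordinates and `wf_edges` is a list of 0-based index pairs into that vertex list; the card only states that submission.json is a list of {order_id, wf_vertices, wf_edges}.

```python
import json

REQUIRED_KEYS = {"order_id", "wf_vertices", "wf_edges"}

def check_entries(entries):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for i, e in enumerate(entries):
        missing = REQUIRED_KEYS - set(e)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        oid, verts, edges = e["order_id"], e["wf_vertices"], e["wf_edges"]
        if any(len(v) != 3 for v in verts):
            problems.append(f"{oid}: every vertex should be [x, y, z]")
        n = len(verts)
        # Each edge must be a pair of valid indices into this entry's vertex list.
        if any(len(edge) != 2 or not all(0 <= k < n for k in edge) for edge in edges):
            problems.append(f"{oid}: edges should be pairs of indices into wf_vertices")
    return problems

def load_submission(path="submission.json"):
    """Load submission.json and raise ValueError if any entry looks malformed."""
    with open(path) as f:
        entries = json.load(f)
    problems = check_entries(entries)
    if problems:
        raise ValueError("\n".join(problems))
    return entries
```

Calling `load_submission()` from the project root after `python script.py --mode local --n_scenes 10` would surface out-of-range edge indices early, for example if vertices were appended or merged without remapping edges.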

Model details

v1d heatmap model

  • Input: 7-channel image (3-ch gestalt + 1-ch depth + 3-ch ADE segmentation), resized to 576×768
  • Architecture: dilated GroupNorm CNN encoder + transposed-conv decoder → sigmoid heatmap
  • NMS: 7×7 max-pool peaks with threshold 0.30, up to 60 peaks/view, spatial suppression at 20 px radius across views → ≤80 rays/scene
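The per-view peak step can be illustrated with a pure-Python sketch: a pixel counts as a peak when it reaches the threshold and equals the maximum of its 7×7 neighbourhood, which is the same test as comparing a heatmap against its stride-1 max-pool. The `>=` threshold comparison and the strongest-first ordering are assumptions, and the cross-view 20 px suppression is not shown.

```python
def heatmap_peaks(heatmap, threshold=0.30, window=7, max_peaks=60):
    """Local maxima of a 2D heatmap (list of rows) as (score, row, col), strongest first.

    A pixel is kept if its score is >= threshold and no pixel in its
    window x window neighbourhood (clipped at the borders) is larger.
    Plateaus of equal values can yield several adjacent peaks.
    """
    h, w = len(heatmap), len(heatmap[0])
    r = window // 2
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbourhood = (
                heatmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            if v >= max(neighbourhood):
                peaks.append((v, y, x))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:max_peaks]
```

In a real pipeline this comparison is usually vectorised, for example by checking `heatmap == max_pool(heatmap)` on the GPU; the loop form above just makes the rule explicit.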

RayVoxelTransformer (V7a)

  • Input: 6-channel 64×32×64 voxel volume [log1p(ray_count), dir_x/y/z, mean_score, log1p(sfm)]
  • Dense pathway: image features (stride-4 CNN) unprojected into full grid → 3D conv → prob/offset
  • Sparse pathway: active ray voxels + SfM point tokens → joint self-attention transformer → prob/offset at active voxels
  • Merge: learned per-voxel gate between dense and sparse outputs
  • NMS: hybrid ray-guided NMS (threshold 0.35, radius 0.3 m) + dense NMS (threshold 0.45, radius 0.5 m), max 48 vertices
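Both NMS passes are parameterised by a score threshold and a suppression radius. A generic greedy radius NMS in 3D is sketched below: keep the highest-scoring candidate, discard any later candidate within `radius` of a kept one, and stop at `max_vertices`. How the repo combines the ray-guided and dense passes into one vertex set is not described here, so that step is deliberately left out.

```python
import math

def radius_nms(candidates, threshold, radius, max_vertices=48):
    """Greedy 3D non-maximum suppression.

    candidates: list of (score, (x, y, z)) with positions in metres.
    Returns the kept (score, position) pairs, highest score first.
    """
    kept = []
    for score, p in sorted(candidates, key=lambda c: c[0], reverse=True):
        if score < threshold:
            break  # candidates are sorted, so all remaining ones are weaker
        # Keep p only if it is farther than `radius` from every kept vertex.
        if all(math.dist(p, q) > radius for _, q in kept):
            kept.append((score, p))
            if len(kept) == max_vertices:
                break
    return kept
```

With the card's parameters, the two passes would be `radius_nms(ray_candidates, 0.35, 0.3)` and `radius_nms(dense_candidates, 0.45, 0.5)`, where the candidate lists are hypothetical per-voxel predictions. The larger dense radius suppresses more aggressively, which suits a pathway that scores every voxel rather than only those crossed by rays.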

Learned baseline

  • Input: fused COLMAP + depth point cloud, priority-sampled to 4096 tokens (3072 COLMAP + 1024 depth)
  • Architecture: Perceiver-style EdgeDepthSegmentsModel with cross-attention latent bottleneck → segment predictions → varifold-based vertex/edge extraction
  • Postprocessing: iterative vertex merging, snap to point cloud, snap horizontal
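The first postprocessing step is iterative vertex merging. The card gives neither the merge distance nor the merge rule, so the sketch below is a hypothetical version: repeatedly merge the closest pair of vertices that are nearer than `merge_dist` into their midpoint, remap edge indices, and finally drop self-loops and duplicate edges.

```python
import math

def merge_close_vertices(vertices, edges, merge_dist):
    """Hypothetical iterative vertex merge.

    vertices: list of (x, y, z); edges: list of (i, j) index pairs.
    Returns (merged_vertices, edges) with edges as sorted, de-duplicated pairs.
    """
    verts = [tuple(v) for v in vertices]
    edges = [tuple(e) for e in edges]
    while True:
        # Find the closest pair that is still below the merge distance.
        best = None
        for i in range(len(verts)):
            for j in range(i + 1, len(verts)):
                d = math.dist(verts[i], verts[j])
                if d < merge_dist and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            break
        _, i, j = best  # i < j
        verts[i] = tuple((a + b) / 2 for a, b in zip(verts[i], verts[j]))
        del verts[j]

        def remap(k):
            # Edges to j now point at i; indices above j shift down by one.
            if k == j:
                return i
            return k - 1 if k > j else k

        edges = [(remap(a), remap(b)) for a, b in edges]
    # Drop self-loops created by merging, and duplicates regardless of direction.
    unique = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    return verts, unique
```

Merging one pair at a time and re-searching afterwards is quadratic per step, which is acceptable for the few dozen vertices a single building wireframe typically has.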

Snap

  • For each baseline vertex, find the nearest v9 vertex within 2 m and move the baseline vertex 80% of the way toward it
  • Append any v9 vertices not claimed by a snap as extra vertices (edges unchanged)
  • Sweep-optimised params: weight=0.80, radius=2.0 m
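The snap rule above is short enough to sketch directly. The function below is a hypothetical re-implementation written from the bullet points, not the code of `snap_midpoint_plus_unmatched` in snap.py. It assumes that several baseline vertices may snap to the same v9 vertex and that "claimed" means "nearest in-radius match of at least one baseline vertex".

```python
import math

def snap_toward_v9(base_verts, v9_verts, weight=0.80, radius=2.0):
    """Move each baseline vertex `weight` of the way toward its nearest v9 vertex
    (if one lies within `radius`), then append v9 vertices that no baseline vertex
    snapped to. Appended vertices go after the baseline ones, so baseline edge
    indices remain valid without remapping."""
    snapped, claimed = [], set()
    for b in base_verts:
        best_j, best_d = None, radius
        for j, v in enumerate(v9_verts):
            d = math.dist(b, v)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is None:
            snapped.append(tuple(b))  # no v9 vertex within radius: leave unchanged
        else:
            v = v9_verts[best_j]
            # Linear interpolation: weight=0 keeps b, weight=1 lands exactly on v.
            snapped.append(tuple(bc + weight * (vc - bc) for bc, vc in zip(b, v)))
            claimed.add(best_j)
    extra = [tuple(v) for j, v in enumerate(v9_verts) if j not in claimed]
    return snapped + extra
```

A weight of 0.80 rather than 1.0 keeps part of the baseline's own position estimate, so a mismatched v9 vertex pulls a baseline vertex most of the way but not entirely off its original location.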