rayv9_learnt_baseline_snap
S23DR 2026 competition submission. Predicts building wireframes (vertices + edges) from multi-view indoor panoramas.
Pipeline
Per scene:
1. v1d heatmap model → per-view vertex heatmaps → spatial NMS → ~80 rays
2. rays + SfM points → 6-channel 64×32×64 voxel volume
   → RayVoxelTransformerV7a → hybrid ray/dense NMS → world-space vertices (v9)
3. COLMAP + depth → point fusion → EdgeDepthSegmentsModel → wireframe (baseline)
4. snap: baseline verts snapped 80% toward nearest v9 vert within 2 m
   + unmatched v9 verts appended
Validation results (100 scenes, hoho22k_2026_trainval)
| Method | Mean HSS | Median HSS |
|---|---|---|
| Learned baseline | 0.352 | 0.369 |
| + Snap to v9 | 0.411 | 0.453 |
Vertex F1@0.5 = 0.494, F1@1.0 = 0.685. Snap wins on 85/100 val scenes.
Files
| File | Purpose |
|---|---|
| script.py | Entry point. Loads all models, iterates dataset, writes submission.json |
| v9_inference.py | Full v9 vertex pipeline: COLMAP parsing, v1d heatmap model, spatial NMS, voxel volume construction, RayVoxelTransformer inference, hybrid NMS |
| baseline_inference.py | Learned baseline pipeline: point fusion, EdgeDepthSegmentsModel forward pass, postprocessing (merge vertices, snap to point cloud, snap horizontal) |
| snap.py | snap_midpoint_plus_unmatched: snaps baseline verts toward v9 verts and appends unmatched v9 verts |
| model.py | RayVoxelTransformer (V7a): dense image unprojection pathway + sparse ray/SfM transformer pathway, merged via learned gate |
| dataset.py | Voxel volume utilities for inference: build_grid, splat_sfm, march_rays, assemble_volume |
| voxel_grid.py | VoxelGrid class with world/voxel transforms, ray marching, SfM splatting, GT target construction |
| s23dr_2026_example/ | Learned baseline package: point fusion, tokenizer, EdgeDepthSegmentsModel, postprocessing, varifold loss |
| v1d_checkpoint.pt | v1d heatmap model (GroupNorm CNN, 7-channel input), 12 M params |
| v9_checkpoint.pt | RayVoxelTransformerV7a, 22 M params |
| baseline_checkpoint.pt | EdgeDepthSegmentsModel, 102 M params |
| params.json | Competition metadata (competition ID, dataset, time limit, output paths) |
| submission.json | Output: list of {order_id, wf_vertices, wf_edges} |
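As a rough illustration of the submission.json layout listed above, the sketch below builds one entry with the three keys named in the table. The helper name to_submission_entry, the example order ID, and the assumed encodings (vertices as [x, y, z] float lists, edges as pairs of vertex indices) are assumptions for illustration and are not taken from script.py.

```python
import json


def to_submission_entry(order_id, vertices, edges):
    """Build one submission entry (hypothetical helper, not from script.py).

    Assumes vertices are (x, y, z) triples and edges are pairs of
    indices into the vertex list.
    """
    return {
        "order_id": order_id,
        "wf_vertices": [[float(c) for c in v] for v in vertices],
        "wf_edges": [[int(a), int(b)] for a, b in edges],
    }


# Example with made-up data: two vertices joined by one edge.
entries = [to_submission_entry("scene_0001", [(0, 0, 0), (1, 0, 2.5)], [(0, 1)])]
submission_text = json.dumps(entries, indent=2)
```

Casting to plain float and int keeps the entries JSON-serialisable even when the inputs are NumPy scalars.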
Setup
The baseline checkpoint (baseline_checkpoint.pt, 102 MB) is not included in this repo. Download it separately and place it in the root directory of this project:
# From Hugging Face
wget https://huggingface.co/kc92/rayv9_learnt_baseline_snap/resolve/main/baseline_checkpoint.pt
Usage
Competition submission (reads params.json, writes submission.json):
python script.py
Local smoke-test on the training split:
python script.py --mode local --n_scenes 10
Model details
v1d heatmap model
- Input: 7-channel image (3-ch gestalt + 1-ch depth + 3-ch ADE segmentation), resized to 576×768
- Architecture: dilated GroupNorm CNN encoder + transposed-conv decoder → sigmoid heatmap
- NMS: 7×7 max-pool peaks with threshold 0.30, up to 60 peaks/view, spatial suppression at 20 px radius across views → ≤80 rays/scene
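The per-view peak step in the NMS bullet above can be sketched as follows. This is a minimal NumPy illustration rather than the code in v9_inference.py: the function name heatmap_peaks_sketch is hypothetical, the strict "greater than threshold" comparison is an assumption, and the cross-view 20 px suppression is not shown.

```python
import numpy as np


def heatmap_peaks_sketch(heatmap, window=7, threshold=0.30, max_peaks=60):
    """Return up to max_peaks (row, col, score) local maxima of a 2D heatmap.

    A pixel is kept if it equals the maximum of its window x window
    neighbourhood (a stride-1 max-pool) and its score exceeds threshold.
    Plateaus of equal values yield several peaks; this sketch does not
    break such ties.
    """
    heatmap = np.asarray(heatmap, dtype=np.float64)
    pad = window // 2
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    # Shape (H, W, window, window): one neighbourhood per pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_max = windows.max(axis=(2, 3))
    mask = (heatmap >= local_max) & (heatmap > threshold)
    rows, cols = np.nonzero(mask)
    scores = heatmap[rows, cols]
    order = np.argsort(-scores)[:max_peaks]  # strongest peaks first
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]
```

In a PyTorch pipeline the same test is commonly written by comparing the heatmap with the output of a stride-1 max-pool with matching padding.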
RayVoxelTransformer (V7a)
- Input: 6-channel 64×32×64 voxel volume [log1p(ray_count), dir_x/y/z, mean_score, log1p(sfm)]
- Dense pathway: image features (stride-4 CNN) unprojected into full grid → 3D conv → prob/offset
- Sparse pathway: active ray voxels + SfM point tokens → joint self-attention transformer → prob/offset at active voxels
- Merge: learned per-voxel gate between dense and sparse outputs
- NMS: hybrid ray-guided NMS (threshold 0.35, radius 0.3 m) + dense NMS (threshold 0.45, radius 0.5 m), max 48 vertices
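To make the log1p(sfm) input channel concrete, the sketch below counts points per voxel and applies log1p. It is only an illustration under stated assumptions: the function name splat_points_sketch is hypothetical (it is not the splat_sfm utility in dataset.py), the grid bounds are passed in by the caller, and the (64, 32, 64) axis order is assumed to follow the world x, y, z axes.

```python
import numpy as np


def splat_points_sketch(points, grid_min, grid_max, shape=(64, 32, 64)):
    """Return a log1p(point count) volume for points given in world space.

    grid_min and grid_max are the assumed axis-aligned bounds of the grid;
    points falling outside these bounds are ignored.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    grid_min = np.asarray(grid_min, dtype=np.float64)
    grid_max = np.asarray(grid_max, dtype=np.float64)
    dims = np.asarray(shape)
    # Map world coordinates to integer voxel indices.
    voxel = (points - grid_min) / (grid_max - grid_min) * dims
    idx = np.floor(voxel).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)
    idx = idx[inside]
    counts = np.zeros(shape, dtype=np.float32)
    # np.add.at accumulates correctly when several points share a voxel.
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return np.log1p(counts)
```

The log1p compression keeps densely reconstructed regions from dominating the channel while leaving empty voxels at exactly zero.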
Learned baseline
- Input: fused COLMAP + depth point cloud, priority-sampled to 4096 tokens (3072 COLMAP + 1024 depth)
- Architecture: Perceiver-style EdgeDepthSegmentsModel with cross-attention latent bottleneck → segment predictions → varifold-based vertex/edge extraction
- Postprocessing: iterative vertex merging, snap to point cloud, snap horizontal
Snap
- For each baseline vertex, find nearest v9 vertex within 2 m; move baseline vertex 80% of the way toward it
- Append any v9 vertices not claimed by a snap as extra vertices (edges unchanged)
- Sweep-optimised params: weight=0.80, radius=2.0 m
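The snap rule above can be sketched in a few lines of plain Python. This is an illustrative reading of the description, not the snap_midpoint_plus_unmatched implementation in snap.py: the function name snap_sketch is hypothetical, and it assumes that "within 2 m" is inclusive and that several baseline vertices may snap toward the same v9 vertex.

```python
import math


def snap_sketch(baseline_verts, v9_verts, weight=0.80, radius=2.0):
    """Move baseline vertices toward nearby v9 vertices, then append leftovers.

    Baseline vertices keep their positions in the output list, so existing
    edge indices remain valid; unclaimed v9 vertices are appended at the end.
    """
    snapped = []
    claimed = set()
    for b in baseline_verts:
        best_j, best_d = None, math.inf
        for j, v in enumerate(v9_verts):
            d = math.dist(b, v)
            if d <= radius and d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            snapped.append(tuple(b))  # no v9 vertex in range: leave unchanged
        else:
            v = v9_verts[best_j]
            # Linear interpolation: weight=0.8 moves 80% of the way to v.
            snapped.append(tuple(bi + weight * (vi - bi) for bi, vi in zip(b, v)))
            claimed.add(best_j)
    extras = [tuple(v) for j, v in enumerate(v9_verts) if j not in claimed]
    return snapped + extras
```

For example, a baseline vertex at (0, 0, 0) with a v9 vertex at (1, 0, 0) moves to (0.8, 0, 0), while a v9 vertex with no baseline vertex within 2 m is appended as an isolated extra vertex.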