S23DR 2025 Challenge - Winning Solution πŸ†

This repository contains the winning solution for the Structured Semantic 3D Reconstruction (S23DR) Challenge 2025, held at the CVPR 2025 Workshop.

Our method won the challenge by reconstructing 3D building wireframes from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict wireframes with high accuracy.

🎯 Performance

Our solution achieved the best scores across key metrics:

  • HSS (Hybrid Structure Score): Superior spatial accuracy
  • F1 Score: Excellent balance of precision and recall
  • IoU (Intersection over Union): High overlap with ground truth

πŸ—οΈ Method Overview

Our approach consists of two main components:

1. Multi-Modal Data Fusion

  • COLMAP Point Clouds: Dense 3D reconstruction from multi-view images
  • Semantic Segmentation: ADE20K and Gestalt segmentation for building elements
  • Depth Information: Fitted dense depth maps aligned with 3D structure

2. Deep Learning Models

We employ two specialized neural networks:

FastPointNet (Vertex Prediction)

  • Input: 11D point cloud patches (xyz + rgb + features)
  • Architecture: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
  • Output: 3D vertex coordinates + confidence scores + classification
  • Model File: pnet.pth
  • Features:
    • Deeper architecture with 7 conv layers
    • Lightweight channel attention mechanism
    • Group normalization for stability
    • Multi-scale global pooling (max + average)
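The enhancements above can be illustrated with a minimal sketch. This is not the actual `fast_pointnet_v2.py` implementation; `ChannelAttention` and `VertexHeadSketch` are hypothetical names, and layer widths are assumptions, but the sketch shows how shared 1Γ—1 convolutions, group normalization, lightweight channel attention, and the weighted max+average global pooling fit together:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight squeeze-and-excitation style attention over feature channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.LeakyReLU(0.01),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (B, C, N)
        w = self.fc(x.mean(dim=2))     # squeeze over points -> (B, C)
        return x * w.unsqueeze(2)      # re-weight channels

class VertexHeadSketch(nn.Module):
    """Minimal sketch of the vertex network: shared 1x1 convs with group norm,
    channel attention, and weighted max+average global pooling."""
    def __init__(self, in_dim=11, feat=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.GroupNorm(8, 64), nn.LeakyReLU(0.01),
            nn.Conv1d(64, feat, 1), nn.GroupNorm(8, feat), nn.LeakyReLU(0.01),
        )
        self.att = ChannelAttention(feat)
        # 3D vertex offset + confidence + classification logit
        self.head = nn.Linear(feat, 3 + 1 + 1)

    def forward(self, pts):            # pts: (B, 11, N) point-cloud patch
        f = self.att(self.conv(pts))
        # multi-scale global pooling with the 0.7/0.3 weighting noted below
        g = 0.7 * f.max(dim=2).values + 0.3 * f.mean(dim=2)
        return self.head(g)

x = torch.randn(2, 11, 256)            # batch of two 256-point patches
out = VertexHeadSketch()(x)
print(out.shape)                       # torch.Size([2, 5])
```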

ClassificationPointNet (Edge Classification)

  • Input: 6D point cloud patches (xyz + rgb)
  • Architecture: Binary classification PointNet with deep feature extraction
  • Output: Edge/no-edge classification with confidence
  • Model File: pnet_class.pth
  • Features:
    • 6-layer convolutional feature extraction
    • Dropout regularization (0.3-0.5)
    • Xavier initialization

3. Patch-Based Processing Pipeline

Our pipeline processes local 3D patches around potential vertices:

  1. Initial Vertex Detection: Extract candidates from semantic segmentation maps
  2. Point Cloud Clustering: Group nearby 3D points using spatial clustering
  3. Patch Generation: Create local point cloud patches (1-2m radius) centered on clusters
  4. Neural Refinement: Use FastPointNet to refine vertex locations and classify validity
  5. Edge Prediction: Generate candidate edges between vertices and classify using ClassificationPointNet
  6. Post-Processing: Filter and merge results across multiple views
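Step 3 of the pipeline can be sketched as follows. This is an illustrative stand-in, not the code from `predict.py` (`extract_patches` is a hypothetical helper, and the radius and subsampling size are assumptions), showing how local patches are gathered around candidate vertices and centred before being fed to the network:

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_patches(points, candidates, radius=1.5, max_points=1024):
    """For each candidate vertex, gather the local point-cloud patch within
    `radius` metres and centre it on the candidate."""
    tree = cKDTree(points[:, :3])
    patches = []
    for c in candidates:
        idx = tree.query_ball_point(c, r=radius)
        if not idx:
            continue
        patch = points[idx].copy()
        patch[:, :3] -= c                        # centre at the candidate
        if len(patch) > max_points:              # subsample to a fixed size
            sel = np.random.choice(len(patch), max_points, replace=False)
            patch = patch[sel]
        patches.append(patch)
    return patches

pts = np.random.rand(5000, 6) * 10               # toy cloud: xyz + rgb
cands = np.array([[5.0, 5.0, 5.0]])
patches = extract_patches(pts, cands)
print(len(patches), patches[0].shape[1])
```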

πŸš€ Quick Start

Training

```bash
# Train vertex prediction model
python train_pnet_v2.py

# Train edge classification model
python train_pnet_class.py
```

Evaluation

```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```

Inference

```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```

πŸ“ Key Files

Core Models

  • fast_pointnet_v2.py - Enhanced PointNet for vertex prediction
  • fast_pointnet_class.py - PointNet for edge classification
  • end_to_end.py - VoxelUNet implementation (available but not used in final solution)

Pipeline

  • predict.py - Main wireframe prediction pipeline (2900+ lines)
  • train.py - Training and evaluation script
  • utils.py - COLMAP utilities and helper functions
  • visu.py - 3D visualization tools using Open3D

Data Processing

  • generate_pcloud_dataset.py - Dataset generation from HoHo25k
  • create_pcloud() - Multi-view point cloud fusion with semantic features

Analysis

  • find_best_results.py - Hyperparameter optimization and result analysis
  • color_visu.py - Color legend generation for semantic classes

πŸ”§ Technical Details

Input Data Processing

  • Multi-view RGB images with camera poses from COLMAP
  • Depth maps fitted to COLMAP sparse reconstruction
  • ADE20k segmentation for building detection
  • Gestalt segmentation for architectural elements (roof, walls, windows, etc.)

Feature Engineering

  • 11D Point Features: xyz coordinates + rgb colors + semantic labels + multi-view consistency
  • Patch Normalization: Center patches at local centroids with 0.5-2.0m radius
  • Data Augmentation: Random rotation, translation, scaling, and noise injection
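A minimal sketch of the augmentation step, assuming the exact parameters (`augment_patch` is a hypothetical name; the rotation axis, jitter sigma, and scale range are illustrative choices, not the repository's values). Only the xyz channels are transformed; colour and semantic columns pass through untouched:

```python
import numpy as np

def augment_patch(patch, max_angle=np.pi, sigma=0.01, scale_range=(0.9, 1.1)):
    """Random z-axis rotation, uniform scaling, and Gaussian jitter on the
    xyz columns of an (N, 11) patch; the remaining feature columns are kept."""
    xyz = patch[:, :3]
    a = np.random.uniform(-max_angle, max_angle)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    s = np.random.uniform(*scale_range)
    out = patch.copy()
    out[:, :3] = (xyz @ rot.T) * s + np.random.normal(0.0, sigma, xyz.shape)
    return out

patch = np.random.rand(100, 11)
aug = augment_patch(patch)
print(aug.shape)
```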

Training Strategy

  • Multi-task Learning: Joint vertex position + confidence + classification prediction
  • Combined Loss: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
  • Optimization: AdamW with cosine annealing, gradient clipping
  • Regularization: Dropout, weight decay, label smoothing
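The combined loss and optimizer setup described above might be sketched as follows. This is an assumption-laden illustration, not the repository's training code: `combined_loss` is a hypothetical name, the confidence target (detached position error passed through SoftPlus) is one plausible reading of "SoftPlus (confidence)", and the `nn.Linear` stand-in replaces the real network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def combined_loss(pred, target_xyz, target_label):
    """pred: (B, 5) = xyz offset + confidence + class logit.
    Position loss is computed only on true-vertex patches."""
    pos_mask = target_label > 0.5
    pos_loss = F.smooth_l1_loss(pred[:, :3][pos_mask], target_xyz[pos_mask])
    cls_loss = F.binary_cross_entropy_with_logits(pred[:, 4], target_label)
    # confidence is regressed toward the (detached) position error
    err = (pred[:, :3] - target_xyz).norm(dim=1).detach()
    conf_loss = F.smooth_l1_loss(F.softplus(pred[:, 3]), err)
    return pos_loss + cls_loss + conf_loss

# AdamW + cosine annealing + gradient clipping, with a stand-in model
model = nn.Linear(3, 5)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

pred = torch.randn(4, 5, requires_grad=True)
target_xyz = torch.randn(4, 3)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = combined_loss(pred, target_xyz, labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(loss.item())
```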

Hyperparameter Optimization

Our best configuration:

  • vertex_threshold: 0.59
  • edge_threshold: 0.65
  • only_predicted_connections: True

πŸ“Š Architecture Highlights

FastPointNet Enhancements

  • Residual Connections: Improved gradient flow in deep networks
  • Channel Attention: Focus on important feature channels
  • Multi-Scale Features: Combine max and average pooling (0.7 + 0.3 weighting)
  • Group Normalization: Better stability for small batches
  • Leaky ReLU: Prevent dying neurons (negative_slope=0.01)

Patch Processing Strategy

  • Hierarchical Clustering: Group points by spatial proximity
  • Multi-View Consistency: Aggregate features across camera views
  • Semantic-Aware Sampling: Prioritize building-relevant regions
  • Edge-Aware Patches: Generate candidate patches for all vertex pairs
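The edge-aware patch idea can be sketched as a point-to-segment query: gather the points lying inside a cylinder around a candidate vertex pair and feed them to the edge classifier. `edge_patch` and the cylinder radius are hypothetical, chosen here only to illustrate the geometry:

```python
import numpy as np

def edge_patch(points, v1, v2, radius=0.3):
    """Select the points within `radius` of the segment v1-v2, forming a
    candidate-edge patch for the binary edge classifier."""
    d = v2 - v1
    # parameter of the closest point on the segment, clamped to [0, 1]
    t = np.clip((points[:, :3] - v1) @ d / (d @ d), 0.0, 1.0)
    closest = v1 + t[:, None] * d
    dist = np.linalg.norm(points[:, :3] - closest, axis=1)
    return points[dist < radius]

pts = np.array([[1.0, 0.1, 0.0, 0.0, 0.0, 0.0],    # near the segment
                [1.0, 5.0, 0.0, 0.0, 0.0, 0.0]])   # far away
v1, v2 = np.zeros(3), np.array([2.0, 0.0, 0.0])
sel = edge_patch(pts, v1, v2)
print(len(sel))                                     # 1
```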

🎨 Visualization

The repository includes comprehensive 3D visualization tools:

  • Point Cloud Rendering: COLMAP reconstructions with semantic colors
  • Wireframe Overlay: Ground truth vs predicted wireframes
  • Patch Visualization: Local point cloud patches with predicted vertices
  • Camera Frustums: Multi-view camera poses and coverage

πŸ“ˆ Evaluation Metrics

We evaluate using three key metrics:

  • HSS (Hybrid Structure Score): Measures spatial accuracy of vertex positions
  • F1 Score: Harmonic mean of precision and recall for edge detection
  • IoU: Intersection over Union for overall wireframe quality

πŸ“„ Citation

Please cite our work if you use this code:

TODO

πŸ“‹ Requirements

  • Python 3.8+
  • PyTorch 1.12+
  • CUDA-capable GPU (recommended)
  • OpenCV, NumPy, SciPy
  • Open3D (visualization)
  • PyVista (optional, for advanced visualization)
  • HuggingFace Datasets

πŸ“œ License

Apache 2.0 - See LICENSE file for details.

🀝 Acknowledgments

The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.

Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.


For detailed technical description, please refer to our paper and the comprehensive code documentation throughout the repository.
