S23DR 2025 Challenge - Winning Solution πŸ†

This repository contains the winning solution for the Structured Semantic 3D Reconstruction (S23DR) Challenge 2025, held at the CVPR 2025 Workshop.

Our method won the challenge by reconstructing 3D building wireframes from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict wireframes with high accuracy.

🎯 Performance

Our solution achieved the best scores across key metrics:

  • HSS (Hybrid Structure Score): Superior spatial accuracy
  • F1 Score: Excellent balance of precision and recall
  • IoU (Intersection over Union): High overlap with ground truth

πŸ—οΈ Method Overview

Our approach consists of two main components:

1. Multi-Modal Data Fusion

  • COLMAP Point Clouds: Dense 3D reconstruction from multi-view images
  • Semantic Segmentation: ADE20K and Gestalt segmentation for building elements
  • Depth Information: Fitted dense depth maps aligned with 3D structure

2. Deep Learning Models

We employ two specialized neural networks:

FastPointNet (Vertex Prediction)

  • Input: 11D point cloud patches (xyz + rgb + features)
  • Architecture: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
  • Output: 3D vertex coordinates + confidence scores + classification
  • Model File: pnet.pth
  • Features:
    • Deeper architecture with 7 conv layers
    • Lightweight channel attention mechanism
    • Group normalization for stability
    • Multi-scale global pooling (max + average)
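The enhancements above can be illustrated with a minimal sketch. This is not the actual `fast_pointnet_v2.py` implementation; `ChannelAttention` and `VertexHeadSketch` are hypothetical names, and layer widths are assumptions, but the sketch shows how shared 1Γ—1 convolutions, group normalization, lightweight channel attention, and the weighted max+average global pooling fit together:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight squeeze-and-excitation style attention over feature channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.LeakyReLU(0.01),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (B, C, N)
        w = self.fc(x.mean(dim=2))     # squeeze over points -> (B, C)
        return x * w.unsqueeze(2)      # re-weight channels

class VertexHeadSketch(nn.Module):
    """Minimal sketch of the vertex network: shared 1x1 convs with group norm,
    channel attention, and weighted max+average global pooling."""
    def __init__(self, in_dim=11, feat=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.GroupNorm(8, 64), nn.LeakyReLU(0.01),
            nn.Conv1d(64, feat, 1), nn.GroupNorm(8, feat), nn.LeakyReLU(0.01),
        )
        self.att = ChannelAttention(feat)
        # 3D vertex offset + confidence + classification logit
        self.head = nn.Linear(feat, 3 + 1 + 1)

    def forward(self, pts):            # pts: (B, 11, N) point-cloud patch
        f = self.att(self.conv(pts))
        # multi-scale global pooling with the 0.7/0.3 weighting noted below
        g = 0.7 * f.max(dim=2).values + 0.3 * f.mean(dim=2)
        return self.head(g)

x = torch.randn(2, 11, 256)            # batch of two 256-point patches
out = VertexHeadSketch()(x)
print(out.shape)                       # torch.Size([2, 5])
```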

ClassificationPointNet (Edge Classification)

  • Input: 6D point cloud patches (xyz + rgb)
  • Architecture: Binary classification PointNet with deep feature extraction
  • Output: Edge/no-edge classification with confidence
  • Model File: pnet_class.pth
  • Features:
    • 6-layer convolutional feature extraction
    • Dropout regularization (0.3-0.5)
    • Xavier initialization

3. Patch-Based Processing Pipeline

Our pipeline processes local 3D patches around potential vertices:

  1. Initial Vertex Detection: Extract candidates from semantic segmentation maps
  2. Point Cloud Clustering: Group nearby 3D points using spatial clustering
  3. Patch Generation: Create local point cloud patches (1-2m radius) centered on clusters
  4. Neural Refinement: Use FastPointNet to refine vertex locations and classify validity
  5. Edge Prediction: Generate candidate edges between vertices and classify using ClassificationPointNet
  6. Post-Processing: Filter and merge results across multiple views
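Step 3 of the pipeline can be sketched as follows. This is an illustrative stand-in, not the code from `predict.py` (`extract_patches` is a hypothetical helper, and the radius and subsampling size are assumptions), showing how local patches are gathered around candidate vertices and centred before being fed to the network:

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_patches(points, candidates, radius=1.5, max_points=1024):
    """For each candidate vertex, gather the local point-cloud patch within
    `radius` metres and centre it on the candidate."""
    tree = cKDTree(points[:, :3])
    patches = []
    for c in candidates:
        idx = tree.query_ball_point(c, r=radius)
        if not idx:
            continue
        patch = points[idx].copy()
        patch[:, :3] -= c                        # centre at the candidate
        if len(patch) > max_points:              # subsample to a fixed size
            sel = np.random.choice(len(patch), max_points, replace=False)
            patch = patch[sel]
        patches.append(patch)
    return patches

pts = np.random.rand(5000, 6) * 10               # toy cloud: xyz + rgb
cands = np.array([[5.0, 5.0, 5.0]])
patches = extract_patches(pts, cands)
print(len(patches), patches[0].shape[1])
```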

πŸš€ Quick Start

Training

```bash
# Train vertex prediction model
python train_pnet_v2.py

# Train edge classification model
python train_pnet_class.py
```

Evaluation

```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```

Inference

```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```

πŸ“ Key Files

Core Models

  • fast_pointnet_v2.py - Enhanced PointNet for vertex prediction
  • fast_pointnet_class.py - PointNet for edge classification
  • end_to_end.py - VoxelUNet implementation (available but not used in final solution)

Pipeline

  • predict.py - Main wireframe prediction pipeline (2900+ lines)
  • train.py - Training and evaluation script
  • utils.py - COLMAP utilities and helper functions
  • visu.py - 3D visualization tools using Open3D

Data Processing

  • generate_pcloud_dataset.py - Dataset generation from HoHo25k
  • create_pcloud() - Multi-view point cloud fusion with semantic features

Analysis

  • find_best_results.py - Hyperparameter optimization and result analysis
  • color_visu.py - Color legend generation for semantic classes

πŸ”§ Technical Details

Input Data Processing

  • Multi-view RGB images with camera poses from COLMAP
  • Depth maps fitted to COLMAP sparse reconstruction
  • ADE20k segmentation for building detection
  • Gestalt segmentation for architectural elements (roof, walls, windows, etc.)

Feature Engineering

  • 11D Point Features: xyz coordinates + rgb colors + semantic labels + multi-view consistency
  • Patch Normalization: Center patches at local centroids with 0.5-2.0m radius
  • Data Augmentation: Random rotation, translation, scaling, and noise injection
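A minimal sketch of the augmentation step, assuming the exact parameters (`augment_patch` is a hypothetical name; the rotation axis, jitter sigma, and scale range are illustrative choices, not the repository's values). Only the xyz channels are transformed; colour and semantic columns pass through untouched:

```python
import numpy as np

def augment_patch(patch, max_angle=np.pi, sigma=0.01, scale_range=(0.9, 1.1)):
    """Random z-axis rotation, uniform scaling, and Gaussian jitter on the
    xyz columns of an (N, 11) patch; the remaining feature columns are kept."""
    xyz = patch[:, :3]
    a = np.random.uniform(-max_angle, max_angle)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    s = np.random.uniform(*scale_range)
    out = patch.copy()
    out[:, :3] = (xyz @ rot.T) * s + np.random.normal(0.0, sigma, xyz.shape)
    return out

patch = np.random.rand(100, 11)
aug = augment_patch(patch)
print(aug.shape)
```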

Training Strategy

  • Multi-task Learning: Joint vertex position + confidence + classification prediction
  • Combined Loss: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
  • Optimization: AdamW with cosine annealing, gradient clipping
  • Regularization: Dropout, weight decay, label smoothing
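The combined loss and optimizer setup described above might be sketched as follows. This is an assumption-laden illustration, not the repository's training code: `combined_loss` is a hypothetical name, the confidence target (detached position error passed through SoftPlus) is one plausible reading of "SoftPlus (confidence)", and the `nn.Linear` stand-in replaces the real network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def combined_loss(pred, target_xyz, target_label):
    """pred: (B, 5) = xyz offset + confidence + class logit.
    Position loss is computed only on true-vertex patches."""
    pos_mask = target_label > 0.5
    pos_loss = F.smooth_l1_loss(pred[:, :3][pos_mask], target_xyz[pos_mask])
    cls_loss = F.binary_cross_entropy_with_logits(pred[:, 4], target_label)
    # confidence is regressed toward the (detached) position error
    err = (pred[:, :3] - target_xyz).norm(dim=1).detach()
    conf_loss = F.smooth_l1_loss(F.softplus(pred[:, 3]), err)
    return pos_loss + cls_loss + conf_loss

# AdamW + cosine annealing + gradient clipping, with a stand-in model
model = nn.Linear(3, 5)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

pred = torch.randn(4, 5, requires_grad=True)
target_xyz = torch.randn(4, 3)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = combined_loss(pred, target_xyz, labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(loss.item())
```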

Hyperparameter Optimization

Our best configuration:

  • vertex_threshold: 0.59
  • edge_threshold: 0.65
  • only_predicted_connections: True

πŸ“Š Architecture Highlights

FastPointNet Enhancements

  • Residual Connections: Improved gradient flow in deep networks
  • Channel Attention: Focus on important feature channels
  • Multi-Scale Features: Combine max and average pooling (0.7 + 0.3 weighting)
  • Group Normalization: Better stability for small batches
  • Leaky ReLU: Prevent dying neurons (negative_slope=0.01)

Patch Processing Strategy

  • Hierarchical Clustering: Group points by spatial proximity
  • Multi-View Consistency: Aggregate features across camera views
  • Semantic-Aware Sampling: Prioritize building-relevant regions
  • Edge-Aware Patches: Generate candidate patches for all vertex pairs
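The edge-aware patch idea can be sketched as a point-to-segment query: gather the points lying inside a cylinder around a candidate vertex pair and feed them to the edge classifier. `edge_patch` and the cylinder radius are hypothetical, chosen here only to illustrate the geometry:

```python
import numpy as np

def edge_patch(points, v1, v2, radius=0.3):
    """Select the points within `radius` of the segment v1-v2, forming a
    candidate-edge patch for the binary edge classifier."""
    d = v2 - v1
    # parameter of the closest point on the segment, clamped to [0, 1]
    t = np.clip((points[:, :3] - v1) @ d / (d @ d), 0.0, 1.0)
    closest = v1 + t[:, None] * d
    dist = np.linalg.norm(points[:, :3] - closest, axis=1)
    return points[dist < radius]

pts = np.array([[1.0, 0.1, 0.0, 0.0, 0.0, 0.0],    # near the segment
                [1.0, 5.0, 0.0, 0.0, 0.0, 0.0]])   # far away
v1, v2 = np.zeros(3), np.array([2.0, 0.0, 0.0])
sel = edge_patch(pts, v1, v2)
print(len(sel))                                     # 1
```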

🎨 Visualization

The repository includes comprehensive 3D visualization tools:

  • Point Cloud Rendering: COLMAP reconstructions with semantic colors
  • Wireframe Overlay: Ground truth vs predicted wireframes
  • Patch Visualization: Local point cloud patches with predicted vertices
  • Camera Frustums: Multi-view camera poses and coverage

πŸ“ˆ Evaluation Metrics

We evaluate using three key metrics:

  • HSS (Hybrid Structure Score): Measures spatial accuracy of vertex positions
  • F1 Score: Harmonic mean of precision and recall for edge detection
  • IoU: Intersection over Union for overall wireframe quality

πŸ“„ Citation

Please cite our work if you use this code:

TODO

πŸ“‹ Requirements

  • Python 3.8+
  • PyTorch 1.12+
  • CUDA-capable GPU (recommended)
  • OpenCV, NumPy, SciPy
  • Open3D (visualization)
  • PyVista (optional, for advanced visualization)
  • HuggingFace Datasets

πŸ“œ License

Apache 2.0 - See LICENSE file for details.

🀝 Acknowledgments

The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.

Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.


For detailed technical description, please refer to our paper and the comprehensive code documentation throughout the repository.
