---
license: apache-2.0
datasets:
- usm3d/hoho25k
language:
- en
tags:
- hoho25k
- s23dr2025
---

# S23DR 2025 Challenge - Winning Solution

This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at the CVPR 2025 Workshop.

Our method took first place in 3D wireframe reconstruction from multi-view images. It combines COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.

## Performance

Our solution achieved the best scores across the key challenge metrics:

- **HSS (Hybrid Structure Score)**: best-in-challenge spatial accuracy
- **F1 Score**: strong balance of precision and recall
- **IoU (Intersection over Union)**: high overlap with ground-truth wireframes

## Method Overview

Our approach consists of three main components:

### 1. Multi-Modal Data Fusion

- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with the 3D structure

### 2. Deep Learning Models

We employ two specialized neural networks:

#### FastPointNet (Vertex Prediction)

- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**:
  - Deeper architecture with 7 conv layers
  - Lightweight channel attention mechanism
  - Group normalization for stability
  - Multi-scale global pooling (max + average)

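As a rough illustration of the vertex head described above, here is a minimal PyTorch sketch of a PointNet-style encoder with lightweight channel attention and weighted max + average pooling. The layer widths, attention reduction factor, and head layout are illustrative assumptions, and the residual connections and full 7-layer depth of the actual `pnet.pth` model are omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight squeeze-and-excitation-style attention over channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):               # x: (B, C, N)
        w = self.fc(x.mean(dim=2))      # squeeze over points -> (B, C)
        return x * w.unsqueeze(2)       # re-weight feature channels

class VertexPointNetSketch(nn.Module):
    """Illustrative vertex head: 11-D point patches in,
    3-D vertex offset + confidence + validity logit out."""
    def __init__(self, in_dim=11, width=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, width, 1),
            nn.GroupNorm(8, width),     # group norm for small-batch stability
            nn.LeakyReLU(0.01),
            nn.Conv1d(width, width, 1),
            nn.GroupNorm(8, width),
            nn.LeakyReLU(0.01),
        )
        self.attn = ChannelAttention(width)
        self.head = nn.Linear(width, 3 + 1 + 1)

    def forward(self, pts):             # pts: (B, N, 11)
        f = self.attn(self.encoder(pts.transpose(1, 2)))
        # multi-scale global pooling: weighted max + average
        g = 0.7 * f.max(dim=2).values + 0.3 * f.mean(dim=2)
        out = self.head(g)
        return out[:, :3], out[:, 3], out[:, 4]
```
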
#### ClassificationPointNet (Edge Classification)

- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
  - 6-layer convolutional feature extraction
  - Dropout regularization (0.3-0.5)
  - Xavier initialization

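A corresponding sketch of the edge classifier. The widths and layer count are illustrative; only the dropout range (0.3-0.5) and Xavier initialization follow the description above.

```python
import torch
import torch.nn as nn

class EdgePointNetSketch(nn.Module):
    """Illustrative binary edge classifier over 6-D (xyz + rgb) patches."""
    def __init__(self, in_dim=6, width=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, width, 1), nn.ReLU(),
            nn.Conv1d(width, width, 1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(width, width), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(width, 1),        # edge / no-edge logit
        )
        for m in self.classifier:       # Xavier init, as described above
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, pts):             # pts: (B, N, 6)
        f = self.encoder(pts.transpose(1, 2)).max(dim=2).values
        return self.classifier(f).squeeze(1)
```
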
### 3. Patch-Based Processing Pipeline

Our pipeline processes local 3D patches around potential vertices:

1. **Initial Vertex Detection**: Extract candidate vertices from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2 m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify them with ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views

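Steps 2-3 can be sketched as follows. The greedy clustering and the radius values are simplified stand-ins for the actual spatial clustering used in `predict.py`.

```python
import numpy as np

def radius_clusters(points, radius=1.0):
    """Greedy spatial clustering: assign each point to the first
    cluster center within `radius`, else start a new cluster."""
    centers, labels = [], np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        for c, ctr in enumerate(centers):
            if np.linalg.norm(p - ctr) < radius:
                labels[i] = c
                break
        else:
            labels[i] = len(centers)
            centers.append(p)
    return np.asarray(centers), labels

def extract_patch(points, center, radius=1.5):
    """Local patch of points within `radius` of a candidate vertex,
    re-centered on the candidate before going to the vertex network."""
    mask = np.linalg.norm(points - center, axis=1) <= radius
    return points[mask] - center
```
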
## Quick Start

### Training

```bash
# Train vertex prediction model
python train_pnet_v2.py

# Train edge classification model
python train_pnet_class.py
```

### Evaluation

```bash
# Run evaluation on the HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```

### Inference

```bash
# Generate predictions (used in the competition)
# Uses the pnet.pth and pnet_class.pth models
python script.py
```

## Key Files

### Core Models

- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification
- `end_to_end.py` - VoxelUNet implementation (available but not used in the final solution)

### Pipeline

- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools built on Open3D

### Data Processing

- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features

### Analysis

- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes

## Technical Details

### Input Data Processing

- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to the COLMAP sparse reconstruction
- **ADE20K segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)

### Feature Engineering

- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Center patches at local centroids, with a 0.5-2.0 m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection

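A minimal sketch of the patch normalization and augmentation described above. The parameter ranges (rotation axis, scale, translation, jitter) are illustrative assumptions, not the repository's exact values.

```python
import numpy as np

def normalize_patch(xyz, center):
    """Center a local patch at its candidate vertex / local centroid."""
    return xyz - center

def augment_patch(xyz, rng):
    """Random rotation about z, scaling, translation, and noise injection.
    Ranges below are illustrative placeholders."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-0.05, 0.05, size=3)
    noise = rng.normal(0.0, 0.01, size=xyz.shape)
    return (xyz @ rot.T) * scale + shift + noise
```
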
### Training Strategy

- **Multi-task Learning**: Joint prediction of vertex position, confidence, and classification
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing and gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing

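One plausible reading of the combined loss, sketched in PyTorch. The exact form of the confidence term is not specified above, so this version applies SoftPlus to the raw confidence and regresses it toward exp(-position error); the term weighting is also an assumption.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred_xyz, true_xyz, raw_conf, cls_logit, cls_target,
                  w_conf=0.1):
    """Multi-task loss sketch: SmoothL1 on position, a SoftPlus-based
    confidence term, and BCE on vertex validity. Weights illustrative."""
    pos = F.smooth_l1_loss(pred_xyz, true_xyz)
    conf = F.softplus(raw_conf)                     # positive confidence
    err = (pred_xyz - true_xyz).norm(dim=1).detach()
    conf_loss = F.mse_loss(conf, torch.exp(-err))   # high conf <-> low error
    cls = F.binary_cross_entropy_with_logits(cls_logit, cls_target)
    return pos + w_conf * conf_loss + cls
```
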
### Hyperparameter Optimization

Our best configuration:

- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65
- `only_predicted_connections`: True

## Architecture Highlights

### FastPointNet Enhancements

- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on the most informative feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevents dying neurons (negative_slope=0.01)

### Patch Processing Strategy

- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs

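The edge-aware patches above can be sketched as the points within a fixed radius of the candidate segment between two vertices; the radius value is an illustrative assumption.

```python
import numpy as np

def points_near_segment(points, a, b, radius=0.3):
    """Select points within `radius` of segment a-b, forming the
    patch that the edge classifier would score."""
    ab = b - a
    # parameter of the closest point on the segment, clamped to [0, 1]
    t = np.clip((points - a) @ ab / (ab @ ab), 0.0, 1.0)
    closest = a + t[:, None] * ab
    dist = np.linalg.norm(points - closest, axis=1)
    return points[dist <= radius]
```
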
## Visualization

The repository includes comprehensive 3D visualization tools:

- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground-truth vs. predicted wireframes
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage

## Evaluation Metrics

We evaluate using three key metrics:

- **HSS (Hybrid Structure Score)**: Measures spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality

## Citation

Please cite our work if you use this code:

```bibtex
TODO
```

## Requirements

- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets

## License

Apache 2.0 - see the LICENSE file for details.

## Acknowledgments

The research was supported by Czech Science Foundation Grant No. 24-10738M. Access to the computational infrastructure of the OP VVV-funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We further acknowledge support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.

Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.

---

For a detailed technical description, please refer to our paper and the code documentation throughout the repository.