---
license: apache-2.0
datasets:
- usm3d/hoho25k
language:
- en
tags:
- hoho25k
- s23dr2025
---
# S23DR 2025 Challenge - Winning Solution πŸ†
This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at CVPR 2025 Workshop.
Our method took first place in 3D wireframe reconstruction from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.
## 🎯 Performance
Our solution achieved the best scores across key metrics:
- **HSS (Hybrid Structure Score)**: Superior spatial accuracy
- **F1 Score**: Excellent balance of precision and recall
- **IoU (Intersection over Union)**: High overlap with ground truth
## πŸ—οΈ Method Overview
Our approach consists of three main components:
### 1. Multi-Modal Data Fusion
- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with 3D structure
### 2. Deep Learning Models
We employ two specialized neural networks:
#### FastPointNet (Vertex Prediction)
- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**:
- Deeper architecture with 7 conv layers
- Lightweight channel attention mechanism
- Group normalization for stability
- Multi-scale global pooling (max + average)
#### ClassificationPointNet (Edge Classification)
- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
- 6-layer convolutional feature extraction
- Dropout regularization (0.3-0.5)
- Xavier initialization
### 3. Patch-Based Processing Pipeline
Our pipeline processes local 3D patches around potential vertices:
1. **Initial Vertex Detection**: Extract candidates from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify using ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views
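The clustering and patch-generation steps (2-3) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the greedy single-linkage merge rule and the function names are our own, and the radii mirror the values quoted above.

```python
import numpy as np

def cluster_candidates(points, merge_radius=0.5):
    """Greedy clustering: fold each candidate vertex into the first
    existing cluster whose running centroid is within merge_radius,
    otherwise start a new cluster."""
    clusters = []  # list of (coordinate sum, count)
    for p in points:
        for i, (s, n) in enumerate(clusters):
            if np.linalg.norm(p - s / n) < merge_radius:
                clusters[i] = (s + p, n + 1)
                break
        else:
            clusters.append((p.copy(), 1))
    return np.array([s / n for s, n in clusters])

def extract_patch(cloud, center, radius=1.5):
    """Cut the local point-cloud patch around a cluster center."""
    mask = np.linalg.norm(cloud[:, :3] - center, axis=1) < radius
    return cloud[mask]

# Toy example: two well-separated candidate groups collapse to two clusters.
cands = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5.1, 5, 0]], float)
centers = cluster_candidates(cands)
print(len(centers))  # 2
```

Each resulting patch is then fed to FastPointNet for refinement, as described in the following steps.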
## πŸš€ Quick Start
### Training
```bash
# Train vertex prediction model
python train_pnet_v2.py
# Train edge classification model
python train_pnet_class.py
```
### Evaluation
```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```
### Inference
```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```
## πŸ“ Key Files
### Core Models
- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification
- `end_to_end.py` - VoxelUNet implementation (available but not used in final solution)
### Pipeline
- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools using Open3D
### Data Processing
- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features
### Analysis
- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes
## πŸ”§ Technical Details
### Input Data Processing
- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to COLMAP sparse reconstruction
- **ADE20k segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)
### Feature Engineering
- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Center patches at local centroids with 0.5-2.0m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection
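The normalization and augmentation steps can be sketched as below. This is one plausible instance of the operations listed above, with our own (hypothetical) parameter ranges, not the exact transforms used in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_patch(patch_xyz):
    """Center the patch at its centroid so the network sees
    translation-invariant local geometry."""
    return patch_xyz - patch_xyz.mean(axis=0, keepdims=True)

def augment_patch(patch_xyz, noise_std=0.01):
    """Random rotation about z, small scale/translation, and Gaussian
    jitter (ranges are illustrative)."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    scale = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-0.05, 0.05, size=3)
    return scale * patch_xyz @ rot.T + shift + rng.normal(0, noise_std, patch_xyz.shape)

patch = rng.uniform(-1, 1, size=(256, 3))
centered = normalize_patch(patch)
print(np.allclose(centered.mean(axis=0), 0))  # True
```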
### Training Strategy
- **Multi-task Learning**: Joint vertex position + confidence + classification prediction
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing, gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing
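The combined loss can be sketched numerically as follows. The loss weights and the exact form of the softplus confidence term are assumptions for illustration; only the three components (SmoothL1, softplus, BCE) come from the description above.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """SmoothL1: quadratic below beta, linear above."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta).mean()

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy on probabilities."""
    p = np.clip(prob, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p)).mean()

def combined_loss(pos_pred, pos_gt, cls_prob, cls_gt, conf_err, w=(1.0, 1.0, 1.0)):
    """Weighted sum: SmoothL1 (position) + softplus-penalized confidence
    error + BCE (classification). Weights w are illustrative."""
    conf_term = np.log1p(np.exp(conf_err)).mean()  # softplus
    return (w[0] * smooth_l1(pos_pred, pos_gt)
            + w[1] * conf_term
            + w[2] * bce(cls_prob, cls_gt))

loss = combined_loss(np.zeros(3), np.zeros(3),
                     np.array([0.9]), np.array([1.0]), np.zeros(1))
```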
### Hyperparameter Optimization
Our best configuration:
- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65
- `only_predicted_connections`: True
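Applying these thresholds amounts to a simple confidence filter over the predicted wireframe; a minimal sketch (the function and the index-remapping scheme are ours, only the threshold values come from the configuration above):

```python
import numpy as np

VERTEX_THRESHOLD = 0.59
EDGE_THRESHOLD = 0.65

def filter_predictions(vertices, v_conf, edges, e_conf):
    """Keep vertices/edges whose confidence clears its threshold,
    remapping edge endpoints into the surviving vertex indexing."""
    keep_v = v_conf >= VERTEX_THRESHOLD
    remap = -np.ones(len(vertices), dtype=int)
    remap[keep_v] = np.arange(keep_v.sum())
    kept_edges = [(int(remap[i]), int(remap[j]))
                  for (i, j), c in zip(edges, e_conf)
                  if c >= EDGE_THRESHOLD and keep_v[i] and keep_v[j]]
    return vertices[keep_v], kept_edges

verts = np.zeros((3, 3))
v, e = filter_predictions(verts, np.array([0.9, 0.4, 0.7]),
                          [(0, 1), (0, 2)], np.array([0.9, 0.7]))
print(len(v), e)  # 2 [(0, 1)]
```

Note that an edge survives only if its own confidence passes *and* both endpoint vertices survive, which is why edge (0, 1) is dropped despite its 0.9 confidence.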
## πŸ“Š Architecture Highlights
### FastPointNet Enhancements
- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on important feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevent dying neurons (negative_slope=0.01)
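The multi-scale pooling with the 0.7/0.3 weighting reduces to a weighted blend of max- and average-pooling over the point axis; a minimal numpy sketch of that one step:

```python
import numpy as np

def multi_scale_pool(features, w_max=0.7, w_avg=0.3):
    """Blend max-pooling (sharp, outlier-sensitive) with average
    pooling (smooth) over the point axis.
    features: (num_points, channels) -> (channels,)."""
    return w_max * features.max(axis=0) + w_avg * features.mean(axis=0)

feats = np.array([[0.0, 2.0],
                  [4.0, 2.0],
                  [2.0, 2.0]])
print(multi_scale_pool(feats))  # [0.7*4 + 0.3*2, 0.7*2 + 0.3*2] = [3.4, 2.0]
```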
### Patch Processing Strategy
- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs
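An edge-aware patch for a vertex pair can be cut by keeping points near the connecting segment; the following sketch shows one way to do this (the `radius` value and function are our illustration, not the repository's code):

```python
import numpy as np

def edge_patch(cloud, v1, v2, radius=0.3):
    """Select points within `radius` of the segment v1-v2: the local
    patch the edge classifier would score for this vertex pair."""
    seg = v2 - v1
    # Parameter of the closest point on the segment, clamped to [0, 1]
    t = np.clip((cloud - v1) @ seg / (seg @ seg), 0.0, 1.0)
    closest = v1 + t[:, None] * seg
    return cloud[np.linalg.norm(cloud - closest, axis=1) < radius]

cloud = np.array([[0.5, 0.1, 0.0],   # near the segment
                  [0.5, 2.0, 0.0]])  # far away
patch = edge_patch(cloud, np.zeros(3), np.array([1.0, 0.0, 0.0]))
print(len(patch))  # 1
```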
## 🎨 Visualization
The repository includes comprehensive 3D visualization tools:
- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground truth vs predicted wireframes
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage
## πŸ“ˆ Evaluation Metrics
We evaluate using three key metrics:
- **HSS (Hybrid Structure Score)**: Measures spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality
## πŸ“„ Citation
Please cite our work if you use this code:
```bibtex
TODO
```
## πŸ“‹ Requirements
- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets
## πŸ“œ License
Apache 2.0 - See LICENSE file for details.
## 🀝 Acknowledgments
The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.
Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.
---
For detailed technical description, please refer to our paper and the comprehensive code documentation throughout the repository.