---
license: apache-2.0
datasets:
- usm3d/hoho25k
language:
- en
tags:
- hoho25k
- s23dr2025
---
# S23DR 2025 Challenge - Winning Solution
This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at CVPR 2025 Workshop.
Our method achieved first-place performance in 3D wireframe reconstruction from multi-view images by combining COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.
## Performance
Our solution achieved the best scores across key metrics:
- **HSS (Hybrid Structure Score)**: Superior spatial accuracy
- **F1 Score**: Excellent balance of precision and recall
- **IoU (Intersection over Union)**: High overlap with ground truth
## Method Overview
Our approach consists of two main components:
### 1. Multi-Modal Data Fusion
- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with 3D structure
### 2. Deep Learning Models
We employ two specialized neural networks:
#### FastPointNet (Vertex Prediction)
- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**:
- Deeper architecture with 7 conv layers
- Lightweight channel attention mechanism
- Group normalization for stability
- Multi-scale global pooling (max + average)
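The building blocks above can be sketched as follows. This is a minimal illustration of a residual Conv1d block with lightweight channel attention and group normalization, in the spirit of the description; names and dimensions are illustrative, not the actual `fast_pointnet_v2.py` code:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight squeeze-and-excite style attention over feature channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, N)
        w = self.fc(x.mean(dim=2))             # squeeze over points -> (B, C)
        return x * w.unsqueeze(2)              # re-weight channels

class PointBlock(nn.Module):
    """Conv1d block with group norm, leaky ReLU, attention, and a residual skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=1)
        self.norm = nn.GroupNorm(num_groups=8, num_channels=channels)
        self.act = nn.LeakyReLU(negative_slope=0.01)
        self.attn = ChannelAttention(channels)

    def forward(self, x):
        return x + self.attn(self.act(self.norm(self.conv(x))))

# Toy forward pass: a batch of 2 patches, 64-channel features, 256 points each.
x = torch.randn(2, 64, 256)
y = PointBlock(64)(x)
print(y.shape)  # torch.Size([2, 64, 256])
```

The residual skip (`x + ...`) is what keeps gradients flowing through the deeper 7-layer stack.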
#### ClassificationPointNet (Edge Classification)
- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
- 6-layer convolutional feature extraction
- Dropout regularization (0.3-0.5)
- Xavier initialization
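A hypothetical sketch of this kind of binary classifier, with the 6-layer feature extractor, dropout in the 0.3–0.5 range, and Xavier-initialized head described above (layer widths are assumptions, not the repository code):

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """PointNet-style binary classifier for 6-D edge patches (xyz + rgb)."""
    def __init__(self, in_dim: int = 6):
        super().__init__()
        dims = [in_dim, 64, 128, 256, 512, 512, 1024]   # 6 conv layers
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Conv1d(d_in, d_out, 1), nn.BatchNorm1d(d_out), nn.ReLU()]
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 1),                            # edge / no-edge logit
        )
        for m in self.head:                              # Xavier initialization
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, x):                                # x: (B, 6, N)
        f = self.features(x).max(dim=2).values           # global max pool -> (B, 1024)
        return self.head(f)                              # (B, 1) logit

logits = EdgeClassifier()(torch.randn(4, 6, 128))
print(logits.shape)  # torch.Size([4, 1])
```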
### 3. Patch-Based Processing Pipeline
Our pipeline processes local 3D patches around potential vertices:
1. **Initial Vertex Detection**: Extract candidates from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify using ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views
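Steps 2–3 of the pipeline can be sketched with a KD-tree radius query. This is a simplified stand-in (function name, subsampling cap, and fixed radius are assumptions) for the patch extraction performed in `predict.py`:

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_patch(points: np.ndarray, center: np.ndarray, radius: float = 1.5,
                  max_points: int = 1024) -> np.ndarray:
    """Gather the local point-cloud patch around a candidate vertex.

    points: (N, D) array whose first three columns are xyz (D=11 in the pipeline).
    Returns a (k, D) patch centered at `center`, subsampled to at most max_points.
    """
    tree = cKDTree(points[:, :3])
    idx = tree.query_ball_point(center, r=radius)
    patch = points[idx]                       # fancy indexing copies, safe to edit
    if len(patch) > max_points:
        keep = np.random.default_rng(0).choice(len(patch), max_points, replace=False)
        patch = patch[keep]
    patch[:, :3] -= center                    # normalize: center at the candidate
    return patch

# Toy example: 5000 random 11-D points, patch around the origin.
pts = np.random.default_rng(1).uniform(-5, 5, size=(5000, 11))
patch = extract_patch(pts, center=np.zeros(3), radius=1.5)
print(patch.shape)
```

Centering each patch at its candidate vertex means the network only has to predict a small residual offset rather than absolute coordinates.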
## Quick Start
### Training
```bash
# Train vertex prediction model
python train_pnet_v2.py
# Train edge classification model
python train_pnet_class.py
```
### Evaluation
```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```
### Inference
```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```
## Key Files
### Core Models
- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification
- `end_to_end.py` - VoxelUNet implementation (available but not used in the final solution)
### Pipeline
- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools using Open3D
### Data Processing
- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features
### Analysis
- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes
## Technical Details
### Input Data Processing
- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to COLMAP sparse reconstruction
- **ADE20K segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)
### Feature Engineering
- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Center patches at local centroids with 0.5-2.0m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection
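The augmentation scheme can be sketched as below. Rotation is restricted to the vertical axis (a reasonable assumption for buildings), and the scale/noise magnitudes are illustrative choices, not the values used in training:

```python
import numpy as np

def augment_patch(patch: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random rotation (about the vertical axis), scaling, translation and jitter.
    Only the xyz columns (first three) are transformed; colors/labels pass through."""
    out = patch.copy()
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.9, 1.1)             # illustrative magnitudes
    shift = rng.normal(0.0, 0.05, size=3)
    noise = rng.normal(0.0, 0.01, size=out[:, :3].shape)
    out[:, :3] = scale * (out[:, :3] @ rot.T) + shift + noise
    return out

rng = np.random.default_rng(0)
p = rng.normal(size=(256, 11))
a = augment_patch(p, rng)
print(a.shape)  # (256, 11)
```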
### Training Strategy
- **Multi-task Learning**: Joint vertex position + confidence + classification prediction
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing, gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing
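A sketch of the combined multi-task objective in the spirit described above. The confidence formulation here is an assumption: the softplus-activated confidence is regressed toward `exp(-position error)`, so confidence is high exactly when the position loss is low. Weights and signatures are illustrative:

```python
import torch
import torch.nn.functional as F

def combined_loss(pred_xyz, pred_conf, pred_logit, gt_xyz, gt_label,
                  w_pos=1.0, w_conf=0.5, w_cls=1.0):
    """Joint vertex position + confidence + classification loss (sketch).

    pred_xyz: (B, 3) predicted vertex offsets; pred_conf: (B,) raw confidence;
    pred_logit: (B,) classification logits; gt_label: (B,) {0, 1} floats.
    """
    pos_loss = F.smooth_l1_loss(pred_xyz, gt_xyz)
    with torch.no_grad():                              # confidence target from error
        target_conf = torch.exp(-(pred_xyz - gt_xyz).norm(dim=1))
    conf = F.softplus(pred_conf)                       # smooth, non-negative confidence
    conf_loss = F.mse_loss(conf, target_conf)
    cls_loss = F.binary_cross_entropy_with_logits(pred_logit, gt_label)
    return w_pos * pos_loss + w_conf * conf_loss + w_cls * cls_loss

B = 8
loss = combined_loss(torch.randn(B, 3), torch.randn(B), torch.randn(B),
                     torch.randn(B, 3), torch.randint(0, 2, (B,)).float())
print(float(loss))
```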
### Hyperparameter Optimization
Our best configuration:
- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65
- `only_predicted_connections`: True
## Architecture Highlights
### FastPointNet Enhancements
- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on important feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevent dying neurons (negative_slope=0.01)
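The 0.7/0.3 weighted pooling fusion mentioned above amounts to a one-liner over the point axis; a minimal sketch (function name is illustrative):

```python
import torch

def multiscale_pool(features: torch.Tensor, w_max: float = 0.7, w_avg: float = 0.3):
    """Weighted fusion of global max and average pooling over the point axis.
    features: (B, C, N) -> (B, C). Weights follow the 0.7 + 0.3 scheme above."""
    pooled_max = features.max(dim=2).values   # sharp, outlier-sensitive summary
    pooled_avg = features.mean(dim=2)         # smooth, noise-robust summary
    return w_max * pooled_max + w_avg * pooled_avg

g = multiscale_pool(torch.randn(2, 1024, 512))
print(g.shape)  # torch.Size([2, 1024])
```

Max pooling captures the most salient point per channel while average pooling stabilizes against noisy points; the weighted sum keeps both signals.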
### Patch Processing Strategy
- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs
## Visualization
The repository includes comprehensive 3D visualization tools:
- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground truth vs predicted wireframes
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage
## Evaluation Metrics
We evaluate using three key metrics:
- **HSS (Hybrid Structure Score)**: Measures spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality
## Citation
Please cite our work if you use this code:
```bibtex
TODO
```
## Requirements
- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets
## License
Apache 2.0 - See LICENSE file for details.
## Acknowledgments
The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.
Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.
---
For a detailed technical description, please refer to our paper and the code documentation throughout the repository.