---
license: apache-2.0
datasets:
- usm3d/hoho25k
language:
- en
tags:
- hoho25k
- s23dr2025
---
# S23DR 2025 Challenge - Winning Solution πŸ†

This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at CVPR 2025 Workshop.

Our method took first place in 3D wireframe reconstruction from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.

## 🎯 Performance

Our solution achieved the best scores across key metrics:
- **HSS (Hybrid Structure Score)**: Superior spatial accuracy
- **F1 Score**: Excellent balance of precision and recall
- **IoU (Intersection over Union)**: High overlap with ground truth

## πŸ—οΈ Method Overview

Our approach consists of two main components:

### 1. Multi-Modal Data Fusion
- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with 3D structure

### 2. Deep Learning Models
We employ two specialized neural networks:

#### FastPointNet (Vertex Prediction)
- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**: 
  - Deeper architecture with 7 conv layers
  - Lightweight channel attention mechanism
  - Group normalization for stability
  - Multi-scale global pooling (max + average)
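
The channel-attention idea can be illustrated with a minimal NumPy sketch. This is a squeeze-and-excite style gate over feature channels; the bottleneck size and random weights are purely illustrative, not taken from `pnet.pth`:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Lightweight channel attention over a (C, N) patch feature map.

    features: C channels x N points.
    w1, w2:   weights of a small gating MLP (illustrative sizes).
    """
    squeeze = features.mean(axis=1)              # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeeze, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate in (0, 1)
    return features * gate[:, None]              # re-weight each channel

rng = np.random.default_rng(0)
C, N = 64, 512
feats = rng.normal(size=(C, N))
w1 = rng.normal(size=(C // 8, C)) * 0.1          # C -> C/8 squeeze
w2 = rng.normal(size=(C, C // 8)) * 0.1          # C/8 -> C excite
out = channel_attention(feats, w1, w2)
```

Because the gate is computed from a single pooled vector, the extra cost is negligible compared to the per-point convolutions.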

#### ClassificationPointNet (Edge Classification)  
- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
  - 6-layer convolutional feature extraction
  - Dropout regularization (0.3-0.5)
  - Xavier initialization
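
One way to assemble the 6D input patch for a candidate edge is to gather the points lying near the vertex-to-vertex segment. This NumPy sketch (the radius value and helper name are illustrative, not the repository's API) uses clamped point-to-segment distances:

```python
import numpy as np

def edge_patch(points, v0, v1, radius=0.3):
    """Select points within `radius` of segment v0-v1 (a capsule)."""
    d = v1 - v0
    t = (points - v0) @ d / (d @ d)   # projection parameter along the segment
    t = np.clip(t, 0.0, 1.0)          # clamp to the segment endpoints
    closest = v0 + t[:, None] * d     # nearest point on the segment
    dist = np.linalg.norm(points - closest, axis=1)
    return points[dist <= radius]

pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.1, 0.0],
                [0.5, 1.0, 0.0], [1.2, 0.0, 0.0]])
patch = edge_patch(pts, np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
# the point at y=1.0 falls outside the capsule and is excluded
```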

### 3. Patch-Based Processing Pipeline

Our pipeline processes local 3D patches around potential vertices:

1. **Initial Vertex Detection**: Extract candidates from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify using ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views
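
At a high level, the six steps chain together roughly as follows. This is a structural sketch only: every helper is a placeholder standing in for logic that lives in `predict.py`, and the names and signatures are illustrative, not the repository's actual API:

```python
# Placeholder helpers -- stand-ins for the real implementations.
def detect_vertex_candidates(views):        # step 1: segmentation-based candidates
    return views["candidates"]

def cluster_points(candidates):             # step 2: spatial clustering (identity here)
    return candidates

def extract_patch(cloud, center, radius):   # step 3: local patch around a cluster
    return [center]

def refine_vertex(patch):                   # step 4: FastPointNet role (stubbed)
    return patch[0], 0.9                    # refined position + confidence

def extract_edge_patch(cloud, a, b):        # step 5 input: points along the edge
    return cloud

def classify_edge(patch):                   # step 5: ClassificationPointNet role (stubbed)
    return 0.7

def predict_wireframe(views, cloud, vertex_threshold=0.59, edge_threshold=0.65):
    vertices = []
    for center in cluster_points(detect_vertex_candidates(views)):
        position, confidence = refine_vertex(extract_patch(cloud, center, radius=1.5))
        if confidence >= vertex_threshold:
            vertices.append(position)
    edges = []
    for i in range(len(vertices)):          # candidate edges between all vertex pairs
        for j in range(i + 1, len(vertices)):
            patch = extract_edge_patch(cloud, vertices[i], vertices[j])
            if classify_edge(patch) >= edge_threshold:
                edges.append((i, j))
    return vertices, edges                  # step 6 (cross-view merging) omitted

verts, edges = predict_wireframe({"candidates": [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]},
                                 cloud=[(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
```

The thresholds default to the competition values reported below.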

## πŸš€ Quick Start

### Training
```bash
# Train vertex prediction model
python train_pnet_v2.py

# Train edge classification model  
python train_pnet_class.py
```

### Evaluation
```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```

### Inference
```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```

## πŸ“ Key Files

### Core Models
- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification  
- `end_to_end.py` - VoxelUNet implementation (available but not used in final solution)

### Pipeline
- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools using Open3D

### Data Processing
- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features

### Analysis
- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes

## πŸ”§ Technical Details

### Input Data Processing
- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to COLMAP sparse reconstruction  
- **ADE20k segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)

### Feature Engineering
- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Center patches at local centroids with 0.5-2.0m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection
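
The patch normalization and rotation augmentation can be sketched as follows (a minimal NumPy version; the fixed radius, the z-axis-only rotation, and the function name are assumptions for illustration):

```python
import numpy as np

def normalize_patch(points, radius=1.5, rng=None):
    """Center an (N, 11) patch at its xyz centroid and scale by the patch radius.

    Passing an `rng` additionally applies a random rotation about the vertical
    axis as augmentation. Only columns 0-2 (xyz) are geometric; the remaining
    rgb/semantic channels pass through untouched.
    """
    out = np.asarray(points, dtype=float).copy()
    centroid = out[:, :3].mean(axis=0)
    out[:, :3] = (out[:, :3] - centroid) / radius   # center + scale to ~unit range
    if rng is not None:
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        out[:, :3] = out[:, :3] @ rot.T             # random z-axis rotation
    return out, centroid

rng = np.random.default_rng(1)
pts = rng.normal(size=(100, 11))
out, centroid = normalize_patch(pts, rng=rng)
```

Returning the centroid lets predicted vertex offsets be mapped back into world coordinates.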

### Training Strategy
- **Multi-task Learning**: Joint vertex position + confidence + classification prediction
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing, gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing
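
A minimal NumPy sketch of the combined loss is below. The task weights and the exact way SoftPlus enters the confidence term are assumptions for illustration; only the SmoothL1 + BCE structure comes from the description above:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Huber-style loss used for the vertex position term."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def softplus(x):
    return np.log1p(np.exp(x))

def bce(logit, label):
    """Binary cross-entropy on raw logits, for the classification term."""
    p = 1.0 / (1.0 + np.exp(-logit))
    eps = 1e-7
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps)).mean()

def combined_loss(pos_pred, pos_gt, conf_logit, conf_target, cls_logit, cls_label,
                  w_pos=1.0, w_conf=0.5, w_cls=1.0):
    """Weighted multi-task loss; weights and the confidence coupling are illustrative."""
    return (w_pos * smooth_l1(pos_pred, pos_gt)
            + w_conf * smooth_l1(softplus(conf_logit), conf_target)
            + w_cls * bce(cls_logit, cls_label))
```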

### Hyperparameter Optimization
Our best configuration:
- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65  
- `only_predicted_connections`: True

## πŸ“Š Architecture Highlights

### FastPointNet Enhancements
- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on important feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevent dying neurons (negative_slope=0.01)
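
The weighted max + average pooling is simple to state directly. This sketch collapses a (C, N) per-point feature map into a single patch descriptor using the 0.7/0.3 weighting mentioned above:

```python
import numpy as np

def multiscale_pool(features, w_max=0.7, w_avg=0.3):
    """Combine max and average pooling over the point dimension.

    features: (C, N) per-point features -> (C,) global patch descriptor.
    """
    return w_max * features.max(axis=1) + w_avg * features.mean(axis=1)

f = np.array([[0.0, 2.0, 4.0],
              [1.0, 1.0, 1.0]])
pooled = multiscale_pool(f)  # [0.7*4 + 0.3*2, 0.7*1 + 0.3*1] = [3.4, 1.0]
```

Mixing in the average term keeps gradient signal flowing to non-maximal points that pure max pooling would ignore.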

### Patch Processing Strategy
- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs
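
As a much-simplified stand-in for the clustering step (the repository's actual clustering may differ; the merge distance here is illustrative), a greedy centroid-merging pass looks like this:

```python
import numpy as np

def greedy_cluster(points, merge_dist=0.5):
    """Greedy spatial clustering: each point joins the first cluster whose
    centroid lies within merge_dist, otherwise it seeds a new cluster."""
    centers, members = [], []
    for p in points:
        for k, c in enumerate(centers):
            if np.linalg.norm(p - c) <= merge_dist:
                members[k].append(p)
                centers[k] = np.mean(members[k], axis=0)  # update centroid
                break
        else:
            centers.append(p.astype(float))
            members.append([p])
    return np.array(centers)

pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 0.0, 0.0]])
clusters = greedy_cluster(pts)  # the two nearby points merge -> 2 clusters
```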

## 🎨 Visualization

The repository includes comprehensive 3D visualization tools:
- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground truth vs predicted wireframes  
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage

## πŸ“ˆ Evaluation Metrics

We evaluate using three key metrics:
- **HSS (Hybrid Structure Score)**: Measures spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality

## πŸ“„ Citation

Please cite our work if you use this code:

```bibtex
TODO
```

## πŸ“‹ Requirements

- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets

## πŸ“œ License

Apache 2.0 - See LICENSE file for details.

## 🀝 Acknowledgments

The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.

Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.

---

For detailed technical description, please refer to our paper and the comprehensive code documentation throughout the repository.