---
license: apache-2.0
datasets:
- usm3d/hoho25k
language:
- en
tags:
- hoho25k
- s23dr2025
---
# S23DR 2025 Challenge - Winning Solution πŸ†
This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at CVPR 2025 Workshop.
Our method took first place in 3D wireframe reconstruction from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.
## 🎯 Performance
Our solution achieved the best scores across key metrics:
- **HSS (Hybrid Structure Score)**: Superior spatial accuracy
- **F1 Score**: Excellent balance of precision and recall
- **IoU (Intersection over Union)**: High overlap with ground truth
## πŸ—οΈ Method Overview
Our approach consists of three main components:
### 1. Multi-Modal Data Fusion
- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with 3D structure
### 2. Deep Learning Models
We employ two specialized neural networks:
#### FastPointNet (Vertex Prediction)
- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**:
- Deeper architecture with 7 conv layers
- Lightweight channel attention mechanism
- Group normalization for stability
- Multi-scale global pooling (max + average)
#### ClassificationPointNet (Edge Classification)
- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
- 6-layer convolutional feature extraction
- Dropout regularization (0.3-0.5)
- Xavier initialization
### 3. Patch-Based Processing Pipeline
Our pipeline processes local 3D patches around potential vertices:
1. **Initial Vertex Detection**: Extract candidates from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify using ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views
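The clustering and patch-generation steps (2-3) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the greedy single-linkage merge rule and the function names are our own, and the radii mirror the values quoted above.

```python
import numpy as np

def cluster_candidates(points, merge_radius=0.5):
    """Greedy clustering: fold each candidate vertex into the first
    existing cluster whose running centroid is within merge_radius,
    otherwise start a new cluster."""
    clusters = []  # list of (coordinate sum, count)
    for p in points:
        for i, (s, n) in enumerate(clusters):
            if np.linalg.norm(p - s / n) < merge_radius:
                clusters[i] = (s + p, n + 1)
                break
        else:
            clusters.append((p.copy(), 1))
    return np.array([s / n for s, n in clusters])

def extract_patch(cloud, center, radius=1.5):
    """Cut the local point-cloud patch around a cluster center."""
    mask = np.linalg.norm(cloud[:, :3] - center, axis=1) < radius
    return cloud[mask]

# Toy example: two well-separated candidate groups collapse to two clusters.
cands = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5.1, 5, 0]], float)
centers = cluster_candidates(cands)
print(len(centers))  # 2
```

Each resulting patch is then fed to FastPointNet for refinement, as described in the following steps.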
## πŸš€ Quick Start
### Training
```bash
# Train vertex prediction model
python train_pnet_v2.py
# Train edge classification model
python train_pnet_class.py
```
### Evaluation
```bash
# Run evaluation on HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```
### Inference
```bash
# Generate predictions (used in competition)
# Uses pnet.pth and pnet_class.pth models
python script.py
```
## πŸ“ Key Files
### Core Models
- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification
- `end_to_end.py` - VoxelUNet implementation (available but not used in final solution)
### Pipeline
- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools using Open3D
### Data Processing
- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features
### Analysis
- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes
## πŸ”§ Technical Details
### Input Data Processing
- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to COLMAP sparse reconstruction
- **ADE20k segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)
### Feature Engineering
- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Center patches at local centroids with 0.5-2.0m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection
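The normalization and augmentation steps can be sketched as below. This is one plausible instance of the operations listed above, with our own (hypothetical) parameter ranges, not the exact transforms used in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_patch(patch_xyz):
    """Center the patch at its centroid so the network sees
    translation-invariant local geometry."""
    return patch_xyz - patch_xyz.mean(axis=0, keepdims=True)

def augment_patch(patch_xyz, noise_std=0.01):
    """Random rotation about z, small scale/translation, and Gaussian
    jitter (ranges are illustrative)."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    scale = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-0.05, 0.05, size=3)
    return scale * patch_xyz @ rot.T + shift + rng.normal(0, noise_std, patch_xyz.shape)

patch = rng.uniform(-1, 1, size=(256, 3))
centered = normalize_patch(patch)
print(np.allclose(centered.mean(axis=0), 0))  # True
```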
### Training Strategy
- **Multi-task Learning**: Joint vertex position + confidence + classification prediction
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing, gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing
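The combined loss can be sketched numerically as follows. The loss weights and the exact form of the softplus confidence term are assumptions for illustration; only the three components (SmoothL1, softplus, BCE) come from the description above.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """SmoothL1: quadratic below beta, linear above."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta).mean()

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy on probabilities."""
    p = np.clip(prob, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p)).mean()

def combined_loss(pos_pred, pos_gt, cls_prob, cls_gt, conf_err, w=(1.0, 1.0, 1.0)):
    """Weighted sum: SmoothL1 (position) + softplus-penalized confidence
    error + BCE (classification). Weights w are illustrative."""
    conf_term = np.log1p(np.exp(conf_err)).mean()  # softplus
    return (w[0] * smooth_l1(pos_pred, pos_gt)
            + w[1] * conf_term
            + w[2] * bce(cls_prob, cls_gt))

loss = combined_loss(np.zeros(3), np.zeros(3),
                     np.array([0.9]), np.array([1.0]), np.zeros(1))
```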
### Hyperparameter Optimization
Our best configuration:
- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65
- `only_predicted_connections`: True
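Applying these thresholds amounts to a simple confidence filter over the predicted wireframe; a minimal sketch (the function and the index-remapping scheme are ours, only the threshold values come from the configuration above):

```python
import numpy as np

VERTEX_THRESHOLD = 0.59
EDGE_THRESHOLD = 0.65

def filter_predictions(vertices, v_conf, edges, e_conf):
    """Keep vertices/edges whose confidence clears its threshold,
    remapping edge endpoints into the surviving vertex indexing."""
    keep_v = v_conf >= VERTEX_THRESHOLD
    remap = -np.ones(len(vertices), dtype=int)
    remap[keep_v] = np.arange(keep_v.sum())
    kept_edges = [(int(remap[i]), int(remap[j]))
                  for (i, j), c in zip(edges, e_conf)
                  if c >= EDGE_THRESHOLD and keep_v[i] and keep_v[j]]
    return vertices[keep_v], kept_edges

verts = np.zeros((3, 3))
v, e = filter_predictions(verts, np.array([0.9, 0.4, 0.7]),
                          [(0, 1), (0, 2)], np.array([0.9, 0.7]))
print(len(v), e)  # 2 [(0, 1)]
```

Note that an edge survives only if its own confidence passes *and* both endpoint vertices survive, which is why edge (0, 1) is dropped despite its 0.9 confidence.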
## πŸ“Š Architecture Highlights
### FastPointNet Enhancements
- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on important feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevent dying neurons (negative_slope=0.01)
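The multi-scale pooling with the 0.7/0.3 weighting reduces to a weighted blend of max- and average-pooling over the point axis; a minimal numpy sketch of that one step:

```python
import numpy as np

def multi_scale_pool(features, w_max=0.7, w_avg=0.3):
    """Blend max-pooling (sharp, outlier-sensitive) with average
    pooling (smooth) over the point axis.
    features: (num_points, channels) -> (channels,)."""
    return w_max * features.max(axis=0) + w_avg * features.mean(axis=0)

feats = np.array([[0.0, 2.0],
                  [4.0, 2.0],
                  [2.0, 2.0]])
print(multi_scale_pool(feats))  # [0.7*4 + 0.3*2, 0.7*2 + 0.3*2] = [3.4, 2.0]
```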
### Patch Processing Strategy
- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs
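An edge-aware patch for a vertex pair can be cut by keeping points near the connecting segment; the following sketch shows one way to do this (the `radius` value and function are our illustration, not the repository's code):

```python
import numpy as np

def edge_patch(cloud, v1, v2, radius=0.3):
    """Select points within `radius` of the segment v1-v2: the local
    patch the edge classifier would score for this vertex pair."""
    seg = v2 - v1
    # Parameter of the closest point on the segment, clamped to [0, 1]
    t = np.clip((cloud - v1) @ seg / (seg @ seg), 0.0, 1.0)
    closest = v1 + t[:, None] * seg
    return cloud[np.linalg.norm(cloud - closest, axis=1) < radius]

cloud = np.array([[0.5, 0.1, 0.0],   # near the segment
                  [0.5, 2.0, 0.0]])  # far away
patch = edge_patch(cloud, np.zeros(3), np.array([1.0, 0.0, 0.0]))
print(len(patch))  # 1
```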
## 🎨 Visualization
The repository includes comprehensive 3D visualization tools:
- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground truth vs predicted wireframes
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage
## πŸ“ˆ Evaluation Metrics
We evaluate using three key metrics:
- **HSS (Hybrid Structure Score)**: Measures spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality
## πŸ“„ Citation
Please cite our work if you use this code:
```bibtex
TODO
```
## πŸ“‹ Requirements
- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets
## πŸ“œ License
Apache 2.0 - See LICENSE file for details.
## 🀝 Acknowledgments
The research was supported by Czech Science Foundation Grant No. 24-10738M. The access to the computational infrastructure of the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We also acknowledge the support from the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.
Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.
---
For detailed technical description, please refer to our paper and the comprehensive code documentation throughout the repository.