update readme
- README.md +132 -3
- hrnetv2_w48_imagenet_pretrained.pth +0 -3
README.md
CHANGED
|
@@ -1,3 +1,132 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
<div align="center">
|
| 2 |
+
<h2 align="center">The P<sup>3</sup> dataset: Pixels, Points and Polygons <br> for Multimodal Building Vectorization</h2>
|
| 3 |
+
<h3 align="center">Raphael Sulzer<sup>1,2</sup> Liuyun Duan<sup>1</sup>
|
| 4 |
+
Nicolas Girard<sup>1</sup> Florent Lafarge<sup>2</sup></h3>
|
| 5 |
+
<sup>1</sup>LuxCarta Technology <br> <sup>2</sup>Centre Inria d'Université Côte d'Azur
|
| 6 |
+
<img src="./teaser.jpg" width=100% height=100%>
|
| 7 |
+
<b>Figure 1</b>: A view of our dataset of Zurich, Switzerland
|
| 8 |
+
</div>
|
| 9 |
+
|
| 10 |
+
## Abstract
|
| 11 |
+
|
| 12 |
+
<div align="justify">
|
| 13 |
+
We present the P<sup>3</sup> dataset, a large-scale multimodal benchmark for building vectorization, constructed from aerial LiDAR point clouds, high-resolution aerial imagery, and vectorized 2D building outlines, collected across three continents. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 cm. While many existing datasets primarily focus on the image modality, P<sup>3</sup> offers a complementary perspective by also incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing aerial LiDAR and imagery further improves accuracy and geometric quality of predicted polygons. The P<sup>3</sup> dataset is publicly available, along with code and pretrained weights of three state-of-the-art models for building polygon prediction at https://github.com/raphaelsulzer/PixelsPointsPolygons.
|
| 14 |
+
</div>
|
| 15 |
+
|
| 16 |
+
## Highlights
|
| 17 |
+
|
| 18 |
+
- A global, multimodal dataset of aerial images, aerial LiDAR point clouds, and building polygons
|
| 19 |
+
- A library for training and evaluating state-of-the-art deep learning methods on the dataset
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
## Dataset
|
| 23 |
+
|
| 24 |
+
### Download
|
| 25 |
+
|
| 26 |
+
You can download the dataset at [huggingface.co/datasets/rsi/PixelsPointsPolygons](https://huggingface.co/datasets/rsi/PixelsPointsPolygons).
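
As a hedged sketch, the dataset could also be fetched programmatically with the `huggingface_hub` package. The repository ID comes from the link above. The helper name `download_p3`, the `data/P3` target directory, and the optional pattern filter are illustrative assumptions, not part of this repository.

```python
REPO_ID = "rsi/PixelsPointsPolygons"  # dataset repository linked in this README


def download_p3(local_dir="data/P3", patterns=None):
    """Download a snapshot of the dataset repository into local_dir.

    local_dir and patterns are hypothetical defaults chosen for this sketch.
    patterns may be a list of glob strings to restrict which files are fetched.
    """
    # Imported lazily so the module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",      # the repository is hosted as a dataset
        local_dir=local_dir,
        allow_patterns=patterns,  # None downloads every file
    )
```

Calling `download_p3()` returns the local path of the downloaded snapshot. The full dataset is large, so passing a pattern list to fetch a subset first may be worthwhile.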
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
### Overview
|
| 30 |
+
|
| 31 |
+
<div align="left">
|
| 32 |
+
<img src="./worldmap.jpg" width=60% height=50%>
|
| 33 |
+
</div>
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
<!-- ### Prepare custom tile size
|
| 37 |
+
|
| 38 |
+
See [datasets preprocessing](data_preprocess) for instructions on preparing a dataset with different tile sizes. -->
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
## Code
|
| 42 |
+
|
| 43 |
+
### Download
|
| 44 |
+
|
| 45 |
+
```
|
| 46 |
+
git clone https://github.com/raphaelsulzer/PixelsPointsPolygons
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
### Requirements
|
| 50 |
+
|
| 51 |
+
To create a conda environment named `ppp` and install the repository as a Python package with all dependencies, run:
|
| 52 |
+
```
|
| 53 |
+
bash install.sh
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
Or, if you want to manage the environment yourself, run:
|
| 57 |
+
```
|
| 58 |
+
pip install -r requirements-torch-cuda.txt
|
| 59 |
+
pip install .
|
| 60 |
+
```
|
| 61 |
+
⚠️ **Warning**: The implementation of the LiDAR point cloud encoder uses Open3D-ML. Currently, Open3D-ML officially only supports the PyTorch version specified in `requirements-torch-cuda.txt`.
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
<!-- ## Model Zoo
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
| Model | \<model> | Encoder | \<encoder> |Image |LiDAR | IoU | C-IoU |
|
| 69 |
+
|--------------- |---- |--------------- |--------------- |--- |--- |----- |----- |
|
| 70 |
+
| Frame Field Learning |\<ffl> | Vision Transformer (ViT) | \<vit_cnn> | ✅ | | 0.85 | 0.90 |
|
| 71 |
+
| Frame Field Learning |\<ffl> | PointPillars (PP) + ViT | \<pp_vit_cnn> | | ✅ | 0.80 | 0.88 |
|
| 72 |
+
| Frame Field Learning |\<ffl> | PP+ViT \& ViT | \<fusion_vit_cnn> | ✅ |✅ | 0.78 | 0.85 |
|
| 73 |
+
| HiSup |\<hisup> | Vision Transformer (ViT) | \<vit_cnn> | ✅ | | 0.85 | 0.90 |
|
| 74 |
+
| HiSup |\<hisup> | PointPillars (PP) + ViT | \<pp_vit_cnn> | | ✅ | 0.80 | 0.88 |
|
| 75 |
+
| HiSup |\<hisup> | PP+ViT \& ViT | \<fusion_vit> | ✅ |✅ | 0.78 | 0.85 |
|
| 76 |
+
| Pix2Poly |\<pix2poly>| Vision Transformer (ViT) | \<vit> | ✅ | | 0.85 | 0.90 |
|
| 77 |
+
| Pix2Poly |\<pix2poly>| PointPillars (PP) + ViT | \<pp_vit> | | ✅ | 0.80 | 0.88 |
|
| 78 |
+
| Pix2Poly |\<pix2poly>| PP+ViT \& ViT | \<fusion_vit> | ✅ |✅ | 0.78 | 0.85 | -->
|
| 79 |
+
|
| 80 |
+
### Configuration
|
| 81 |
+
|
| 82 |
+
The project uses Hydra for configuration, which lets you override any parameter from the command line, such as the model and encoder types from the table above.
|
| 83 |
+
To view all available options, run:
|
| 84 |
+
```
|
| 85 |
+
python train.py --help
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
### Training
|
| 89 |
+
|
| 90 |
+
Start training with the following command:
|
| 91 |
+
|
| 92 |
+
```
|
| 93 |
+
torchrun --nproc_per_node=<num GPUs> train.py model=<model> encoder=<encoder> model.batch_size=<batch size> ...
|
| 94 |
+
|
| 95 |
+
```
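
The placeholders in the training command can be filled in by hand or assembled by a small script. The sketch below builds the argument list in Python. The helper `build_train_command` is hypothetical, the values `ffl` and `vit_cnn` are taken from the commented-out Model Zoo table and are assumed (not confirmed) to be valid config names, and the GPU count and batch size are arbitrary.

```python
import shlex


def build_train_command(num_gpus, model, encoder, batch_size, extra_overrides=()):
    """Assemble the torchrun invocation for train.py as an argument list."""
    # Hydra-style key=value overrides, in the order used in the README command.
    overrides = [
        f"model={model}",
        f"encoder={encoder}",
        f"model.batch_size={batch_size}",
        *extra_overrides,
    ]
    return ["torchrun", f"--nproc_per_node={num_gpus}", "train.py", *overrides]


# Example values only: 2 GPUs, Frame Field Learning with a ViT encoder.
cmd = build_train_command(2, "ffl", "vit_cnn", 16)
print(shlex.join(cmd))
# prints: torchrun --nproc_per_node=2 train.py model=ffl encoder=vit_cnn model.batch_size=16
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues when an override value contains spaces.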
|
| 96 |
+
|
| 97 |
+
### Prediction
|
| 98 |
+
|
| 99 |
+
```
|
| 100 |
+
torchrun --nproc_per_node=<num GPUs> predict.py model=<model> checkpoint=best_val_iou ...
|
| 101 |
+
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
### Evaluation
|
| 105 |
+
|
| 106 |
+
```
|
| 107 |
+
python evaluate.py model=<model> checkpoint=best_val_iou
|
| 108 |
+
```
|
| 109 |
+
<!-- ## Trained models
|
| 110 |
+
|
| 111 |
+
asd -->
|
| 112 |
+
|
| 113 |
+
|
| 114 |
+
<!-- ## Results
|
| 115 |
+
|
| 116 |
+
#TODO Put paper main results table here -->
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
## Citation
|
| 120 |
+
|
| 121 |
+
If you find our work useful, please consider citing:
|
| 122 |
+
```bibtex
|
| 123 |
+
...
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
## Acknowledgements
|
| 127 |
+
|
| 128 |
+
This repository builds on the following open-source projects. We thank the authors for their great work.
|
| 129 |
+
|
| 130 |
+
1. [Frame Field Learning](https://github.com/Lydorn/Polygonization-by-Frame-Field-Learning)
|
| 131 |
+
2. [HiSup](https://github.com/SarahwXU/HiSup)
|
| 132 |
+
3. [Pix2Poly](https://github.com/yeshwanth95/Pix2Poly)
|
hrnetv2_w48_imagenet_pretrained.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:0efec102d97f2ef58f0e258b2c3076b3704b93ffc2b73f64c8da5462c0037ef8
|
| 3 |
-
size 310643500
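
The three deleted lines are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object ID, and the size in bytes. As a minimal sketch, such a pointer can be parsed as below. The helper `parse_lfs_pointer` is hypothetical and only handles the three keys shown here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0efec102d97f2ef58f0e258b2c3076b3704b93ffc2b73f64c8da5462c0037ef8
size 310643500
"""
info = parse_lfs_pointer(pointer)
print(f"{info['size'] / 1e6:.1f} MB")  # prints: 310.6 MB
```

So the commit removes a checkpoint of roughly 310.6 MB from the repository.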
|