rsi committed on
Commit
26ae6a7
·
1 Parent(s): a3654c2

update readme

Files changed (2)
  1. README.md +132 -3
  2. hrnetv2_w48_imagenet_pretrained.pth +0 -3
README.md CHANGED
@@ -1,3 +1,132 @@
1
- ---
2
- license: mit
3
- ---
1
+ <div align="center">
2
+ <h2 align="center">The P<sup>3</sup> dataset: Pixels, Points and Polygons <br> for Multimodal Building Vectorization</h2>
3
+ <h3 align="center">Raphael Sulzer<sup>1,2</sup> &nbsp;&nbsp;&nbsp; Liuyun Duan<sup>1</sup>
4
+ &nbsp;&nbsp;&nbsp; Nicolas Girard<sup>1</sup> &nbsp;&nbsp;&nbsp; Florent Lafarge<sup>2</sup></h3>
5
+ <p align="center"><sup>1</sup>LuxCarta Technology <br> <sup>2</sup>Centre Inria d'Université Côte d'Azur</p>
6
+ <img src="./teaser.jpg" width=100% height=100%>
7
+ <b>Figure 1</b>: A view of our dataset of Zurich, Switzerland
8
+ </div>
9
+
10
+ ## Abstract
11
+
12
+ <div align="justify">
13
+ We present the P<sup>3</sup> dataset, a large-scale multimodal benchmark for building vectorization, constructed from aerial LiDAR point clouds, high-resolution aerial imagery, and vectorized 2D building outlines, collected across three continents. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 cm. While many existing datasets primarily focus on the image modality, P<sup>3</sup> offers a complementary perspective by also incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons in both hybrid and end-to-end learning frameworks. Moreover, fusing aerial LiDAR and imagery further improves the accuracy and geometric quality of the predicted polygons. The P<sup>3</sup> dataset is publicly available, along with code and pretrained weights for three state-of-the-art building polygon prediction models, at https://github.com/raphaelsulzer/PixelsPointsPolygons.
14
+ </div>
15
+
16
+ ## Highlights
17
+
18
+ - A global, multimodal dataset of aerial images, aerial LiDAR point clouds and building polygons
19
+ - A library for training and evaluating state-of-the-art deep learning methods on the dataset
20
+
21
+
22
+ ## Dataset
23
+
24
+ ### Download
25
+
26
+ You can download the dataset at [huggingface.co/datasets/rsi/PixelsPointsPolygons](https://huggingface.co/datasets/rsi/PixelsPointsPolygons).
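
For scripted downloads, the following is a minimal sketch that assumes the `huggingface_hub` Python package is installed; it is not part of this repository. The helper names `download_kwargs` and `download_dataset` and the `data/P3` target directory are hypothetical and chosen for illustration. Only the repository id is taken from the link above.

```python
REPO_ID = "rsi/PixelsPointsPolygons"  # dataset repository linked above


def download_kwargs(local_dir="data/P3", allow_patterns=None):
    """Build the keyword arguments for huggingface_hub.snapshot_download.

    local_dir is a hypothetical target directory. allow_patterns optionally
    restricts the download to files whose paths match the given glob patterns.
    """
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the repository is a dataset, not a model
        "local_dir": local_dir,
        "allow_patterns": allow_patterns,
    }


def download_dataset(**options):
    """Download the dataset snapshot and return the local directory path."""
    # Imported here so the helper above works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(**options))
```

Calling `download_dataset()` fetches the whole repository, which is large. Passing `allow_patterns` limits the download to matching files, but the file layout inside the repository is not described here, so any pattern you pass is your own assumption.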
27
+
28
+
29
+ ### Overview
30
+
31
+ <div align="left">
32
+ <img src="./worldmap.jpg" width=60% height=50%>
33
+ </div>
34
+
35
+
36
+ <!-- ### Prepare custom tile size
37
+
38
+ See [datasets preprocessing](data_preprocess) for instructions on preparing a dataset with different tile sizes. -->
39
+
40
+
41
+ ## Code
42
+
43
+ ### Download
44
+
45
+ ```
46
+ git clone https://github.com/raphaelsulzer/PixelsPointsPolygons
47
+ ```
48
+
49
+ ### Requirements
50
+
51
+ To create a conda environment named `ppp` and install the repository as a Python package with all dependencies, run
52
+ ```
53
+ bash install.sh
54
+ ```
55
+
56
+ or, if you want to manage the environment yourself, run
57
+ ```
58
+ pip install -r requirements-torch-cuda.txt
59
+ pip install .
60
+ ```
61
+ ⚠️ **Warning**: The implementation of the LiDAR point cloud encoder uses Open3D-ML. Currently, Open3D-ML officially only supports the PyTorch version specified in `requirements-torch-cuda.txt`.
62
+
63
+
64
+
65
+ <!-- ## Model Zoo
66
+
67
+
68
+ | Model | \<model> | Encoder | \<encoder> |Image |LiDAR | IoU | C-IoU |
69
+ |--------------- |---- |--------------- |--------------- |--- |--- |----- |----- |
70
+ | Frame Field Learning |\<ffl> | Vision Transformer (ViT) | \<vit_cnn> | ✅ | | 0.85 | 0.90 |
71
+ | Frame Field Learning |\<ffl> | PointPillars (PP) + ViT | \<pp_vit_cnn> | | ✅ | 0.80 | 0.88 |
72
+ | Frame Field Learning |\<ffl> | PP+ViT \& ViT | \<fusion_vit_cnn> | ✅ |✅ | 0.78 | 0.85 |
73
+ | HiSup |\<hisup> | Vision Transformer (ViT) | \<vit_cnn> | ✅ | | 0.85 | 0.90 |
74
+ | HiSup |\<hisup> | PointPillars (PP) + ViT | \<pp_vit_cnn> | | ✅ | 0.80 | 0.88 |
75
+ | HiSup |\<hisup> | PP+ViT \& ViT | \<fusion_vit> | ✅ |✅ | 0.78 | 0.85 |
76
+ | Pix2Poly |\<pix2poly>| Vision Transformer (ViT) | \<vit> | ✅ | | 0.85 | 0.90 |
77
+ | Pix2Poly |\<pix2poly>| PointPillars (PP) + ViT | \<pp_vit> | | ✅ | 0.80 | 0.88 |
78
+ | Pix2Poly |\<pix2poly>| PP+ViT \& ViT | \<fusion_vit> | ✅ |✅ | 0.78 | 0.85 | -->
79
+
80
+ ### Configuration
81
+
82
+ The project supports Hydra configuration, which lets you modify any parameter from the command line, such as the model and encoder types from the table above.
83
+ To view all available options, run
84
+ ```
85
+ python train.py --help
86
+ ```
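
To illustrate what a dotted `key=value` override does conceptually, here is a small self-contained sketch in plain Python. It is not Hydra's implementation and does not use Hydra: it only shows how a string such as `model.batch_size=16` can be mapped onto a nested configuration. The `apply_overrides` and `parse_value` helpers and the default values are hypothetical. Overrides such as `model=<model>` most likely select a whole config group in Hydra, which this sketch does not model.

```python
import copy


def parse_value(text):
    """Convert an override value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dict, creating missing sections on the way.
            node = node.setdefault(name, {})
            if not isinstance(node, dict):
                raise TypeError(f"{name!r} in {key!r} is not a config section")
        node[leaf] = parse_value(value)
    return result


# Hypothetical defaults; only the override strings mirror the commands in this README.
defaults = {"model": {"batch_size": 8}, "checkpoint": "latest"}
updated = apply_overrides(defaults, ["model.batch_size=16", "checkpoint=best_val_iou"])
print(updated)
```

The sketch copies the defaults before modifying them, so the same base configuration can be reused for several runs with different overrides.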
87
+
88
+ ### Training
89
+
90
+ Start training with the following command:
91
+
92
+ ```
93
+ torchrun --nproc_per_node=<num GPUs> train.py model=<model> encoder=<encoder> model.batch_size=<batch size> ...
94
+
95
+ ```
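
When launching several runs from a script, it can help to assemble the command programmatically. The sketch below builds the same `torchrun` invocation as an argument list. The `build_train_command` helper is hypothetical, and the example values (`ffl`, `vit_cnn`, 2 GPUs, batch size 8) are assumptions: the model and encoder names are taken from the commented-out Model Zoo table and may not match the actual config names.

```python
import shlex


def build_train_command(num_gpus, model, encoder, batch_size, extra_overrides=()):
    """Assemble the torchrun training command as a list of arguments."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",  # one worker process per GPU
        "train.py",
        f"model={model}",
        f"encoder={encoder}",
        f"model.batch_size={batch_size}",
        *extra_overrides,  # any further key=value overrides
    ]


# Assumed example values; adjust them to the configs that actually exist.
command = build_train_command(num_gpus=2, model="ffl", encoder="vit_cnn", batch_size=8)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids shell quoting issues.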
96
+
97
+ ### Prediction
98
+
99
+ ```
100
+ torchrun --nproc_per_node=<num GPUs> predict.py model=<model> checkpoint=best_val_iou ...
101
+
102
+ ```
103
+
104
+ ### Evaluation
105
+
106
+ ```
107
+ python evaluate.py model=<model> checkpoint=best_val_iou
108
+ ```
109
+ <!-- ## Trained models
110
+
111
+ asd -->
112
+
113
+
114
+ <!-- ## Results
115
+
116
+ #TODO Put paper main results table here -->
117
+
118
+
119
+ ## Citation
120
+
121
+ If you find our work useful, please consider citing:
122
+ ```bibtex
123
+ ...
124
+ ```
125
+
126
+ ## Acknowledgements
127
+
128
+ This repository benefits from the following open-source work. We thank the authors for their great work.
129
+
130
+ 1. [Frame Field Learning](https://github.com/Lydorn/Polygonization-by-Frame-Field-Learning)
131
+ 2. [HiSup](https://github.com/SarahwXU/HiSup)
132
+ 3. [Pix2Poly](https://github.com/yeshwanth95/Pix2Poly)
hrnetv2_w48_imagenet_pretrained.pth DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:0efec102d97f2ef58f0e258b2c3076b3704b93ffc2b73f64c8da5462c0037ef8
3
- size 310643500