zzuxzt committed on
Commit
d9c5371
·
verified ·
1 Parent(s): d3caa51

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ demo/1.lobby_s3net_segmentation.gif filter=lfs diff=lfs merge=lfs -text
+ demo/2.lobby_semantic_mapping.gif filter=lfs diff=lfs merge=lfs -text
+ demo/3.lobby_semantic_navigation.gif filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Temple Robotics and Artificial Intelligence Lab
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,3 +1,239 @@
- ---
- license: mit
- ---
+ # Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
+
+ S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
+ Video demos are available as [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
+ The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
+
+ ## Related Resources
+
+ - **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
+ - **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
+ - **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
+ - **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
+
+ ## S³-Net: Stochastic Semantic Segmentation Network
+
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict a semantic label for each LiDAR point.
+
+ ## Demo Results
+
+ **S³-Net Segmentation**
+ ![S³-Net Segmentation](./demo/1.lobby_s3net_segmentation.gif)
+
+ **Semantic Mapping**
+ ![Semantic Mapping](./demo/2.lobby_semantic_mapping.gif)
+
+ **Semantic Navigation**
+ ![Semantic Navigation](./demo/3.lobby_semantic_navigation.gif)
+
+ ## Model Architecture
+
+ S³-Net uses an encoder-decoder architecture with stochastic latent representations:
+
+ ```
+ Input (3 channels: scan, intensity, angle of incidence)
+
+ ┌─────────────────────────────────────┐
+ │ Encoder (Conv1D + Residual Blocks)  │
+ │  - Conv1D (3 → 32) stride=2         │
+ │  - Conv1D (32 → 64) stride=2        │
+ │  - Residual Stack (2 layers)        │
+ └─────────────────────────────────────┘
+
+ ┌─────────────────────────────────────┐
+ │ VAE Reparameterization              │
+ │  - μ (mean) and σ (std) estimation  │
+ │  - Latent sampling z ~ N(μ, σ²)     │
+ │  - Monte Carlo KL divergence        │
+ └─────────────────────────────────────┘
+
+ ┌─────────────────────────────────────┐
+ │ Decoder (Residual + TransposeConv)  │
+ │  - Residual Stack (2 layers)        │
+ │  - TransposeConv1D (64 → 32)        │
+ │  - TransposeConv1D (32 → 10)        │
+ │  - Softmax (10 semantic classes)    │
+ └─────────────────────────────────────┘
+
+ Output (10 channels: semantic probabilities)
+ ```
+
+ **Key Features:**
+ - **3 Input Channels:** Range scan, intensity, angle of incidence
+ - **10 Output Classes:** Background + 9 semantic classes
+ - **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
+ - **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence
+
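The VAE reparameterization block above can be sketched as follows. This is a minimal illustration, not the repo's actual module (names like `Reparameterize` are ours), assuming diagonal Gaussians for both the posterior and the standard-normal prior; with those assumptions the normalization constants cancel in the Monte Carlo KL estimate:

```python
import torch
import torch.nn as nn

class Reparameterize(nn.Module):
    """Sample z ~ N(mu, sigma^2) via the reparameterization trick and return
    a single-sample Monte Carlo estimate of KL(q(z|x) || N(0, I))."""
    def forward(self, mu, log_var):
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        z = mu + eps * std
        # Monte Carlo KL: log q(z|x) - log p(z); Gaussian constants cancel
        log_qzx = -0.5 * (log_var + (z - mu) ** 2 / log_var.exp())
        log_pz = -0.5 * z ** 2
        kl = (log_qzx - log_pz).sum(dim=-1).mean()
        return z, kl

mu = torch.zeros(4, 64)       # illustrative batch of 4, latent dim 64
log_var = torch.zeros(4, 64)  # unit variance -> KL is exactly zero
z, kl = Reparameterize()(mu, log_var)
print(z.shape)  # torch.Size([4, 64])
```

Because the latent is sampled, each forward pass gives a slightly different output, which is what the stochastic-inference section below exploits.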
+ ## Semantic Classes
+
+ | ID | Class      | Description                    |
+ |----|------------|--------------------------------|
+ | 0  | Other      | Background/unknown             |
+ | 1  | Chair      | Office and lounge chairs       |
+ | 2  | Door       | Doors (open/closed)            |
+ | 3  | Elevator   | Elevator doors                 |
+ | 4  | Person     | Dynamic pedestrians            |
+ | 5  | Pillar     | Structural pillars/columns     |
+ | 6  | Sofa       | Sofas and couches              |
+ | 7  | Table      | Tables of all types            |
+ | 8  | Trash bin  | Waste receptacles              |
+ | 9  | Wall       | Walls and flat surfaces        |
+
+ ## Requirements
+
+ - Python 3.7+
+ - PyTorch 1.7.1+
+ - TensorBoard
+ - NumPy
+ - Matplotlib
+ - tqdm
+
+ Install dependencies:
+ ```bash
+ pip install torch torchvision tensorboardX numpy matplotlib tqdm
+ ```
+
+ ## Dataset Structure
+
+ S³-Net expects the Semantic2D dataset organized as follows:
+
+ ```
+ ~/semantic2d_data/
+ ├── dataset.txt              # List of dataset folders
+ ├── 2024-04-11-15-24-29/     # Dataset folder 1
+ │   ├── train.txt            # Training sample list
+ │   ├── dev.txt              # Validation sample list
+ │   ├── scans_lidar/         # Range scans (.npy)
+ │   ├── intensities_lidar/   # Intensity data (.npy)
+ │   └── semantic_label/      # Ground truth labels (.npy)
+ ├── 2024-04-04-12-16-41/     # Dataset folder 2
+ │   └── ...
+ └── ...
+ ```
+
+ **dataset.txt format:**
+ ```
+ 2024-04-11-15-24-29
+ 2024-04-04-12-16-41
+ ```
+
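The way the loader resolves `dataset.txt` into per-sample `.npy` paths can be sketched like this (a simplified stand-in for the `VaeTestDataset` constructor in `scripts/model.py`; the function name `list_samples` is ours, and a throwaway directory tree stands in for real data):

```python
import os
import tempfile

def list_samples(root, split="train"):
    """Resolve dataset.txt into per-sample .npy paths, mirroring the layout above."""
    samples = []
    with open(os.path.join(root, "dataset.txt")) as fp:
        folders = [line.strip() for line in fp if "-" in line]
    for folder in folders:
        with open(os.path.join(root, folder, split + ".txt")) as fp:
            for line in fp:
                name = line.strip()
                if name.endswith(".npy"):
                    samples.append({
                        "scan": os.path.join(root, folder, "scans_lidar", name),
                        "intensity": os.path.join(root, folder, "intensities_lidar", name),
                        "label": os.path.join(root, folder, "semantic_label", name),
                    })
    return samples

# build a tiny throwaway tree to demonstrate the layout
root = tempfile.mkdtemp()
folder = "2024-04-11-15-24-29"
os.makedirs(os.path.join(root, folder))
with open(os.path.join(root, "dataset.txt"), "w") as fp:
    fp.write(folder + "\n")
with open(os.path.join(root, folder, "train.txt"), "w") as fp:
    fp.write("000001.npy\n")

samples = list_samples(root)
print(len(samples))  # 1
```

Each returned entry bundles the three aligned files for one scan; the real dataset class additionally loads, cleans, and standardizes the arrays.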
+ ## Usage
+
+ ### Training
+
+ Train S³-Net on your dataset:
+
+ ```bash
+ sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
+ ```
+
+ **Arguments:**
+ - `$1` - Training data directory (contains `dataset.txt` and subfolders)
+ - `$2` - Validation data directory
+
+ **Training Configuration** (in `scripts/train.py`):
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `NUM_EPOCHS` | 20000 | Total training epochs |
+ | `BATCH_SIZE` | 1024 | Samples per batch |
+ | `LEARNING_RATE` | 0.001 | Initial learning rate |
+ | `BETA` | 0.01 | β-VAE weight for KL divergence |
+
+ **Learning Rate Schedule:**
+ - Epochs 0-50000: `1e-4`
+ - Epochs 50000-480000: `2e-5`
+ - Epochs 480000+: Exponential decay
+
+ The model saves checkpoints every 2000 epochs to `./model/`.
+
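The schedule above can be written as a simple piecewise function. This is a sketch: the thresholds are taken from the table as-is, and the decay factor past 480000 is an assumed placeholder, since the README does not state it:

```python
def learning_rate(epoch, decay=0.999):
    """Piecewise learning-rate schedule from the table above.
    The exponential decay rate past 480000 is an assumed placeholder."""
    if epoch < 50000:
        return 1e-4
    if epoch < 480000:
        return 2e-5
    return 2e-5 * decay ** (epoch - 480000)

print(learning_rate(0))       # 0.0001
print(learning_rate(100000))  # 2e-05
```

In practice such a schedule is usually attached to the optimizer via `torch.optim.lr_scheduler.LambdaLR`, with the function returning a multiplier instead of an absolute rate.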
+ ### Inference Demo
+
+ Run semantic segmentation on test data:
+
+ ```bash
+ sh run_eval_demo.sh ~/semantic2d_data/
+ ```
+
+ **Arguments:**
+ - `$1` - Test data directory (reads `dev.txt` for the sample list)
+
+ **Output:**
+ - `./output/semantic_ground_truth_*.png` - Ground truth visualizations
+ - `./output/semantic_s3net_*.png` - S³-Net predictions
+
+ **Example Output:**
+
+ | Ground Truth | S³-Net Prediction |
+ |:------------:|:-----------------:|
+ | ![Ground Truth](./output/semantic_ground_truth_7000.png) | ![S³-Net Prediction](./output/semantic_s3net_7000.png) |
+
+ ### Stochastic Inference
+
+ S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction. This provides:
+ - More robust predictions
+ - Implicit uncertainty estimation
+ - Reduced noise at segmentation boundaries
+
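The voting step can be sketched as follows; random logits stand in here for the 32 stacked stochastic network outputs:

```python
import torch

def majority_vote(logits):
    """logits: (num_samples, num_classes, num_points) stacked stochastic outputs.
    Returns the per-point most frequent argmax class across the samples."""
    labels = logits.argmax(dim=1)     # per-pass class labels: (num_samples, num_points)
    return labels.mode(dim=0).values  # per-point majority class

num_samples, num_classes, num_points = 32, 10, 1081
logits = torch.randn(num_samples, num_classes, num_points)  # stand-in for model outputs
pred = majority_vote(logits)
print(pred.shape)  # torch.Size([1081])
```

The spread of labels across the 32 passes (e.g. how far the vote is from unanimous) is what gives the implicit per-point uncertainty estimate.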
+ ## File Structure
+
+ ```
+ s3_net/
+ ├── demo/                    # Demo GIFs
+ │   ├── 1.lobby_s3net_segmentation.gif
+ │   ├── 2.lobby_semantic_mapping.gif
+ │   └── 3.lobby_semantic_navigation.gif
+ ├── model/
+ │   └── s3_net_model.pth     # Pretrained model weights
+ ├── output/                  # Inference output directory
+ ├── scripts/
+ │   ├── model.py             # S³-Net model architecture
+ │   ├── train.py             # Training script
+ │   ├── decode_demo.py       # Inference/demo script
+ │   └── lovasz_losses.py     # Lovasz-Softmax loss function
+ ├── run_train.sh             # Training driver script
+ ├── run_eval_demo.sh         # Inference driver script
+ ├── LICENSE                  # MIT License
+ └── README.md                # This file
+ ```
+
+ ## TensorBoard Monitoring
+
+ Training logs are saved to `./runs/`. View training progress:
+
+ ```bash
+ tensorboard --logdir=runs
+ ```
+
+ Monitored metrics:
+ - Training/Validation loss
+ - Cross-Entropy loss
+ - Lovasz-Softmax loss
+
+ ## Pre-trained Model
+
+ A pre-trained model is included at `model/s3_net_model.pth`. It was trained on the Semantic2D dataset, collected with a Hokuyo UTM-30LX-EW LiDAR sensor.
+
+ To use the pre-trained model:
+ ```bash
+ sh run_eval_demo.sh ~/semantic2d_data/
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{xie2026semantic2d,
+   title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
+   author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
+   journal={arXiv preprint arXiv:2409.09899},
+   year={2026}
+ }
+ ```
demo/1.lobby_s3net_segmentation.gif ADDED

Git LFS Details

  • SHA256: b9d77f9bb9a88f57be8888e934c69cfc0a5b79edc14dd69ddee152c5a6ecc3fc
  • Pointer size: 133 Bytes
  • Size of remote file: 12.4 MB
demo/2.lobby_semantic_mapping.gif ADDED

Git LFS Details

  • SHA256: cf0af6410a4c25971390639ab0bd3466de629af0c2be99abccaa2d353fe12251
  • Pointer size: 132 Bytes
  • Size of remote file: 1.45 MB
demo/3.lobby_semantic_navigation.gif ADDED

Git LFS Details

  • SHA256: e26fb5c2fb52941f4e33c7b4b8e7f116d3e3689d2f9c40d7fe288972b5af48f5
  • Pointer size: 133 Bytes
  • Size of remote file: 13.2 MB
model/s3_net_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86ffcba0092e8e20d80fc02e5e01bb675c60d0c897d8830305ecc5b8b20b6dbb
+ size 741507
output/semantic_ground_truth_7000.png ADDED
output/semantic_s3net_7000.png ADDED
run_eval_demo.sh ADDED
@@ -0,0 +1,54 @@
+ #!/bin/sh
+ #
+ # file: run_eval_demo.sh
+ #
+ # This is a simple driver script that runs decoding (inference) on an
+ # evaluation set using the pretrained S3-Net model.
+ #
+ # To run this script, execute the following line:
+ #
+ #  sh run_eval_demo.sh <data_dir>
+ #
+ # The first argument ($1) is the test data directory.
+ #
+ # An example of how to run this is as follows:
+ #
+ #  sh run_eval_demo.sh ~/semantic2d_data/
+ #
+
+ # decode the number of command line arguments
+ #
+ NARGS=$#
+
+ if (test "$NARGS" -eq "0") then
+     echo "usage: run_eval_demo.sh <data_dir>"
+     exit 1
+ fi
+
+ # define a base directory for the experiment
+ #
+ DL_EXP=`pwd`;
+ DL_SCRIPTS="$DL_EXP/scripts";
+ DL_OUT="$DL_EXP/output";
+
+ # define the output directories for decoding/scoring
+ #
+ DL_TRAIN_ODIR="$DL_EXP/model";
+ DL_MDL_PATH="$DL_TRAIN_ODIR/s3_net_model.pth";
+
+ # evaluate each data set that was specified
+ # (decode_demo.py expects: odir, model path, data directory)
+ #
+ echo "... starting evaluation of $1 ..."
+ $DL_SCRIPTS/decode_demo.py $DL_OUT $DL_MDL_PATH $1 | \
+     tee $DL_OUT/01_decode_dev.log | grep "00 out of\|Average"
+ echo "... finished evaluation of $1 ..."
+
+ echo "======= end of results ======="
+
+ #
+ # exit gracefully
+ #
+ exit 0
run_train.sh ADDED
@@ -0,0 +1,59 @@
+ #!/bin/sh
+ #
+ # file: run_train.sh
+ #
+ # This is a simple driver script that runs S3-Net training on the
+ # training set and the validation set.
+ #
+ # To run this script, execute the following line:
+ #
+ #  sh run_train.sh <train_dir> <val_dir>
+ #
+ # The first argument ($1) is the training data directory and the
+ # second argument ($2) is the validation data directory.
+ #
+ # An example of how to run this is as follows:
+ #
+ #  sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
+ #
+
+ # decode the number of command line arguments
+ #
+ NARGS=$#
+
+ if (test "$NARGS" -eq "0") then
+     echo "usage: run_train.sh <train_dir> <val_dir>"
+     exit 1
+ fi
+
+ # define a base directory for the experiment
+ #
+ DL_EXP=`pwd`;
+ DL_SCRIPTS="$DL_EXP/scripts";
+ DL_OUT="$DL_EXP/output";
+
+ # define the number of feats environment variable
+ #
+ export DL_NUM_FEATS=3
+
+ # define the output directories for training/decoding/scoring
+ #
+ DL_TRAIN_ODIR="$DL_EXP/model";
+ DL_MDL_PATH="$DL_TRAIN_ODIR/model.pth";
+
+ # create the output directory
+ #
+ rm -fr $DL_OUT
+ mkdir -p $DL_OUT
+
+ # execute training: training must always be run
+ #
+ echo "... starting training on $1 ..."
+ $DL_SCRIPTS/train.py $DL_MDL_PATH $1 $2 | tee $DL_OUT/00_train.log | \
+     grep "reading\|Step\|Average\|Warning\|Error"
+ echo "... finished training on $1 ..."
+
+ #
+ # exit gracefully
+ #
+ exit 0
scripts/__pycache__/convlstm.cpython-37.pyc ADDED
Binary file (5.75 kB).

scripts/__pycache__/lovasz_losses.cpython-37.pyc ADDED
Binary file (2.32 kB).

scripts/__pycache__/model.cpython-37.pyc ADDED
Binary file (9.34 kB).
 
scripts/decode_demo.py ADDED
@@ -0,0 +1,244 @@
+ #!/usr/bin/env python
+ #
+ # file: $ISIP_EXP/tuh_dpath/exp_0074/scripts/decode.py
+ #
+ # revision history:
+ #  20190925 (TE): first version
+ #
+ # usage:
+ #  python decode.py odir mfile data
+ #
+ # arguments:
+ #  odir: the directory where the hypotheses will be stored
+ #  mfile: input model file
+ #  data: the input data list to be decoded
+ #
+ # This script decodes data using the S3Net model.
+ #------------------------------------------------------------------------------
+
+ # import pytorch modules
+ #
+ import torch
+ import torch.nn as nn
+ from tqdm import tqdm
+
+ # visualize:
+ import matplotlib
+ import matplotlib.pyplot as plt
+ import numpy as np
+
+ matplotlib.style.use('ggplot')
+
+ # import the model and all of its variables/functions
+ #
+ from model import *
+
+ # import system modules
+ #
+ import sys
+ import os
+
+ #-----------------------------------------------------------------------------
+ #
+ # global variables are listed here
+ #
+ #-----------------------------------------------------------------------------
+
+ # general global values
+ #
+ NUM_ARGS = 3
+ SPACE = " "
+
+ # constants:
+ NUM_CLASSES = 10              # background + 9 semantic classes
+ NUM_INPUT_CHANNELS = 1
+ NUM_OUTPUT_CHANNELS = NUM_CLASSES
+
+ # Hokuyo UTM-30LX-EW:
+ POINTS = 1081                 # the number of lidar points
+ ANGLE_MIN = -2.356194496154785
+ ANGLE_MAX = 2.356194496154785
+ RANGE_MAX = 60.0
+
+ # for reproducibility, we seed the rng
+ #
+ set_seed(SEED1)
+
+ #------------------------------------------------------------------------------
+ #
+ # the main program starts here
+ #
+ #------------------------------------------------------------------------------
+
+ # function: main
+ #
+ # arguments: none
+ #
+ # return: none
+ #
+ # This method is the main function.
+ #
+ def main(argv):
+     # ensure we have the correct number of arguments:
+     if(len(argv) != NUM_ARGS):
+         print("usage: python decode_demo.py [ODIR] [MDL_PATH] [EVAL_SET]")
+         exit(-1)
+
+     # define local variables:
+     odir = argv[0]
+     mdl_path = argv[1]
+     fImg = argv[2]
+
+     # if the odir doesn't exist, we make it:
+     if not os.path.exists(odir):
+         os.makedirs(odir)
+
+     # set the device to use GPU if available:
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+     # load the evaluation samples listed in dev.txt:
+     eval_dataset = VaeTestDataset(fImg, 'dev')
+     eval_dataloader = torch.utils.data.DataLoader(eval_dataset, batch_size=1, \
+         shuffle=False, drop_last=True)
+
+     # instantiate a model:
+     model = S3Net(input_channels=NUM_INPUT_CHANNELS,
+                   output_channels=NUM_OUTPUT_CHANNELS)
+     # move the model to the device:
+     model.to(device)
+
+     # set the model to evaluate
+     #
+     model.eval()
+
+     # load the weights
+     #
+     checkpoint = torch.load(mdl_path, map_location=device)
+     model.load_state_dict(checkpoint['model'])
+
+     # for each batch in increments of batch size:
+     counter = 0
+     num_samples = 32                  # stochastic forward passes per scan
+     # get the number of batches (ceiling of train_data/batch_size):
+     num_batches = int(len(eval_dataset)/eval_dataloader.batch_size)
+     with torch.no_grad():
+         for i, batch in tqdm(enumerate(eval_dataloader), total=num_batches):
+             if(i % 100 == 0):
+                 counter += 1
+             # collect the samples as a batch:
+             scans = batch['scan'].to(device)
+             intensities = batch['intensity'].to(device)
+             angle_incidence = batch['angle_incidence'].to(device)
+             labels = batch['label'].to(device)
+
+             # replicate each input num_samples times for stochastic inference:
+             inputs_samples = scans.repeat(num_samples, 1, 1)
+             intensity_samples = intensities.repeat(num_samples, 1, 1)
+             angle_incidence_samples = angle_incidence.repeat(num_samples, 1, 1)
+
+             # feed the batch to the network:
+             semantic_scan, semantic_channels, kl_loss = model(inputs_samples, intensity_samples, angle_incidence_samples)
+
+             # majority vote across the stochastic samples:
+             semantic_scans_mx = semantic_scan.argmax(dim=1)
+             semantic_scans_mx_mean = semantic_scans_mx.mode(dim=0).values.cpu().numpy()
+
+             # plot:
+             r = scans.cpu().detach().numpy().reshape(POINTS)
+             theta = np.linspace(ANGLE_MIN, ANGLE_MAX, num=POINTS, endpoint=True)
+
+             ## plot semantic label:
+             fig = plt.figure(figsize=(12, 12))
+             ax = fig.add_subplot(1, 1, 1, projection='polar', facecolor='seashell')
+             smap = labels.cpu().numpy().reshape(POINTS)
+
+             # add the background label:
+             theta = np.insert(theta, -1, np.pi)
+             r = np.insert(r, -1, 1)
+             smap = np.insert(smap, -1, 0)
+             label_val = np.unique(smap).astype(int)
+
+             colors = smap
+             area = 6
+             scatter = ax.scatter(theta, r, c=colors, s=area, cmap='nipy_spectral', alpha=0.95, linewidth=10)
+             ax.set_xticks(np.linspace(ANGLE_MIN, ANGLE_MAX, 8, endpoint=True))
+             ax.set_thetamin(-135)
+             ax.set_thetamax(135)
+             ax.set_yticklabels([])
+             # produce a legend with the unique colors from the scatter
+             classes = ['Other', 'Chair', 'Door', 'Elevator', 'Person', 'Pillar', 'Sofa', 'Table', 'Trash bin', 'Wall']
+             plt.xticks(fontsize=16)
+             plt.yticks(fontsize=16)
+             plt.legend(handles=scatter.legend_elements(num=[j for j in label_val])[0], labels=[classes[j] for j in label_val], bbox_to_anchor=(0.5, -0.08), loc='lower center', fontsize=18)
+             ax.grid(False)
+             ax.set_theta_offset(np.pi/2)
+
+             input_img_name = "./output/semantic_ground_truth_" + str(i) + ".png"
+             plt.savefig(input_img_name, bbox_inches='tight')
+             plt.close(fig)
+
+             ## plot s3-net semantic segmentation:
+             fig = plt.figure(figsize=(12, 12))
+             ax = fig.add_subplot(1, 1, 1, projection='polar', facecolor='seashell')
+             # theta and r already include the inserted background point,
+             # so only the prediction map needs it added here:
+             smap = semantic_scans_mx_mean.reshape(POINTS)
+             smap = np.insert(smap, -1, 0)
+             label_val = np.unique(smap).astype(int)
+
+             colors = smap
+             scatter = ax.scatter(theta, r, c=colors, s=area, cmap='nipy_spectral', alpha=0.95, linewidth=10)
+             ax.set_xticks(np.linspace(ANGLE_MIN, ANGLE_MAX, 8, endpoint=True))
+             ax.set_thetamin(-135)
+             ax.set_thetamax(135)
+             ax.set_yticklabels([])
+             # produce a legend with the unique colors from the scatter
+             plt.xticks(fontsize=16)
+             plt.yticks(fontsize=16)
+             plt.legend(handles=scatter.legend_elements(num=[j for j in label_val])[0], labels=[classes[j] for j in label_val], bbox_to_anchor=(0.5, -0.08), loc='lower center', fontsize=18)
+             ax.grid(False)
+             ax.set_theta_offset(np.pi/2)
+
+             input_img_name = "./output/semantic_s3net_" + str(i) + ".png"
+             plt.savefig(input_img_name, bbox_inches='tight')
+             plt.close(fig)
+
+             print(i)
+
+     # exit gracefully
+     #
+     return True
+ #
+ # end of function
+
+ # begin gracefully
+ #
+ if __name__ == '__main__':
+     main(sys.argv[1:])
+ #
+ # end of file
scripts/lovasz_losses.py ADDED
@@ -0,0 +1,77 @@
+ #!/usr/bin/env python
+ # -*- encoding: utf-8 -*-
+ #
+ # file: $ISIP_EXP/SOGMP/scripts/lovasz_losses.py
+ #
+ # revision history: xzt
+ #  20220824 (TE): first version
+ #
+ # usage:
+ #
+ # This script holds the loss functions for the Lovasz-Softmax loss.
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ ##
+ # version 1: use torch.autograd
+ class LovaszSoftmax(nn.Module):
+     '''
+     This is the autograd version, used in the multi-category classification case
+     '''
+     def __init__(self, reduction='mean', ignore_index=-100):
+         super(LovaszSoftmax, self).__init__()
+         self.reduction = reduction
+         self.lb_ignore = ignore_index
+
+     def forward(self, logits, label):
+         '''
+         Same usage method as nn.CrossEntropyLoss:
+             >>> criteria = LovaszSoftmax()
+             >>> logits = torch.randn(8, 19, 384)           # nch, float/half
+             >>> lbs = torch.randint(0, 19, (8, 384))       # nh, int64_t
+             >>> loss, errs = criteria(logits, lbs)
+         '''
+         # overcome ignored label
+         n, c, h = logits.size()
+         logits = logits.transpose(0, 1).reshape(c, -1).float() # use fp32 to avoid nan
+         label = label.view(-1)
+
+         idx = label.ne(self.lb_ignore).nonzero(as_tuple=False).squeeze()
+         probs = logits.softmax(dim=0)[:, idx]
+
+         label = label[idx]
+         lb_one_hot = torch.zeros_like(probs).scatter_(
+             0, label.unsqueeze(0), 1).detach()
+
+         errs = (lb_one_hot - probs).abs()
+         errs_sort, errs_order = torch.sort(errs, dim=1, descending=True)
+         n_samples = errs.size(1)
+
+         # lovasz extension grad
+         with torch.no_grad():
+             lb_one_hot_sort = torch.cat([
+                 lb_one_hot[i, order].unsqueeze(0)
+                 for i, order in enumerate(errs_order)], dim=0)
+             n_pos = lb_one_hot_sort.sum(dim=1, keepdim=True)
+             inter = n_pos - lb_one_hot_sort.cumsum(dim=1)
+             union = n_pos + (1. - lb_one_hot_sort).cumsum(dim=1)
+             jacc = 1. - inter / union
+             if n_samples > 1:
+                 jacc[:, 1:] = jacc[:, 1:] - jacc[:, :-1]
+
+         losses = torch.einsum('ab,ab->a', errs_sort, jacc)
+
+         if self.reduction == 'sum':
+             losses = losses.sum()
+         elif self.reduction == 'mean':
+             losses = losses.mean()
+         return losses, errs
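The `with torch.no_grad()` block above computes the Lovász extension weights: the discrete gradient of the Jaccard loss over the error-sorted ground truth. A standalone single-row version (function name ours, same `inter`/`union`/`jacc` arithmetic) makes the telescoping property easy to check: the weights sum to the final Jaccard loss of the row.

```python
import torch

def lovasz_grad(gt_sorted):
    """Lovász extension weights for one class row, mirroring the
    inter/union/jacc computation above (gt_sorted: 0/1 ground-truth
    indicators sorted by descending error)."""
    n_pos = gt_sorted.sum()
    inter = n_pos - gt_sorted.cumsum(0)
    union = n_pos + (1. - gt_sorted).cumsum(0)
    jacc = 1. - inter / union
    if gt_sorted.numel() > 1:
        jacc[1:] = jacc[1:] - jacc[:-1]   # discrete gradient
    return jacc

g = lovasz_grad(torch.tensor([1., 0., 1., 0.]))
print(g.sum())  # telescopes to the final Jaccard loss of the row
```

Weighting the sorted errors by this gradient (the `einsum` above) is what makes the loss a tight convex surrogate for the intersection-over-union metric.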
scripts/model.py ADDED
@@ -0,0 +1,469 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python
2
+ #
3
+ # file: $ISIP_EXP/SOGMP/scripts/model.py
4
+ #
5
+ # revision history: xzt
6
+ # 20220824 (TE): first version
7
+ #
8
+ # usage:
9
+ #
10
+ # This script hold the model architecture
11
+ #------------------------------------------------------------------------------
12
+
13
+ # import pytorch modules
14
+ #
15
+ from __future__ import print_function
16
+ import torch
17
+ import torch.nn as nn
18
+ import torch.nn.functional as F
19
+ import numpy as np
20
+ from collections import OrderedDict
21
+
22
+ # import modules
23
+ #
24
+ import os
25
+ import random
26
+
27
+ # for reproducibility, we seed the rng
28
+ #
29
+ SEED1 = 1337
30
+ NEW_LINE = "\n"
31
+
32
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
33
+
34
+ #-----------------------------------------------------------------------------
35
+ #
36
+ # helper functions are listed here
37
+ #
38
+ #-----------------------------------------------------------------------------
39
+
40
+ # function: set_seed
41
+ #
42
+ # arguments: seed - the seed for all the rng
43
+ #
44
+ # returns: none
45
+ #
46
+ # this method seeds all the random number generators and makes
47
+ # the results deterministic
48
+ #
49
+ def set_seed(seed):
50
+ #torch.manual_seed(seed)
51
+ #torch.cuda.manual_seed_all(seed)
52
+ torch.backends.cudnn.deterministic = True
53
+ torch.backends.cudnn.benchmark = False
54
+ #random.seed(seed)
55
+ #os.environ['PYTHONHASHSEED'] = str(seed)
56
+ #
57
+ # end of method
58
+
59
+ # calculate the angle of incidence of the lidar ray:
60
+ def angle_incidence_calculation(b, c, alpha, last_ray=False):
61
+ '''
62
+ # remove invalid values:
63
+ if(last_ray): # the last ray
64
+ if(np.isnan(b) or np.isinf(b)):
65
+ b = 60.
66
+ if(np.isnan(c) or np.isinf(c)):
67
+ c = 60.
68
+ else:
69
+ b[np.isnan(b)] = 60.
70
+ b[np.isinf(b)] = 60.
71
+ c[np.isnan(c)] = 60.
72
+ c[np.isinf(c)] = 60.
73
+ '''
74
+ # the law of cosines:
75
+ a = np.sqrt(b*b + c*c - 2*b*c*np.cos(alpha))
76
+ if(last_ray): # the last ray
77
+ beta = np.arccos([(a*a + c*c - b*b)/(2*a*c)])
78
+ theta = np.abs(np.pi/2 - beta)
79
+ else:
80
+ gamma = np.arccos([(a*a + b*b - c*c)/(2*a*b)])
81
+ theta = np.abs(np.pi/2 - gamma)
82
+
83
+ return theta
84
+
85
+ # function: get_data
86
+ #
87
+ # arguments: fp - file pointer
88
+ # num_feats - the number of features in a sample
89
+ #
90
+ # returns: data - the signals/features
91
+ # labels - the correct labels for them
92
+ #
93
+ # this method takes in a fp and returns the data and labels
94
+ POINTS = 1081
95
+ class VaeTestDataset(torch.utils.data.Dataset):
96
+ def __init__(self, img_path, file_name):
97
+ # initialize the data and labels
98
+ # read the names of image data:
99
+ self.scan_file_names = []
100
+ self.intensity_file_names = []
101
+ #self.vel_file_names = []
102
+ self.label_file_names = []
103
+ # parameters: data mean std: scan, intensity, angle of incidence:
104
+ # [[4.518406, 8.2914915], [3081.8167, 1529.4413]]
105
+ # [4.518406, 8.2914915], [3081.8167, 1529.4413], [0.5959513, 0.4783924]]
106
+ self.s_mu = 4.518406
107
+ self.s_std = 8.2914915
108
+ self.i_mu = 3081.8167
109
+ self.i_std = 1529.4413
110
+ self.a_mu = 0.5959513
111
+ self.a_std = 0.4783924
112
+ # open train.txt or dev.txt:
113
+ fp_folder = open(img_path+'dataset.txt','r')
114
+
115
+ # for each line of the file:
116
+ for folder_line in fp_folder.read().split(NEW_LINE):
117
+ if('-' in folder_line):
118
+ folder_path = folder_line
119
+ fp_file = open(img_path+folder_path+'/'+file_name+'.txt', 'r')
120
+ for line in fp_file.read().split(NEW_LINE):
121
+ if('.npy' in line):
122
+ self.scan_file_names.append(img_path+folder_path+'/scans_lidar/'+line)
123
+ self.intensity_file_names.append(img_path+folder_path+'/intensities_lidar/'+line)
124
+ #self.vel_file_names.append(img_path+folder_path+'/velocities/'+line)
125
+ self.label_file_names.append(img_path+folder_path+'/semantic_label/'+line)
126
+ # close txt file:
127
+ fp_file.close()
128
+
129
+ # close txt file:
130
+ fp_folder.close()
131
+
132
+ self.length = len(self.scan_file_names)
133
+
134
+ print("dataset length: ", self.length)
135
+
136
+
137
+ def __len__(self):
138
+ return self.length
139
+
140
+ def __getitem__(self, idx):
141
+ # get the index of start point:
142
+ scan = np.zeros((1, POINTS))
143
+ intensity = np.zeros((1, POINTS))
144
+ angle_incidence = np.zeros((1, POINTS))
145
+ label = np.zeros((1, POINTS))
146
+
147
+ # get the scan data:
148
+ intensity_name = self.intensity_file_names[idx]
149
+ intensity = np.load(intensity_name)
150
+
151
+ # get the scan data:
152
+ scan_name = self.scan_file_names[idx]
153
+ scan = np.load(scan_name)
154
+
155
+ # get the semantic label data:
156
+ label_name = self.label_file_names[idx]
157
+ label = np.load(label_name)
158
+
159
+ # get the angle of incidence of the ray:
160
+ b = scan[:-1]
161
+ c = scan[1:]
162
+ alpha = np.ones(POINTS - 1)*((270*np.pi / 180) / (POINTS - 1))
163
+ theta = angle_incidence_calculation(b, c, alpha)
164
+ # last ray:
165
+ b_last = scan[-2]
166
+ c_last = scan[-1]
167
+ alpha_last = (270*np.pi / 180) / (POINTS - 1)
168
+ theta_last = angle_incidence_calculation(b_last, c_last, alpha_last, last_ray=True)
169
+ angle_incidence = np.concatenate((theta[0], theta_last), axis=0)
170
+
171
+ # replace NaN/inf values with 0:
172
+ scan[np.isnan(scan)] = 0.
173
+ scan[np.isinf(scan)] = 0.
174
+
175
+ intensity[np.isnan(intensity)] = 0.
176
+ intensity[np.isinf(intensity)] = 0.
177
+
178
+ angle_incidence[np.isnan(angle_incidence)] = 0.
179
+ angle_incidence[np.isinf(angle_incidence)] = 0.
180
+
181
+ label[np.isnan(label)] = 0.
182
+ label[np.isinf(label)] = 0.
183
+
184
+ # data normalization:
185
+ # standardization: scan
186
+ # mu: 4.518406, std: 8.2914915
187
+ scan = (scan - self.s_mu) / self.s_std
188
+
189
+ # standardization: intensity
190
+ # mu: 3081.8167, std: 1529.4413
191
+ intensity = (intensity - self.i_mu) / self.i_std
192
+
193
+ # standardization: angle_incidence
194
+ # mu: 0.5959513, std: 0.4783924
195
+ angle_incidence = (angle_incidence - self.a_mu) / self.a_std
196
+
197
+ # transfer to pytorch tensor:
198
+ scan_tensor = torch.FloatTensor(scan)
199
+ intensity_tensor = torch.FloatTensor(intensity)
200
+ angle_incidence_tensor = torch.FloatTensor(angle_incidence)
201
+ label_tensor = torch.FloatTensor(label)
202
+
203
+ data = {
204
+ 'scan': scan_tensor,
205
+ 'intensity': intensity_tensor,
206
+ 'angle_incidence': angle_incidence_tensor,
207
+ 'label': label_tensor,
208
+ }
209
+
210
+ return data
211
+
212
+ #
213
+ # end of function
214
+
215
+
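The `__getitem__` method above zeroes out NaN/inf values and then z-score normalizes each channel with the dataset statistics quoted in the comments. That step can be sanity-checked in isolation; this is a minimal NumPy sketch using the scan-channel statistics, not the repository's own code:

```python
import numpy as np

# dataset statistics quoted in the comments above (scan channel)
S_MU, S_STD = 4.518406, 8.2914915

def standardize(x, mu, std):
    # zero out NaN/inf first, as __getitem__ does, then z-score normalize
    x = np.array(x, dtype=float)
    x[~np.isfinite(x)] = 0.0
    return (x - mu) / std

scan = standardize([S_MU, S_MU + S_STD, np.nan], S_MU, S_STD)
# scan[0] -> 0.0, scan[1] -> 1.0, scan[2] -> (0 - S_MU) / S_STD
```

Values exactly at the mean map to 0 and one standard deviation above it maps to 1, while invalid readings end up at a fixed negative value rather than propagating NaN through the network.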
216
+ #------------------------------------------------------------------------------
217
+ #
218
+ # the model is defined here
219
+ #
220
+ #------------------------------------------------------------------------------
221
+
222
+ # define the PyTorch VAE model
223
+ #
224
+ # define a VAE
225
+ # Residual blocks:
226
+ class Residual(nn.Module):
227
+ def __init__(self, in_channels, num_hiddens, num_residual_hiddens):
228
+ super(Residual, self).__init__()
229
+ self._block = nn.Sequential(
230
+ nn.ReLU(True),
231
+ nn.Conv1d(in_channels=in_channels,
232
+ out_channels=num_residual_hiddens,
233
+ kernel_size=3, stride=1, padding=1, bias=False),
234
+ nn.BatchNorm1d(num_residual_hiddens),
235
+ nn.ReLU(True),
236
+ nn.Conv1d(in_channels=num_residual_hiddens,
237
+ out_channels=num_hiddens,
238
+ kernel_size=1, stride=1, bias=False),
239
+ nn.BatchNorm1d(num_hiddens)
240
+ )
241
+
242
+ def forward(self, x):
243
+ return x + self._block(x)
244
+
245
+ class ResidualStack(nn.Module):
246
+ def __init__(self, in_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
247
+ super(ResidualStack, self).__init__()
248
+ self._num_residual_layers = num_residual_layers
249
+ self._layers = nn.ModuleList([Residual(in_channels, num_hiddens, num_residual_hiddens)
250
+ for _ in range(self._num_residual_layers)])
251
+
252
+ def forward(self, x):
253
+ for i in range(self._num_residual_layers):
254
+ x = self._layers[i](x)
255
+ return F.relu(x)
256
+
257
+ # Encoder & Decoder Architecture:
258
+ # Encoder:
259
+ class Encoder(nn.Module):
260
+ def __init__(self, in_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
261
+ super(Encoder, self).__init__()
262
+ self._conv_1 = nn.Sequential(*[
263
+ nn.Conv1d(in_channels=in_channels,
264
+ out_channels=num_hiddens//2,
265
+ kernel_size=4,
266
+ stride=2,
267
+ padding=1),
268
+ nn.BatchNorm1d(num_hiddens//2),
269
+ nn.ReLU(True)
270
+ ])
271
+ self._conv_2 = nn.Sequential(*[
272
+ nn.Conv1d(in_channels=num_hiddens//2,
273
+ out_channels=num_hiddens,
274
+ kernel_size=4,
275
+ stride=2,
276
+ padding=1),
277
+ nn.BatchNorm1d(num_hiddens)
278
+ #nn.ReLU(True)
279
+ ])
280
+ self._residual_stack = ResidualStack(in_channels=num_hiddens,
281
+ num_hiddens=num_hiddens,
282
+ num_residual_layers=num_residual_layers,
283
+ num_residual_hiddens=num_residual_hiddens)
284
+
285
+ def forward(self, inputs):
286
+ x = self._conv_1(inputs)
287
+ x = self._conv_2(x)
288
+ x = self._residual_stack(x)
289
+ return x
290
+
291
+ # Decoder:
292
+ class Decoder(nn.Module):
293
+ def __init__(self, out_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
294
+ super(Decoder, self).__init__()
295
+
296
+ self._residual_stack = ResidualStack(in_channels=num_hiddens,
297
+ num_hiddens=num_hiddens,
298
+ num_residual_layers=num_residual_layers,
299
+ num_residual_hiddens=num_residual_hiddens)
300
+
301
+ self._conv_trans_2 = nn.Sequential(*[
302
+ nn.ReLU(True),
303
+ nn.ConvTranspose1d(in_channels=num_hiddens,
304
+ out_channels=num_hiddens//2,
305
+ kernel_size=4,
306
+ stride=2,
307
+ padding=1),
308
+ nn.BatchNorm1d(num_hiddens//2),
309
+ nn.ReLU(True)
310
+ ])
311
+
312
+ self._conv_trans_1 = nn.Sequential(*[
313
+ nn.ConvTranspose1d(in_channels=num_hiddens//2,
314
+ out_channels=num_hiddens//2,
315
+ kernel_size=4,
316
+ stride=2,
317
+ padding=1,
318
+ output_padding=1),
319
+ nn.BatchNorm1d(num_hiddens//2),
320
+ nn.ReLU(True),
321
+ nn.Conv1d(in_channels=num_hiddens//2,
322
+ out_channels=out_channels,
323
+ kernel_size=3,
324
+ stride=1,
325
+ padding=1),
326
+ #nn.Sigmoid()
327
+ ])
328
+
329
+ def forward(self, inputs):
330
+ x = self._residual_stack(inputs)
331
+ x = self._conv_trans_2(x)
332
+ x = self._conv_trans_1(x)
333
+ return x
334
+
335
+ class VAE_Encoder(nn.Module):
336
+ def __init__(self, input_channel, num_hiddens, num_residual_layers, num_residual_hiddens, embedding_dim):
337
+ super(VAE_Encoder, self).__init__()
338
+ # parameters:
339
+ self.input_channels = input_channel
340
+ '''
341
+ # Constants
342
+ num_hiddens = 128 #128
343
+ num_residual_hiddens = 64 #32
344
+ num_residual_layers = 2
345
+ embedding_dim = 2 #64
346
+ '''
347
+
348
+ # encoder:
349
+ in_channels = input_channel
350
+ self._encoder = Encoder(in_channels,
351
+ num_hiddens,
352
+ num_residual_layers,
353
+ num_residual_hiddens)
354
+
355
+ # z latent variable:
356
+ self._encoder_z_mu = nn.Conv1d(in_channels=num_hiddens,
357
+ out_channels=embedding_dim,
358
+ kernel_size=1,
359
+ stride=1)
360
+ self._encoder_z_log_sd = nn.Conv1d(in_channels=num_hiddens,
361
+ out_channels=embedding_dim,
362
+ kernel_size=1,
363
+ stride=1)
364
+
365
+ def forward(self, x):
366
+ # input reshape:
367
+ x = x.reshape(-1, self.input_channels, POINTS)
368
+ # Encoder:
369
+ encoder_out = self._encoder(x)
370
+ # get `mu` and `log_var`:
371
+ z_mu = self._encoder_z_mu(encoder_out)
372
+ z_log_sd = self._encoder_z_log_sd(encoder_out)
373
+ return z_mu, z_log_sd
374
+
375
+ # our proposed model:
376
+ class S3Net(nn.Module):
377
+ def __init__(self, input_channels, output_channels):
378
+ super(S3Net, self).__init__()
379
+ # parameters:
380
+ self.input_channels = input_channels
381
+ self.latent_dim = 270
382
+ self.output_channels = output_channels
383
+
384
+ # Constants
385
+ num_hiddens = 64 #128
386
+ num_residual_hiddens = 32 #64
387
+ num_residual_layers = 2
388
+ embedding_dim = 1 #2
389
+
390
+ # prediction encoder:
391
+ self._encoder = VAE_Encoder(self.input_channels,
392
+ num_hiddens,
393
+ num_residual_layers,
394
+ num_residual_hiddens,
395
+ embedding_dim)
396
+
397
+ # decoder:
398
+ self._decoder_z_mu = nn.ConvTranspose1d(in_channels=embedding_dim,
399
+ out_channels=num_hiddens,
400
+ kernel_size=1,
401
+ stride=1)
402
+ self._decoder = Decoder(self.output_channels,
403
+ num_hiddens,
404
+ num_residual_layers,
405
+ num_residual_hiddens)
406
+
407
+ self.softmax = nn.Softmax(dim=1)
408
+
409
+
410
+
411
+ def vae_reparameterize(self, z_mu, z_log_sd):
412
+ """
413
+ :param z_mu: mean from the encoder's latent space
414
+ :param z_log_sd: log standard deviation from the encoder's latent space
415
+ :output: reparameterized latent variable z, Monte Carlo KL divergence
416
+ """
417
+ # reshape:
418
+ z_mu = z_mu.reshape(-1, self.latent_dim, 1)
419
+ z_log_sd = z_log_sd.reshape(-1, self.latent_dim, 1)
420
+ # define the z probabilities (in this case Normal for both)
421
+ # p(z): N(z|0,I)
422
+ pz = torch.distributions.Normal(loc=torch.zeros_like(z_mu), scale=torch.ones_like(z_log_sd))
423
+ # q(z|x,phi): N(z|mu, z_var)
424
+ qz_x = torch.distributions.Normal(loc=z_mu, scale=torch.exp(z_log_sd))
425
+
426
+ # reparameterization trick: z = z_mu + xi (*) exp(z_log_sd), xi~N(0,I)
427
+ z = qz_x.rsample()
428
+ # Monte Carlo KL divergence: KL(q(z|x,phi)||p(z)) ~= log(q(z|x,phi)) - log(p(z))
429
+ # sum over weight dim, leaves the batch dim
430
+ kl_divergence = (pz.log_prob(z) - qz_x.log_prob(z)).sum(dim=1)
431
+ kl_loss = -kl_divergence.mean()
432
+
433
+ return z, kl_loss
434
+
435
+ def forward(self, x_s, x_i, x_a):
436
+ """
437
+ Forward pass `input_img` through the network
438
+ """
439
+ # reconstruction:
440
+ # encode:
441
+ # input reshape:
442
+ x_s = x_s.reshape(-1, 1, POINTS)
443
+ x_i = x_i.reshape(-1, 1, POINTS)
444
+ x_a = x_a.reshape(-1, 1, POINTS)
445
+ # concatenate along channel axis
446
+ x = torch.cat([x_s, x_i, x_a], dim=1)
447
+
448
+ # encode:
449
+ z_mu, z_log_sd = self._encoder(x)
450
+
451
+ # get the latent vector through reparameterization:
452
+ z, kl_loss = self.vae_reparameterize(z_mu, z_log_sd)
453
+
454
+ # decode:
455
+ # reshape:
456
+ z = z.reshape(-1, 1, 270)
457
+ x_d = self._decoder_z_mu(z)
458
+ semantic_channels = self._decoder(x_d)
459
+
460
+ # semantic grid: 10 channels
461
+ semantic_scan = self.softmax(semantic_channels)
462
+
463
+ return semantic_scan, semantic_channels, kl_loss
464
+
465
+ #
466
+ # end of class
467
+
468
+ #
469
+ # end of file
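The `vae_reparameterize` step above draws z = mu + sigma * eps with eps ~ N(0, I) and estimates the KL term by Monte Carlo as log q(z|x) - log p(z), averaged over the batch. The same computation can be sketched without torch; this is an illustrative NumPy version, not the model's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(z_mu, z_log_sd, rng):
    # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(z_mu.shape)
    return z_mu + np.exp(z_log_sd) * eps

def normal_log_prob(x, mu, sd):
    # log density of N(mu, sd^2) evaluated at x
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

z_mu = np.zeros((4, 8))
z_log_sd = np.full((4, 8), -10.0)   # tiny variance, so z stays close to mu
z = reparameterize(z_mu, z_log_sd, rng)

# Monte Carlo KL(q(z|x) || p(z)) estimate: log q(z) - log p(z),
# summed over the latent dimension, averaged over the batch
log_qz = normal_log_prob(z, z_mu, np.exp(z_log_sd)).sum(axis=1)
log_pz = normal_log_prob(z, 0.0, 1.0).sum(axis=1)
kl_mc = (log_qz - log_pz).mean()
```

Because sampling goes through a deterministic function of (mu, sigma) plus independent noise, gradients can flow back to the encoder parameters; the sign convention here matches the `kl_loss = -kl_divergence.mean()` in the code above.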
scripts/train.py ADDED
@@ -0,0 +1,380 @@
1
+ #!/usr/bin/env python
2
+ #
3
+ # file: $ISIP_EXP/SOGMP/scripts/train.py
4
+ #
5
+ # revision history: xzt
6
+ # 20220824 (TE): first version
7
+ #
8
+ # usage:
9
+ # python train.py mdir train_data val_data
10
+ #
11
+ # arguments:
12
+ # mdir: the directory where the output model is stored
13
+ # train_data: the directory of training data
14
+ # val_data: the directory of validation data
15
+ #
16
+ # This script trains a S3-Net model
17
+ #------------------------------------------------------------------------------
18
+
19
+ # import pytorch modules
20
+ #
21
+ import torch
22
+ import torch.nn as nn
23
+ from torch.optim import Adam
24
+ from tqdm import tqdm
25
+ import torch.nn.functional as F
26
+
27
+ # visualize:
28
+ from tensorboardX import SummaryWriter
29
+ import numpy as np
30
+
31
+ # import the model and all of its variables/functions
32
+ #
33
+ from model import *
34
+ import lovasz_losses as L
35
+
36
+ # import modules
37
+ #
38
+ import sys
39
+ import os
40
+
41
+
42
+ #-----------------------------------------------------------------------------
43
+ #
44
+ # global variables are listed here
45
+ #
46
+ #-----------------------------------------------------------------------------
47
+
48
+ # general global values
49
+ #
50
+ model_dir = './model/s3_net_model.pth' # the path of model storage
51
+ NUM_ARGS = 3
52
+ NUM_EPOCHS = 20000
53
+ BATCH_SIZE = 1024
54
+ LEARNING_RATE = "lr"
55
+ BETAS = "betas"
56
+ EPS = "eps"
57
+ WEIGHT_DECAY = "weight_decay"
58
+
59
+ # Constants
60
+ NUM_INPUT_CHANNELS = 3
61
+ NUM_OUTPUT_CHANNELS = 10 # 9 classes of semantic labels + 1 background
62
+ BETA = 0.01
63
+
64
+ # for reproducibility, we seed the rng
65
+ #
66
+ set_seed(SEED1)
67
+
68
+ # adjust_learning_rate
69
+ #
70
+ def adjust_learning_rate(optimizer, epoch):
71
+ lr = 1e-4
72
+ if epoch > 50000:
73
+ lr = 2e-5
74
+ if epoch > 480000:
75
+ # lr = 5e-8
76
+ lr = lr * (0.1 ** (epoch // 110000))
77
+ # if epoch > 8300:
78
+ # lr = 1e-9
79
+ for param_group in optimizer.param_groups:
80
+ param_group['lr'] = lr
81
+
82
+
83
+ # train function:
84
+ def train(model, dataloader, dataset, device, optimizer, ce_criterion, lovasz_criterion, class_weights, epoch, epochs):
85
+ # set model to training mode:
86
+ model.train()
87
+ # for each batch in increments of batch size:
88
+ running_loss = 0.0
89
+ # kl_divergence:
90
+ kl_avg_loss = 0.0
91
+ # CE loss:
92
+ ce_avg_loss = 0.0
93
+
94
+ counter = 0
95
+ # get the number of batches (floor of len(dataset)/batch_size):
96
+ num_batches = int(len(dataset)/dataloader.batch_size)
97
+ for i, batch in tqdm(enumerate(dataloader), total=num_batches):
98
+ #for i, batch in enumerate(dataloader, 0):
99
+ counter += 1
100
+ # collect the samples as a batch:
101
+ scans = batch['scan']
102
+ scans = scans.to(device)
103
+ intensities = batch['intensity']
104
+ intensities = intensities.to(device)
105
+ angle_incidence = batch['angle_incidence']
106
+ angle_incidence = angle_incidence.to(device)
107
+ labels = batch['label']
108
+ labels = labels.to(device)
109
+
110
+ batch_size = scans.size(0)
111
+
112
+ # set all gradients to 0:
113
+ optimizer.zero_grad()
114
+
115
+ # feed the batch to the network:
116
+ semantic_scan, semantic_channels, kl_loss = model(scans, intensities, angle_incidence)
117
+ # calculate the semantic ce loss:
118
+ ce_loss = ce_criterion(semantic_channels, labels.to(torch.long)).div(batch_size)
119
+ lovasz_loss, _ = lovasz_criterion(semantic_channels, labels.to(torch.long))
120
+ lovasz_loss = lovasz_loss.mul(class_weights.to(device)).sum()
121
+ # beta-vae:
122
+ loss = ce_loss + BETA*kl_loss + lovasz_loss
123
+ # perform back propagation:
124
+ loss.backward(torch.ones_like(loss))
125
+ optimizer.step()
126
+ # get the loss:
127
+ # multiple GPUs:
128
+ if torch.cuda.device_count() > 1:
129
+ loss = loss.mean()
130
+ ce_loss = ce_loss.mean()
131
+ kl_loss = lovasz_loss.mean() #kl_loss.mean()
132
+
133
+ running_loss += loss.item()
134
+ # kl_divergence:
135
+ kl_avg_loss += lovasz_loss.item() #kl_loss.item()
136
+ # CE loss:
137
+ ce_avg_loss += ce_loss.item()
138
+
139
+ # display informational message:
140
+ if(i % 512 == 0):
141
+ print('Epoch [{}/{}], Step[{}/{}], Loss: {:.4f}, CE_Loss: {:.4f}, Lovasz_Loss: {:.4f}'
142
+ .format(epoch, epochs, i + 1, num_batches, loss.item(), ce_loss.item(), lovasz_loss.item()))
143
+
144
+ train_loss = running_loss / counter
145
+ train_kl_loss = kl_avg_loss / counter
146
+ train_ce_loss = ce_avg_loss / counter
147
+
148
+ return train_loss, train_kl_loss, train_ce_loss
149
+
150
+ # validate function:
151
+ def validate(model, dataloader, dataset, device, ce_criterion, lovasz_criterion, class_weights):
152
+ # set model to evaluation mode:
153
+ model.eval()
154
+ # for each batch in increments of batch size:
155
+ running_loss = 0.0
156
+ # kl_divergence:
157
+ kl_avg_loss = 0.0
158
+ # CE loss:
159
+ ce_avg_loss = 0.0
160
+
161
+ counter = 0
162
+ # get the number of batches (floor of len(dataset)/batch_size):
163
+ num_batches = int(len(dataset)/dataloader.batch_size)
164
+ with torch.no_grad():
165
+ for i, batch in tqdm(enumerate(dataloader), total=num_batches):
166
+ #for i, batch in enumerate(dataloader, 0):
167
+ counter += 1
168
+ # collect the samples as a batch:
169
+ scans = batch['scan']
170
+ scans = scans.to(device)
171
+ intensities = batch['intensity']
172
+ intensities = intensities.to(device)
173
+ angle_incidence = batch['angle_incidence']
174
+ angle_incidence = angle_incidence.to(device)
175
+ labels = batch['label']
176
+ labels = labels.to(device)
177
+
178
+ batch_size = scans.size(0)
179
+
180
+ # feed the batch to the network:
181
+ semantic_scan, semantic_channels, kl_loss = model(scans, intensities, angle_incidence)
182
+ # calculate the semantic ce loss:
183
+ ce_loss = ce_criterion(semantic_channels, labels.to(torch.long)).div(batch_size)
184
+ lovasz_loss, _ = lovasz_criterion(semantic_channels, labels.to(torch.long))
185
+ lovasz_loss = lovasz_loss.mul(class_weights.to(device)).sum()
186
+ # beta-vae:
187
+ loss = ce_loss + BETA*kl_loss + lovasz_loss
188
+ # multiple GPUs:
189
+ if torch.cuda.device_count() > 1:
190
+ loss = loss.mean()
191
+ ce_loss = ce_loss.mean()
192
+ kl_loss = lovasz_loss.mean() #kl_loss.mean()
193
+
194
+ running_loss += loss.item()
195
+ # kl_divergence:
196
+ kl_avg_loss += lovasz_loss.item() #kl_loss.item()
197
+ # CE loss:
198
+ ce_avg_loss += ce_loss.item()
199
+
200
+ val_loss = running_loss / counter
201
+ val_kl_loss = kl_avg_loss / counter
202
+ val_ce_loss = ce_avg_loss / counter
203
+
204
+ return val_loss, val_kl_loss, val_ce_loss
205
+
206
+ #------------------------------------------------------------------------------
207
+ #
208
+ # the main program starts here
209
+ #
210
+ #------------------------------------------------------------------------------
211
+
212
+ # function: main
213
+ #
214
+ # arguments: none
215
+ #
216
+ # return: none
217
+ #
218
+ # This method is the main function.
219
+ #
220
+ def main(argv):
221
+ # ensure we have the correct amount of arguments:
222
+ #global cur_batch_win
223
+ if(len(argv) != NUM_ARGS):
224
+ print("usage: python train.py [MDL_PATH] [TRAIN_PATH] [DEV_PATH]")
225
+ exit(-1)
226
+
227
+ # define local variables:
228
+ mdl_path = argv[0]
229
+ pTrain = argv[1]
230
+ pDev = argv[2]
231
+
232
+ # get the output directory name:
233
+ odir = os.path.dirname(mdl_path)
234
+
235
+ # if the odir doesn't exits, we make it:
236
+ if not os.path.exists(odir):
237
+ os.makedirs(odir)
238
+
239
+ # set the device to use GPU if available:
240
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
241
+
242
+ print('...Start reading data...')
243
+ ### training data ###
244
+ # training set and training data loader
245
+ train_dataset = VaeTestDataset(pTrain, 'train')
246
+ train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, num_workers=4, \
247
+ shuffle=True, drop_last=True, pin_memory=True)
248
+
249
+ ### validation data ###
250
+ # validation set and validation data loader
251
+ dev_dataset = VaeTestDataset(pDev, 'dev')
252
+ dev_dataloader = torch.utils.data.DataLoader(dev_dataset, batch_size=BATCH_SIZE, num_workers=2, \
253
+ shuffle=True, drop_last=True, pin_memory=True)
254
+
255
+ # calculate the class weights:
256
+ class_weights = np.array([2.514399, 1.4917144, 0.51608694, 0.659483, 1.0900991, 1.6461798, 0.32852992, 1.5633508, 0.9236576, 0.10251398]) # median frequency balance
257
+
258
+ #class_weights = np.array([1.4222778, 2.1834621, 40.17538]) # inverse log class_probability
259
+ class_weights = torch.Tensor(class_weights)
260
+ print("class weights: ", class_weights)
261
+ class_weights = class_weights.to(device)
262
+ print('...Finish reading data...')
263
+
264
+ # instantiate a model:
265
+ model = S3Net(input_channels=NUM_INPUT_CHANNELS,
266
+ output_channels=NUM_OUTPUT_CHANNELS)
267
+ # move the model to the device (GPU if available):
268
+ model.to(device)
269
+
270
+ # set the adam optimizer parameters:
271
+ opt_params = { LEARNING_RATE: 0.001,
272
+ BETAS: (.9,0.999),
273
+ EPS: 1e-08,
274
+ WEIGHT_DECAY: .001 }
275
+ # set the loss criterion and optimizer:
276
+ ce_criterion = nn.CrossEntropyLoss(reduction='sum', weight=class_weights)
277
+ ce_criterion.to(device)
278
+ lovasz_criterion = L.LovaszSoftmax(reduction='sum', ignore_index=0)
279
+ lovasz_criterion.to(device)
280
+ # create an optimizer, and pass the model params to it:
281
+ optimizer = Adam(model.parameters(), **opt_params)
282
+
283
+ # get the number of epochs to train on:
284
+ epochs = NUM_EPOCHS
285
+
286
+ # if there are trained models, continue training:
287
+ if os.path.exists(mdl_path):
288
+ checkpoint = torch.load(mdl_path)
289
+ model.load_state_dict(checkpoint['model'])
290
+ optimizer.load_state_dict(checkpoint['optimizer'])
291
+ start_epoch = checkpoint['epoch']
292
+ print('Loaded epoch {} successfully'.format(start_epoch))
293
+ else:
294
+ start_epoch = 0
295
+ #pre_path = "./model/model_segnet_weight.pth"
296
+ #pretrained_model = torch.load(pre_path)
297
+ #model.load_state_dict(pretrained_model['model'])
298
+ print('No trained model found, starting training from scratch')
299
+
300
+ # multiple GPUs:
301
+ if torch.cuda.device_count() > 1:
302
+ print("Let's use all", torch.cuda.device_count(), "GPUs!")
303
+ # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
304
+ model = nn.DataParallel(model) #, device_ids=[0, 1])
305
+ # move the model to the device (GPU if available):
306
+ model.to(device)
307
+
308
+ # tensorboard writer:
309
+ writer = SummaryWriter('runs')
310
+
311
+ epoch_num = 0
312
+ for epoch in range(start_epoch+1, epochs):
313
+ # adjust learning rate:
314
+ adjust_learning_rate(optimizer, epoch)
315
+ ################################## Train #####################################
316
+ # for each batch in increments of batch size
317
+ #
318
+ train_epoch_loss, train_kl_epoch_loss, train_ce_epoch_loss = train(
319
+ model, train_dataloader, train_dataset, device, optimizer, ce_criterion, lovasz_criterion, class_weights, epoch, epochs
320
+ )
321
+ valid_epoch_loss, valid_kl_epoch_loss, valid_ce_epoch_loss = validate(
322
+ model, dev_dataloader, dev_dataset, device, ce_criterion, lovasz_criterion, class_weights
323
+ )
324
+
325
+ # log the epoch loss
326
+ writer.add_scalar('training loss',
327
+ train_epoch_loss,
328
+ epoch)
329
+ writer.add_scalar('training kl loss',
330
+ train_kl_epoch_loss,
331
+ epoch)
332
+ writer.add_scalar('training ce loss',
333
+ train_ce_epoch_loss,
334
+ epoch)
335
+
336
+ writer.add_scalar('validation loss',
337
+ valid_epoch_loss,
338
+ epoch)
339
+ writer.add_scalar('validation kl loss',
340
+ valid_kl_epoch_loss,
341
+ epoch)
342
+ writer.add_scalar('validation ce loss',
343
+ valid_ce_epoch_loss,
344
+ epoch)
345
+
346
+ print('Train set: Average loss: {:.4f}'.format(train_epoch_loss))
347
+ print('Validation set: Average loss: {:.4f}'.format(valid_epoch_loss))
348
+
349
+ # save the model:
350
+ if(epoch % 2000 == 0):
351
+ if torch.cuda.device_count() > 1: # multiple GPUS:
352
+ state = {'model':model.module.state_dict(), 'optimizer':optimizer.state_dict(), 'epoch':epoch}
353
+ else:
354
+ state = {'model':model.state_dict(), 'optimizer':optimizer.state_dict(), 'epoch':epoch}
355
+ path='./model/model' + str(epoch) +'.pth'
356
+ torch.save(state, path)
357
+
358
+ epoch_num = epoch
359
+
360
+ # save the final model
361
+ if torch.cuda.device_count() > 1: # multiple GPUS:
362
+ state = {'model':model.module.state_dict(), 'optimizer':optimizer.state_dict(), 'epoch':epoch_num}
363
+ else:
364
+ state = {'model':model.state_dict(), 'optimizer':optimizer.state_dict(), 'epoch':epoch_num}
365
+ torch.save(state, mdl_path)
366
+
367
+ # exit gracefully
368
+ #
369
+
370
+ return True
371
+ #
372
+ # end of function
373
+
374
+
375
+ # begin gracefully
376
+ #
377
+ if __name__ == '__main__':
378
+ main(sys.argv[1:])
379
+ #
380
+ # end of file
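The step schedule implemented by `adjust_learning_rate` in train.py (flat at 1e-4, dropping to 2e-5 after epoch 50000, then decaying by powers of ten per 110000 epochs after epoch 480000) can be isolated as a pure function for checking the thresholds. This is a sketch mirroring that logic, not part of the repository:

```python
def scheduled_lr(epoch, base_lr=1e-4):
    # step decay matching adjust_learning_rate in train.py:
    # 1e-4 initially, 2e-5 after 50000, power-of-ten decay after 480000
    lr = base_lr
    if epoch > 50000:
        lr = 2e-5
    if epoch > 480000:
        lr = lr * (0.1 ** (epoch // 110000))
    return lr
```

Keeping the schedule pure (epoch in, rate out) makes it trivial to unit test before wiring it into `optimizer.param_groups`.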