Upload folder using huggingface_hub
- .gitattributes +3 -0
- LICENSE +21 -0
- README.md +239 -3
- demo/1.lobby_s3net_segmentation.gif +3 -0
- demo/2.lobby_semantic_mapping.gif +3 -0
- demo/3.lobby_semantic_navigation.gif +3 -0
- model/s3_net_model.pth +3 -0
- output/semantic_ground_truth_7000.png +0 -0
- output/semantic_s3net_7000.png +0 -0
- run_eval_demo.sh +54 -0
- run_train.sh +59 -0
- scripts/__pycache__/convlstm.cpython-37.pyc +0 -0
- scripts/__pycache__/lovasz_losses.cpython-37.pyc +0 -0
- scripts/__pycache__/model.cpython-37.pyc +0 -0
- scripts/decode_demo.py +244 -0
- scripts/lovasz_losses.py +77 -0
- scripts/model.py +469 -0
- scripts/train.py +380 -0
.gitattributes
CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+demo/1.lobby_s3net_segmentation.gif filter=lfs diff=lfs merge=lfs -text
+demo/2.lobby_semantic_mapping.gif filter=lfs diff=lfs merge=lfs -text
+demo/3.lobby_semantic_navigation.gif filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
MIT License

Copyright (c) 2025 Temple Robotics and Artificial Intelligence Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
CHANGED

# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be found and downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav

## S³-Net: Stochastic Semantic Segmentation Network

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.

## Demo Results

**S³-Net Segmentation**
![S3-Net segmentation demo](demo/1.lobby_s3net_segmentation.gif)

**Semantic Mapping**
![Semantic mapping demo](demo/2.lobby_semantic_mapping.gif)

**Semantic Navigation**
![Semantic navigation demo](demo/3.lobby_semantic_navigation.gif)

## Model Architecture

S³-Net uses an encoder-decoder architecture with stochastic latent representations:

```
Input (3 channels: scan, intensity, angle of incidence)
                  │
                  ▼
┌─────────────────────────────────────┐
│ Encoder (Conv1D + Residual Blocks)  │
│  - Conv1D (3 → 32)  stride=2        │
│  - Conv1D (32 → 64) stride=2        │
│  - Residual Stack (2 layers)        │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ VAE Reparameterization              │
│  - μ (mean) and σ (std) estimation  │
│  - Latent sampling z ~ N(μ, σ²)     │
│  - Monte Carlo KL divergence        │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ Decoder (Residual + TransposeConv)  │
│  - Residual Stack (2 layers)        │
│  - TransposeConv1D (64 → 32)        │
│  - TransposeConv1D (32 → 10)        │
│  - Softmax (10 semantic classes)    │
└─────────────────────────────────────┘
                  │
                  ▼
Output (10 channels: semantic probabilities)
```

**Key Features:**
- **3 Input Channels:** Range scan, intensity, angle of incidence
- **10 Output Classes:** Background + 9 semantic classes
- **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
- **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence (sketched below)
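
The three loss terms are combined in `scripts/train.py`; the following is a minimal sketch of that combination (the `class_weights` tensor and the criterion reduction settings are assumptions here, not the exact training configuration):

```python
import torch
import torch.nn as nn
from lovasz_losses import LovaszSoftmax

BETA = 0.01  # beta-VAE weight for the KL term, as in scripts/train.py

ce_criterion = nn.CrossEntropyLoss(reduction='sum')  # summed, then divided by batch size
lovasz_criterion = LovaszSoftmax(reduction='none')   # assumed: per-class losses, weighted below

def s3net_loss(semantic_channels, labels, kl_loss, class_weights):
    # semantic_channels: (N, 10, 1081) logits; labels: (N, 1081) class IDs
    batch_size = semantic_channels.size(0)
    ce_loss = ce_criterion(semantic_channels, labels.to(torch.long)).div(batch_size)
    lovasz_loss, _ = lovasz_criterion(semantic_channels, labels.to(torch.long))
    lovasz_loss = lovasz_loss.mul(class_weights).sum()
    # total objective: CE + beta * KL + Lovasz-Softmax
    return ce_loss + BETA * kl_loss + lovasz_loss
```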

## Semantic Classes

| ID | Class     | Description                    |
|----|-----------|--------------------------------|
| 0  | Other     | Background/unknown             |
| 1  | Chair     | Office and lounge chairs       |
| 2  | Door      | Doors (open/closed)            |
| 3  | Elevator  | Elevator doors                 |
| 4  | Person    | Dynamic pedestrians            |
| 5  | Pillar    | Structural pillars/columns     |
| 6  | Sofa      | Sofas and couches              |
| 7  | Table     | Tables of all types            |
| 8  | Trash bin | Waste receptacles              |
| 9  | Wall      | Walls and flat surfaces        |
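
The same ID ordering is used for the plot legend in `scripts/decode_demo.py`; a minimal lookup for converting predicted IDs to names (the helper name is illustrative, not part of the repo API):

```python
# class-ID order used throughout the repo (index == semantic label ID)
CLASSES = ['Other', 'Chair', 'Door', 'Elevator', 'Person',
           'Pillar', 'Sofa', 'Table', 'Trash bin', 'Wall']

def id_to_name(label_id: int) -> str:
    # hypothetical helper: map a predicted class ID to its name
    return CLASSES[label_id]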

## Requirements

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm

Install dependencies:
```bash
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```

## Dataset Structure

S³-Net expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                 # List of dataset folders
├── 2024-04-11-15-24-29/        # Dataset folder 1
│   ├── train.txt               # Training sample list
│   ├── dev.txt                 # Validation sample list
│   ├── scans_lidar/            # Range scans (.npy)
│   ├── intensities_lidar/      # Intensity data (.npy)
│   └── semantic_label/         # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/        # Dataset folder 2
│   └── ...
└── ...
```

**dataset.txt format:**
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```
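
The data loader in `scripts/model.py` (`VaeTestDataset`) resolves sample paths from these two levels of text files; a condensed sketch of that lookup logic:

```python
import os

def list_samples(data_dir, split='train'):
    """Collect (scan, intensity, label) .npy paths, mirroring VaeTestDataset."""
    samples = []
    with open(os.path.join(data_dir, 'dataset.txt')) as fp_folder:
        for folder in fp_folder.read().splitlines():
            if '-' not in folder:   # skip blank or malformed lines
                continue
            with open(os.path.join(data_dir, folder, split + '.txt')) as fp_file:
                for line in fp_file.read().splitlines():
                    if '.npy' in line:
                        base = os.path.join(data_dir, folder)
                        samples.append((os.path.join(base, 'scans_lidar', line),
                                        os.path.join(base, 'intensities_lidar', line),
                                        os.path.join(base, 'semantic_label', line)))
    return samples
```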

## Usage

### Training

Train S³-Net on your dataset:

```bash
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |

**Learning Rate Schedule** (implemented as sketched below):
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: Exponential decay
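
This schedule corresponds to `adjust_learning_rate` in `scripts/train.py`:

```python
def adjust_learning_rate(optimizer, epoch):
    # piecewise schedule: 1e-4, then 2e-5, then exponential decay
    lr = 1e-4
    if epoch > 50000:
        lr = 2e-5
    if epoch > 480000:
        lr = lr * (0.1 ** (epoch // 110000))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```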

The model saves checkpoints every 2000 epochs to `./model/`.

### Inference Demo

Run semantic segmentation on test data:

```bash
sh run_eval_demo.sh ~/semantic2d_data/
```

**Arguments:**
- `$1` - Test data directory (reads `dev.txt` for sample list)

**Output:**
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions

**Example Output:**

| Ground Truth | S³-Net Prediction |
|:------------:|:-----------------:|
| ![Ground truth](output/semantic_ground_truth_7000.png) | ![S3-Net prediction](output/semantic_s3net_7000.png) |

### Stochastic Inference

S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction (see the sketch after this list). This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
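
A minimal sketch of this voting step, following `scripts/decode_demo.py` (tensor shapes are illustrative):

```python
import torch

@torch.no_grad()
def stochastic_predict(model, scan, intensity, angle, num_samples=32):
    # replicate the single sample so each forward pass draws a fresh latent z
    scans = scan.repeat(num_samples, 1, 1)            # (32, 1, 1081)
    intensities = intensity.repeat(num_samples, 1, 1)
    angles = angle.repeat(num_samples, 1, 1)

    semantic_scan, _, _ = model(scans, intensities, angles)  # (32, 10, 1081)
    preds = semantic_scan.argmax(dim=1)               # per-pass class IDs, (32, 1081)
    # majority vote across the 32 passes for each lidar point
    return preds.mode(dim=0).values                   # (1081,)
```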

## File Structure

```
s3_net/
├── demo/                        # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth         # Pretrained model weights
├── output/                      # Inference output directory
├── scripts/
│   ├── model.py                 # S³-Net model architecture
│   ├── train.py                 # Training script
│   ├── decode_demo.py           # Inference/demo script
│   └── lovasz_losses.py         # Lovasz-Softmax loss function
├── run_train.sh                 # Training driver script
├── run_eval_demo.sh             # Inference driver script
├── LICENSE                      # MIT License
└── README.md                    # This file
```

## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
tensorboard --logdir=runs
```

Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss

## Pre-trained Model

A pre-trained model is included at `model/s3_net_model.pth`. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.
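
The checkpoint stores the weights under the `'model'` key (as loaded in `scripts/decode_demo.py`), so a minimal loading sketch, assuming `scripts/` is on your `PYTHONPATH`, looks like:

```python
import torch
from model import S3Net  # scripts/model.py

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = S3Net(input_channels=3, output_channels=10).to(device)

# the checkpoint dict keeps the state dict under the 'model' key
checkpoint = torch.load('model/s3_net_model.pth', map_location=device)
model.load_state_dict(checkpoint['model'])
model.eval()
```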

To use the pre-trained model:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```

## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```
demo/1.lobby_s3net_segmentation.gif
ADDED (Git LFS)

demo/2.lobby_semantic_mapping.gif
ADDED (Git LFS)

demo/3.lobby_semantic_navigation.gif
ADDED (Git LFS)
model/s3_net_model.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:86ffcba0092e8e20d80fc02e5e01bb675c60d0c897d8830305ecc5b8b20b6dbb
size 741507
output/semantic_ground_truth_7000.png
ADDED

output/semantic_s3net_7000.png
ADDED
run_eval_demo.sh
ADDED

```sh
#!/bin/sh
#
# file: run_eval_demo.sh
#
# This is a simple driver script that runs decoding (evaluation) on the
# test set using the pretrained model.
#
# To run this script, execute the following line:
#
#  sh run_eval_demo.sh test.dat
#
# The first argument ($1) is the test data directory.
#
# An example of how to run this is as follows:
#
#  sh run_eval_demo.sh ~/semantic2d_data/
#

# decode the number of command line arguments
#
NARGS=$#

if (test "$NARGS" -eq "0") then
    echo "usage: run_eval_demo.sh test.dat"
    exit 1
fi

# define a base directory for the experiment
#
DL_EXP=`pwd`;
DL_SCRIPTS="$DL_EXP/scripts";
DL_OUT="$DL_EXP/output";

# define the output directories for training/decoding/scoring
#
#DL_TRAIN_ODIR="$DL_OUT/00_train";
DL_TRAIN_ODIR="$DL_EXP/model";
DL_MDL_PATH="$DL_TRAIN_ODIR/s3_net_model.pth";

# evaluate each data set that was specified
#
echo "... starting evaluation of $1 ..."
$DL_SCRIPTS/decode_demo.py $DL_MDL_PATH $1 | \
    tee $DL_OUT/01_decode_dev.log | grep "00 out of\|Average"
echo "... finished evaluation of $1 ..."

echo "======= end of results ======="

#
# exit gracefully
```
run_train.sh
ADDED

```sh
#!/bin/sh
#
# file: run_train.sh
#
# This is a simple driver script that runs training on the training set
# while validating on the val set.
#
# To run this script, execute the following line:
#
#  sh run_train.sh train.dat val.dat
#
# The first argument ($1) is the training data directory and the second
# argument ($2) is the validation data directory.
#
# An example of how to run this is as follows:
#
#  sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
#

# decode the number of command line arguments
#
NARGS=$#

if (test "$NARGS" -eq "0") then
    echo "usage: run_train.sh train.dat val.dat"
    exit 1
fi

# define a base directory for the experiment
#
DL_EXP=`pwd`;
DL_SCRIPTS="$DL_EXP/scripts";
DL_OUT="$DL_EXP/output";

# define the number of feats environment variable
#
export DL_NUM_FEATS=3

# define the output directories for training/decoding/scoring
#
#DL_TRAIN_ODIR="$DL_OUT/00_train";
DL_TRAIN_ODIR="$DL_EXP/model";
DL_MDL_PATH="$DL_TRAIN_ODIR/model.pth";

# create the output directory
#
rm -fr $DL_OUT
mkdir -p $DL_OUT

# execute training: training must always be run
#
echo "... starting training on $1 ..."
$DL_SCRIPTS/train.py $DL_MDL_PATH $1 $2 | tee $DL_OUT/00_train.log | \
    grep "reading\|Step\|Average\|Warning\|Error"
echo "... finished training on $1 ..."

#
```
scripts/__pycache__/convlstm.cpython-37.pyc
ADDED
Binary file (5.75 kB)

scripts/__pycache__/lovasz_losses.cpython-37.pyc
ADDED
Binary file (2.32 kB)

scripts/__pycache__/model.cpython-37.pyc
ADDED
Binary file (9.34 kB)
scripts/decode_demo.py
ADDED

```python
#!/usr/bin/env python
#
# file: scripts/decode_demo.py
#
# revision history:
#  20190925 (TE): first version
#
# usage:
#  python decode_demo.py mfile data
#
# arguments:
#  mfile: input model file
#  data: the input data directory to be decoded
#
# This script decodes data using the S3-Net model and saves polar-plot
# visualizations of the ground-truth and predicted semantic labels.
#------------------------------------------------------------------------------

# import pytorch modules
#
import torch
import torch.nn as nn
from tqdm import tqdm

# visualize:
import matplotlib.pyplot as plt
import numpy as np

import matplotlib
matplotlib.style.use('ggplot')

# import the model and all of its variables/functions
#
from model import *

# import modules
#
import sys
import os

#-----------------------------------------------------------------------------
#
# global variables are listed here
#
#-----------------------------------------------------------------------------

# general global values
#
NUM_ARGS = 2
SPACE = " "

# Constants
NUM_CLASSES = 9
NUM_INPUT_CHANNELS = 3                 # scan, intensity, angle of incidence
NUM_OUTPUT_CHANNELS = NUM_CLASSES + 1  # 9 semantic classes + 1 background

# Hokuyo UTM-30LX-EW:
POINTS = 1081  # the number of lidar points
AGNLE_MIN = -2.356194496154785
AGNLE_MAX = 2.356194496154785
RANGE_MAX = 60.0

# for reproducibility, we seed the rng
#
set_seed(SEED1)

#------------------------------------------------------------------------------
#
# the main program starts here
#
#------------------------------------------------------------------------------

# function: main
#
# arguments: none
#
# return: none
#
# This method is the main function.
#
def main(argv):
    # ensure we have the correct number of arguments:
    if(len(argv) != NUM_ARGS):
        print("usage: python decode_demo.py [MDL_PATH] [EVAL_SET]")
        exit(-1)

    # define local variables:
    mdl_path = argv[0]
    fImg = argv[1]
    odir = "./output"

    # if the output directory doesn't exist, we make it:
    if not os.path.exists(odir):
        os.makedirs(odir)

    # set the device to use GPU if available:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # get the evaluation data (reads dev.txt for the sample list):
    eval_dataset = VaeTestDataset(fImg, 'dev')
    eval_dataloader = torch.utils.data.DataLoader(eval_dataset, batch_size=1,
                                                  shuffle=False, drop_last=True)

    # instantiate a model:
    model = S3Net(input_channels=NUM_INPUT_CHANNELS,
                  output_channels=NUM_OUTPUT_CHANNELS)
    # move the model to the device:
    model.to(device)

    # set the model to evaluate
    #
    model.eval()

    # set the loss criterion (defined for reference; not used below):
    criterion = nn.MSELoss(reduction='sum')
    criterion.to(device)

    # load the weights
    #
    checkpoint = torch.load(mdl_path, map_location=device)
    model.load_state_dict(checkpoint['model'])

    # number of stochastic forward passes per sample:
    num_samples = 32
    # get the number of batches (ceiling of eval_data/batch_size):
    num_batches = int(len(eval_dataset)/eval_dataloader.batch_size)
    classes = ['Other', 'Chair', 'Door', 'Elevator', 'Person', 'Pillar',
               'Sofa', 'Table', 'Trash bin', 'Wall']
    with torch.no_grad():
        for i, batch in tqdm(enumerate(eval_dataloader), total=num_batches):
            # only visualize every 100th sample:
            if(i % 100 != 0):
                continue
            # collect the samples as a batch:
            scans = batch['scan'].to(device)
            intensities = batch['intensity'].to(device)
            angle_incidence = batch['angle_incidence'].to(device)
            labels = batch['label'].to(device)

            # replicate the inputs for the stochastic forward passes:
            inputs_samples = scans.repeat(num_samples, 1, 1)
            intensity_samples = intensities.repeat(num_samples, 1, 1)
            angle_incidence_samples = angle_incidence.repeat(num_samples, 1, 1)

            # feed the batch to the network:
            semantic_scan, semantic_channels, kl_loss = model(
                inputs_samples, intensity_samples, angle_incidence_samples)

            # per-pass class predictions:
            semantic_scans_mx = semantic_scan.cpu().detach().argmax(dim=1)
            # majority vote over the stochastic passes:
            semantic_scans_mx_mean = semantic_scans_mx.mode(0).values.numpy()

            # plot:
            r = scans.cpu().detach().numpy().reshape(POINTS)
            theta = np.linspace(AGNLE_MIN, AGNLE_MAX, num=POINTS, endpoint=True)

            ## plot semantic ground-truth label:
            fig = plt.figure(figsize=(12, 12))
            ax = fig.add_subplot(1, 1, 1, projection='polar', facecolor='seashell')
            smap = labels.cpu().numpy().reshape(POINTS)

            # add the background label:
            theta = np.insert(theta, -1, np.pi)
            r = np.insert(r, -1, 1)
            smap = np.insert(smap, -1, 0)
            label_val = np.unique(smap).astype(int)

            colors = smap
            area = 6
            scatter = ax.scatter(theta, r, c=colors, s=area, cmap='nipy_spectral',
                                 alpha=0.95, linewidth=10)
            ax.set_xticks(np.linspace(AGNLE_MIN, AGNLE_MAX, 8, endpoint=True))
            ax.set_thetamin(-135)
            ax.set_thetamax(135)
            ax.set_yticklabels([])
            # produce a legend with the unique colors from the scatter
            plt.xticks(fontsize=16)
            plt.yticks(fontsize=16)
            plt.legend(handles=scatter.legend_elements(num=[j for j in label_val])[0],
                       labels=[classes[j] for j in label_val],
                       bbox_to_anchor=(0.5, -0.08), loc='lower center', fontsize=18)
            ax.grid(False)
            ax.set_theta_offset(np.pi/2)

            input_img_name = "./output/semantic_ground_truth_" + str(i) + ".png"
            plt.savefig(input_img_name, bbox_inches='tight')
            #plt.show()

            ## plot s3-net semantic segmentation:
            fig = plt.figure(figsize=(12, 12))
            ax = fig.add_subplot(1, 1, 1, projection='polar', facecolor='seashell')
            # theta and r were already extended above; only extend the new smap:
            smap = np.insert(semantic_scans_mx_mean.reshape(POINTS), -1, 0)
            label_val = np.unique(smap).astype(int)

            colors = smap
            scatter = ax.scatter(theta, r, c=colors, s=area, cmap='nipy_spectral',
                                 alpha=0.95, linewidth=10)
            ax.set_xticks(np.linspace(AGNLE_MIN, AGNLE_MAX, 8, endpoint=True))
            ax.set_thetamin(-135)
            ax.set_thetamax(135)
            ax.set_yticklabels([])
            # produce a legend with the unique colors from the scatter
            plt.xticks(fontsize=16)
            plt.yticks(fontsize=16)
            plt.legend(handles=scatter.legend_elements(num=[j for j in label_val])[0],
                       labels=[classes[j] for j in label_val],
                       bbox_to_anchor=(0.5, -0.08), loc='lower center', fontsize=18)
            ax.grid(False)
            ax.set_theta_offset(np.pi/2)

            input_img_name = "./output/semantic_s3net_" + str(i) + ".png"
            plt.savefig(input_img_name, bbox_inches='tight')
            plt.close('all')

            print(i)

    # exit gracefully
    #
    return True
#
# end of function

# begin gracefully
#
if __name__ == '__main__':
    main(sys.argv[1:])
#
# end of file
```
scripts/lovasz_losses.py
ADDED

```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
#
# file: scripts/lovasz_losses.py
#
# revision history: xzt
#  20220824 (TE): first version
#
# usage:
#
# This script holds the loss functions for the Lovasz-Softmax loss.

import torch
import torch.nn as nn

##
# version 1: use torch.autograd
class LovaszSoftmax(nn.Module):
    '''
    This is the autograd version, used in the multi-category classification case
    '''
    def __init__(self, reduction='mean', ignore_index=-100):
        super(LovaszSoftmax, self).__init__()
        self.reduction = reduction
        self.lb_ignore = ignore_index

    def forward(self, logits, label):
        '''
        Same usage method as nn.CrossEntropyLoss:
            >>> criteria = LovaszSoftmax()
            >>> logits = torch.randn(8, 10, 1081)        # ncl, float/half
            >>> lbs = torch.randint(0, 10, (8, 1081))    # nl, int64_t
            >>> loss, errs = criteria(logits, lbs)
        '''
        # overcome ignored label
        n, c, h = logits.size()
        logits = logits.transpose(0, 1).reshape(c, -1).float()  # use fp32 to avoid nan
        label = label.view(-1)

        idx = label.ne(self.lb_ignore).nonzero(as_tuple=False).squeeze()
        probs = logits.softmax(dim=0)[:, idx]

        label = label[idx]
        lb_one_hot = torch.zeros_like(probs).scatter_(
            0, label.unsqueeze(0), 1).detach()

        errs = (lb_one_hot - probs).abs()
        errs_sort, errs_order = torch.sort(errs, dim=1, descending=True)
        n_samples = errs.size(1)

        # lovasz extension grad
        with torch.no_grad():
            # lb_one_hot_sort = lb_one_hot[
            #     torch.arange(c).unsqueeze(1).repeat(1, n_samples), errs_order
            # ].detach()
            lb_one_hot_sort = torch.cat([
                lb_one_hot[i, order].unsqueeze(0)
                for i, order in enumerate(errs_order)], dim=0)
            n_pos = lb_one_hot_sort.sum(dim=1, keepdim=True)
            inter = n_pos - lb_one_hot_sort.cumsum(dim=1)
            union = n_pos + (1. - lb_one_hot_sort).cumsum(dim=1)
            jacc = 1. - inter / union
            if n_samples > 1:
                jacc[:, 1:] = jacc[:, 1:] - jacc[:, :-1]

        losses = torch.einsum('ab,ab->a', errs_sort, jacc)

        if self.reduction == 'sum':
            losses = losses.sum()
        elif self.reduction == 'mean':
            losses = losses.mean()
        return losses, errs
```
scripts/model.py
ADDED

```python
#!/usr/bin/env python
#
# file: scripts/model.py
#
# revision history: xzt
#  20220824 (TE): first version
#
# usage:
#
# This script holds the model architecture
#------------------------------------------------------------------------------

# import pytorch modules
#
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import OrderedDict

# import modules
#
import os
import random

# for reproducibility, we seed the rng
#
SEED1 = 1337
NEW_LINE = "\n"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#-----------------------------------------------------------------------------
#
# helper functions are listed here
#
#-----------------------------------------------------------------------------

# function: set_seed
#
# arguments: seed - the seed for all the rng
#
# returns: none
#
# this method configures cuDNN for deterministic behavior; the explicit
# rng seeding calls are kept below but currently disabled
#
def set_seed(seed):
    #torch.manual_seed(seed)
    #torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    #random.seed(seed)
    #os.environ['PYTHONHASHSEED'] = str(seed)
#
# end of method

# calculate the angle of incidence of the lidar ray:
def angle_incidence_calculation(b, c, alpha, last_ray=False):
    '''
    # remove invalid values:
    if(last_ray): # the last ray
        if(np.isnan(b) or np.isinf(b)):
            b = 60.
        if(np.isnan(c) or np.isinf(c)):
            c = 60.
    else:
        b[np.isnan(b)] = 60.
        b[np.isinf(b)] = 60.
        c[np.isnan(c)] = 60.
        c[np.isinf(c)] = 60.
    '''
    # the law of cosines:
    a = np.sqrt(b*b + c*c - 2*b*c*np.cos(alpha))
    if(last_ray): # the last ray
        beta = np.arccos([(a*a + c*c - b*b)/(2*a*c)])
        theta = np.abs(np.pi/2 - beta)
    else:
        gamma = np.arccos([(a*a + b*b - c*c)/(2*a*b)])
        theta = np.abs(np.pi/2 - gamma)

    return theta

# class: VaeTestDataset
#
# arguments: img_path - the dataset root directory
#            file_name - the sample-list file to read ('train' or 'dev')
#
# this dataset returns normalized scan, intensity, angle-of-incidence, and
# semantic-label tensors for each sample
POINTS = 1081
class VaeTestDataset(torch.utils.data.Dataset):
    def __init__(self, img_path, file_name):
        # initialize the data and labels
        # read the names of the data files:
        self.scan_file_names = []
        self.intensity_file_names = []
        #self.vel_file_names = []
        self.label_file_names = []
        # parameters: data mean/std: scan, intensity, angle of incidence:
        # [[4.518406, 8.2914915], [3081.8167, 1529.4413], [0.5959513, 0.4783924]]
        self.s_mu = 4.518406
        self.s_std = 8.2914915
        self.i_mu = 3081.8167
        self.i_std = 1529.4413
        self.a_mu = 0.5959513
        self.a_std = 0.4783924
        # open dataset.txt, which lists the dataset folders:
        fp_folder = open(img_path+'dataset.txt', 'r')

        # for each dataset folder, read its train.txt or dev.txt sample list:
        for folder_line in fp_folder.read().split(NEW_LINE):
            if('-' in folder_line):
                folder_path = folder_line
                fp_file = open(img_path+folder_path+'/'+file_name+'.txt', 'r')
                for line in fp_file.read().split(NEW_LINE):
                    if('.npy' in line):
                        self.scan_file_names.append(img_path+folder_path+'/scans_lidar/'+line)
                        self.intensity_file_names.append(img_path+folder_path+'/intensities_lidar/'+line)
                        #self.vel_file_names.append(img_path+folder_path+'/velocities/'+line)
                        self.label_file_names.append(img_path+folder_path+'/semantic_label/'+line)
                # close txt file:
                fp_file.close()

        # close txt file:
        fp_folder.close()

        self.length = len(self.scan_file_names)

        print("dataset length: ", self.length)

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # preallocate the sample arrays:
        scan = np.zeros((1, POINTS))
        intensity = np.zeros((1, POINTS))
        angle_incidence = np.zeros((1, POINTS))
        label = np.zeros((1, POINTS))

        # get the intensity data:
        intensity_name = self.intensity_file_names[idx]
        intensity = np.load(intensity_name)

        # get the scan data:
        scan_name = self.scan_file_names[idx]
        scan = np.load(scan_name)

        # get the semantic label data:
        label_name = self.label_file_names[idx]
        label = np.load(label_name)

        # get the angle of incidence of the ray:
        b = scan[:-1]
        c = scan[1:]
        alpha = np.ones(POINTS - 1)*((270*np.pi / 180) / (POINTS - 1))
        theta = angle_incidence_calculation(b, c, alpha)
        # last ray:
        b_last = scan[-2]
        c_last = scan[-1]
        alpha_last = (270*np.pi / 180) / (POINTS - 1)
        theta_last = angle_incidence_calculation(b_last, c_last, alpha_last, last_ray=True)
        angle_incidence = np.concatenate((theta[0], theta_last), axis=0)

        # replace invalid values:
        scan[np.isnan(scan)] = 0.
        scan[np.isinf(scan)] = 0.

        intensity[np.isnan(intensity)] = 0.
        intensity[np.isinf(intensity)] = 0.

        angle_incidence[np.isnan(angle_incidence)] = 0.
        angle_incidence[np.isinf(angle_incidence)] = 0.

        label[np.isnan(label)] = 0.
        label[np.isinf(label)] = 0.

        # data normalization:
        # standardization: scan
        # mu: 4.518406, std: 8.2914915
        scan = (scan - self.s_mu) / self.s_std

        # standardization: intensity
        # mu: 3081.8167, std: 1529.4413
        intensity = (intensity - self.i_mu) / self.i_std

        # standardization: angle_incidence
        # mu: 0.5959513, std: 0.4783924
        angle_incidence = (angle_incidence - self.a_mu) / self.a_std

        # transfer to pytorch tensors:
        scan_tensor = torch.FloatTensor(scan)
        intensity_tensor = torch.FloatTensor(intensity)
        angle_incidence_tensor = torch.FloatTensor(angle_incidence)
        label_tensor = torch.FloatTensor(label)

        data = {
            'scan': scan_tensor,
            'intensity': intensity_tensor,
            'angle_incidence': angle_incidence_tensor,
            'label': label_tensor,
        }

        return data

#
# end of class

#------------------------------------------------------------------------------
#
# the model is defined here
#
#------------------------------------------------------------------------------

# define the PyTorch VAE model
#
# Residual blocks:
class Residual(nn.Module):
    def __init__(self, in_channels, num_hiddens, num_residual_hiddens):
        super(Residual, self).__init__()
        self._block = nn.Sequential(
            nn.ReLU(True),
            nn.Conv1d(in_channels=in_channels,
                      out_channels=num_residual_hiddens,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm1d(num_residual_hiddens),
            nn.ReLU(True),
            nn.Conv1d(in_channels=num_residual_hiddens,
                      out_channels=num_hiddens,
                      kernel_size=1, stride=1, bias=False),
            nn.BatchNorm1d(num_hiddens)
        )

    def forward(self, x):
        return x + self._block(x)

class ResidualStack(nn.Module):
    def __init__(self, in_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
        super(ResidualStack, self).__init__()
        self._num_residual_layers = num_residual_layers
        self._layers = nn.ModuleList([Residual(in_channels, num_hiddens, num_residual_hiddens)
                                      for _ in range(self._num_residual_layers)])

    def forward(self, x):
        for i in range(self._num_residual_layers):
            x = self._layers[i](x)
        return F.relu(x)

# Encoder & Decoder Architecture:
# Encoder:
class Encoder(nn.Module):
    def __init__(self, in_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
        super(Encoder, self).__init__()
        self._conv_1 = nn.Sequential(*[
            nn.Conv1d(in_channels=in_channels,
                      out_channels=num_hiddens//2,
                      kernel_size=4,
                      stride=2,
                      padding=1),
            nn.BatchNorm1d(num_hiddens//2),
            nn.ReLU(True)
        ])
        self._conv_2 = nn.Sequential(*[
            nn.Conv1d(in_channels=num_hiddens//2,
                      out_channels=num_hiddens,
                      kernel_size=4,
                      stride=2,
                      padding=1),
            nn.BatchNorm1d(num_hiddens)
            #nn.ReLU(True)
        ])
        self._residual_stack = ResidualStack(in_channels=num_hiddens,
                                             num_hiddens=num_hiddens,
                                             num_residual_layers=num_residual_layers,
                                             num_residual_hiddens=num_residual_hiddens)

    def forward(self, inputs):
        x = self._conv_1(inputs)
        x = self._conv_2(x)
        x = self._residual_stack(x)
        return x

# Decoder:
class Decoder(nn.Module):
    def __init__(self, out_channels, num_hiddens, num_residual_layers, num_residual_hiddens):
        super(Decoder, self).__init__()

        self._residual_stack = ResidualStack(in_channels=num_hiddens,
                                             num_hiddens=num_hiddens,
                                             num_residual_layers=num_residual_layers,
                                             num_residual_hiddens=num_residual_hiddens)

        self._conv_trans_2 = nn.Sequential(*[
            nn.ReLU(True),
            nn.ConvTranspose1d(in_channels=num_hiddens,
                               out_channels=num_hiddens//2,
                               kernel_size=4,
                               stride=2,
                               padding=1),
            nn.BatchNorm1d(num_hiddens//2),
            nn.ReLU(True)
        ])

        self._conv_trans_1 = nn.Sequential(*[
            nn.ConvTranspose1d(in_channels=num_hiddens//2,
                               out_channels=num_hiddens//2,
                               kernel_size=4,
                               stride=2,
                               padding=1,
                               output_padding=1),
            nn.BatchNorm1d(num_hiddens//2),
            nn.ReLU(True),
            nn.Conv1d(in_channels=num_hiddens//2,
                      out_channels=out_channels,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            #nn.Sigmoid()
        ])

    def forward(self, inputs):
        x = self._residual_stack(inputs)
        x = self._conv_trans_2(x)
        x = self._conv_trans_1(x)
        return x

class VAE_Encoder(nn.Module):
    def __init__(self, input_channel, num_hiddens, num_residual_layers, num_residual_hiddens, embedding_dim):
        super(VAE_Encoder, self).__init__()
        # parameters:
        self.input_channels = input_channel
        '''
        # Constants
        num_hiddens = 128 #128
        num_residual_hiddens = 64 #32
        num_residual_layers = 2
        embedding_dim = 2 #64
        '''

        # encoder:
        in_channels = input_channel
        self._encoder = Encoder(in_channels,
                                num_hiddens,
                                num_residual_layers,
                                num_residual_hiddens)

        # z latent variable:
        self._encoder_z_mu = nn.Conv1d(in_channels=num_hiddens,
                                       out_channels=embedding_dim,
                                       kernel_size=1,
                                       stride=1)
        self._encoder_z_log_sd = nn.Conv1d(in_channels=num_hiddens,
                                           out_channels=embedding_dim,
                                           kernel_size=1,
                                           stride=1)

    def forward(self, x):
        # input reshape:
        x = x.reshape(-1, self.input_channels, POINTS)
        # Encoder:
        encoder_out = self._encoder(x)
        # get `mu` and `log_var`:
        z_mu = self._encoder_z_mu(encoder_out)
        z_log_sd = self._encoder_z_log_sd(encoder_out)
        return z_mu, z_log_sd

# our proposed model:
class S3Net(nn.Module):
    def __init__(self, input_channels, output_channels):
        super(S3Net, self).__init__()
        # parameters:
        self.input_channels = input_channels
        self.latent_dim = 270
        self.output_channels = output_channels

        # Constants
        num_hiddens = 64 #128
        num_residual_hiddens = 32 #64
        num_residual_layers = 2
        embedding_dim = 1 #2

        # prediction encoder:
        self._encoder = VAE_Encoder(self.input_channels,
                                    num_hiddens,
                                    num_residual_layers,
                                    num_residual_hiddens,
                                    embedding_dim)

        # decoder:
        self._decoder_z_mu = nn.ConvTranspose1d(in_channels=embedding_dim,
                                                out_channels=num_hiddens,
                                                kernel_size=1,
                                                stride=1)
        self._decoder = Decoder(self.output_channels,
                                num_hiddens,
                                num_residual_layers,
                                num_residual_hiddens)

        self.softmax = nn.Softmax(dim=1)

    def vae_reparameterize(self, z_mu, z_log_sd):
        """
        :param mu: mean from the encoder's latent space
        :param log_sd: log standard deviation from the encoder's latent space
        :output: reparameterized latent variable z, Monte Carlo KL divergence
        """
        # reshape:
        z_mu = z_mu.reshape(-1, self.latent_dim, 1)
        z_log_sd = z_log_sd.reshape(-1, self.latent_dim, 1)
        # define the z probabilities (in this case Normal for both)
        # p(z): N(z|0,I)
        pz = torch.distributions.Normal(loc=torch.zeros_like(z_mu), scale=torch.ones_like(z_log_sd))
        # q(z|x,phi): N(z|mu, z_var)
        qz_x = torch.distributions.Normal(loc=z_mu, scale=torch.exp(z_log_sd))

        # reparameterization trick: z = z_mu + xi (*) z_sd, xi~N(xi|0,I)
        z = qz_x.rsample()
        # Monte Carlo KL divergence: MCKL(p(z)||q(z|x,phi)) = log(p(z)) - log(q(z|x,phi))
        # sum over the weight dim, leaving the batch dim
        kl_divergence = (pz.log_prob(z) - qz_x.log_prob(z)).sum(dim=1)
        kl_loss = -kl_divergence.mean()

        return z, kl_loss

    def forward(self, x_s, x_i, x_a):
        """
        Forward pass of the scan, intensity, and angle-of-incidence inputs
        through the network
        """
        # input reshape:
        x_s = x_s.reshape(-1, 1, POINTS)
        x_i = x_i.reshape(-1, 1, POINTS)
        x_a = x_a.reshape(-1, 1, POINTS)
        # concatenate along the channel axis:
        x = torch.cat([x_s, x_i, x_a], dim=1)

        # encode:
        z_mu, z_log_sd = self._encoder(x)

        # get the latent vector through reparameterization:
        z, kl_loss = self.vae_reparameterize(z_mu, z_log_sd)

        # decode:
        # reshape:
        z = z.reshape(-1, 1, 270)
        x_d = self._decoder_z_mu(z)
        semantic_channels = self._decoder(x_d)

        # semantic scan: 10 channels
        semantic_scan = self.softmax(semantic_channels)

        return semantic_scan, semantic_channels, kl_loss

#
# end of class

#
# end of file
```
scripts/train.py
ADDED
|
@@ -0,0 +1,380 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python
|
| 2 |
+
#
|
| 3 |
+
# file: $ISIP_EXP/SOGMP/scripts/train.py
|
| 4 |
+
#
|
| 5 |
+
# revision history: xzt
|
| 6 |
+
# 20220824 (TE): first version
|
| 7 |
+
#
|
| 8 |
+
# usage:
|
| 9 |
+
# python train.py mdir train_data val_data
|
| 10 |
+
#
|
| 11 |
+
# arguments:
|
| 12 |
+
# mdir: the directory where the output model is stored
|
| 13 |
+
# train_data: the directory of training data
|
| 14 |
+
# val_data: the directory of valiation data
|
| 15 |
+
#
|
| 16 |
+
# This script trains a S3-Net model
|
| 17 |
+
#------------------------------------------------------------------------------
|

# import pytorch modules
#
import torch
import torch.nn as nn
from torch.optim import Adam
from tqdm import tqdm
import torch.nn.functional as F

# visualize:
from tensorboardX import SummaryWriter
import numpy as np

# import the model and all of its variables/functions
# (the star import also supplies set_seed, SEED1, VaeTestDataset and S3Net):
#
from model import *
import lovasz_losses as L

# import modules
#
import sys
import os


#-----------------------------------------------------------------------------
#
# global variables are listed here
#
#-----------------------------------------------------------------------------

# general global values
#
model_dir = './model/s3_net_model.pth' # the path of model storage
NUM_ARGS = 3
NUM_EPOCHS = 20000
BATCH_SIZE = 1024
LEARNING_RATE = "lr"
BETAS = "betas"
EPS = "eps"
WEIGHT_DECAY = "weight_decay"

# Constants
NUM_INPUT_CHANNELS = 3
NUM_OUTPUT_CHANNELS = 10 # 9 classes of semantic labels + 1 background
BETA = 0.01  # beta-VAE weight on the KL term

# for reproducibility, we seed the rng
#
set_seed(SEED1)

# adjust_learning_rate
#
def adjust_learning_rate(optimizer, epoch):
    lr = 1e-4
    if epoch > 50000:
        lr = 2e-5
    if epoch > 480000:
        # lr = 5e-8
        lr = lr * (0.1 ** (epoch // 110000))
    # if epoch > 8300:
    #     lr = 1e-9
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

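# note: with NUM_EPOCHS = 20000 the two thresholds above never fire, so the
# learning rate stays pinned at 1e-4; since this runs at the top of every
# epoch, it also overrides the 0.001 passed to Adam in main.
#
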
# train function:
def train(model, dataloader, dataset, device, optimizer, ce_criterion, lovasz_criterion, class_weights, epoch, epochs):
    # set model to training mode:
    model.train()
    # running totals for the epoch:
    running_loss = 0.0
    # lovasz loss (accumulated in the kl slot, see below):
    kl_avg_loss = 0.0
    # CE loss:
    ce_avg_loss = 0.0

    counter = 0
    # get the number of batches (floor of train_data/batch_size, since the
    # loader uses drop_last=True):
    num_batches = int(len(dataset)/dataloader.batch_size)
    for i, batch in tqdm(enumerate(dataloader), total=num_batches):
    #for i, batch in enumerate(dataloader, 0):
        counter += 1
        # collect the samples as a batch:
        scans = batch['scan']
        scans = scans.to(device)
        intensities = batch['intensity']
        intensities = intensities.to(device)
        angle_incidence = batch['angle_incidence']
        angle_incidence = angle_incidence.to(device)
        labels = batch['label']
        labels = labels.to(device)

        batch_size = scans.size(0)

        # set all gradients to 0:
        optimizer.zero_grad()

        # feed the batch to the network:
        semantic_scan, semantic_channels, kl_loss = model(scans, intensities, angle_incidence)
        # calculate the semantic ce loss:
        ce_loss = ce_criterion(semantic_channels, labels.to(torch.long)).div(batch_size)
        # per-class lovasz losses, weighted and summed:
        lovasz_loss, _ = lovasz_criterion(semantic_channels, labels.to(torch.long))
        lovasz_loss = lovasz_loss.mul(class_weights.to(device)).sum()
        # beta-vae:
        loss = ce_loss + BETA*kl_loss + lovasz_loss
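        # i.e., the per-batch objective in beta-VAE form:
        #   L = L_CE/B + beta * D_KL + sum_c w_c * L_Lovasz,c
        # where B is the batch size and beta = 0.01 down-weights the KL term.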
        # perform back propagation:
        loss.backward(torch.ones_like(loss))
        optimizer.step()
        # get the loss:
        # multiple GPUs:
        if torch.cuda.device_count() > 1:
            loss = loss.mean()
            ce_loss = ce_loss.mean()
            kl_loss = lovasz_loss.mean() # the lovasz loss is tracked in the kl slot

        running_loss += loss.item()
        # lovasz loss (logged in place of the kl divergence):
        kl_avg_loss += lovasz_loss.item()
        # CE loss:
        ce_avg_loss += ce_loss.item()

        # display informational message:
        if(i % 512 == 0):
            print('Epoch [{}/{}], Step[{}/{}], Loss: {:.4f}, CE_Loss: {:.4f}, Lovasz_Loss: {:.4f}'
                  .format(epoch, epochs, i + 1, num_batches, loss.item(), ce_loss.item(), lovasz_loss.item()))

    train_loss = running_loss / counter
    train_kl_loss = kl_avg_loss / counter
    train_ce_loss = ce_avg_loss / counter

    return train_loss, train_kl_loss, train_ce_loss

# validate function:
def validate(model, dataloader, dataset, device, ce_criterion, lovasz_criterion, class_weights):
    # set model to evaluation mode:
    model.eval()
    # running totals for the epoch:
    running_loss = 0.0
    # lovasz loss (accumulated in the kl slot, see below):
    kl_avg_loss = 0.0
    # CE loss:
    ce_avg_loss = 0.0

    counter = 0
    # get the number of batches (floor of val_data/batch_size, since the
    # loader uses drop_last=True):
    num_batches = int(len(dataset)/dataloader.batch_size)
    with torch.no_grad():
        for i, batch in tqdm(enumerate(dataloader), total=num_batches):
        #for i, batch in enumerate(dataloader, 0):
            counter += 1
            # collect the samples as a batch:
            scans = batch['scan']
            scans = scans.to(device)
            intensities = batch['intensity']
            intensities = intensities.to(device)
            angle_incidence = batch['angle_incidence']
            angle_incidence = angle_incidence.to(device)
            labels = batch['label']
            labels = labels.to(device)

            batch_size = scans.size(0)

            # feed the batch to the network:
            semantic_scan, semantic_channels, kl_loss = model(scans, intensities, angle_incidence)
            # calculate the semantic ce loss:
            ce_loss = ce_criterion(semantic_channels, labels.to(torch.long)).div(batch_size)
            lovasz_loss, _ = lovasz_criterion(semantic_channels, labels.to(torch.long))
            lovasz_loss = lovasz_loss.mul(class_weights.to(device)).sum()
            # beta-vae:
            loss = ce_loss + BETA*kl_loss + lovasz_loss
            # multiple GPUs:
            if torch.cuda.device_count() > 1:
                loss = loss.mean()
                ce_loss = ce_loss.mean()
                kl_loss = lovasz_loss.mean() # the lovasz loss is tracked in the kl slot

            running_loss += loss.item()
            # lovasz loss (logged in place of the kl divergence):
            kl_avg_loss += lovasz_loss.item()
            # CE loss:
            ce_avg_loss += ce_loss.item()

    val_loss = running_loss / counter
    val_kl_loss = kl_avg_loss / counter
    val_ce_loss = ce_avg_loss / counter

    return val_loss, val_kl_loss, val_ce_loss

#------------------------------------------------------------------------------
#
# the main program starts here
#
#------------------------------------------------------------------------------

# function: main
#
# arguments: none
#
# return: none
#
# This method is the main function.
#
def main(argv):
    # ensure we have the correct number of arguments:
    #global cur_batch_win
    if(len(argv) != NUM_ARGS):
        print("usage: python train.py [MDL_PATH] [TRAIN_PATH] [DEV_PATH]")
        exit(-1)

    # define local variables:
    mdl_path = argv[0]
    pTrain = argv[1]
    pDev = argv[2]

    # get the output directory name:
    odir = os.path.dirname(mdl_path)

    # if the odir doesn't exist, we make it:
    if not os.path.exists(odir):
        os.makedirs(odir)

    # set the device to use GPU if available:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    print('...Start reading data...')
    ### training data ###
    # training set and training data loader
    train_dataset = VaeTestDataset(pTrain, 'train')
    train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, num_workers=4, \
                                                   shuffle=True, drop_last=True, pin_memory=True)

    ### validation data ###
    # validation set and validation data loader
    dev_dataset = VaeTestDataset(pDev, 'dev')
    dev_dataloader = torch.utils.data.DataLoader(dev_dataset, batch_size=BATCH_SIZE, num_workers=2, \
                                                 shuffle=True, drop_last=True, pin_memory=True)

    # the class weights (median frequency balance):
    class_weights = np.array([2.514399, 1.4917144, 0.51608694, 0.659483, 1.0900991, 1.6461798, 0.32852992, 1.5633508, 0.9236576, 0.10251398])

    #class_weights = np.array([1.4222778, 2.1834621, 40.17538]) # inverse log class_probability
    class_weights = torch.Tensor(class_weights)
    print("class weights: ", class_weights)
    # .to() is not in-place on tensors, so the result must be assigned back:
    class_weights = class_weights.to(device)
    print('...Finish reading data...')

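    # the constants above follow median frequency balancing
    # (w_c = median class frequency / frequency of class c, so rare classes
    # are up-weighted); a sketch for recomputing such weights from the
    # training labels is given after this listing.
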
    # instantiate a model:
    model = S3Net(input_channels=NUM_INPUT_CHANNELS,
                  output_channels=NUM_OUTPUT_CHANNELS)
    # move the model to the selected device:
    model.to(device)

    # set the adam optimizer parameters:
    opt_params = { LEARNING_RATE: 0.001,
                   BETAS: (.9, 0.999),
                   EPS: 1e-08,
                   WEIGHT_DECAY: .001 }
    # set the loss criterion and optimizer:
    ce_criterion = nn.CrossEntropyLoss(reduction='sum', weight=class_weights)
    ce_criterion.to(device)
    lovasz_criterion = L.LovaszSoftmax(reduction='sum', ignore_index=0)
    lovasz_criterion.to(device)
    # create an optimizer, and pass the model params to it:
    optimizer = Adam(model.parameters(), **opt_params)

    # get the number of epochs to train on:
    epochs = NUM_EPOCHS

    # if there is a trained model, continue training:
    if os.path.exists(mdl_path):
        checkpoint = torch.load(mdl_path)
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        start_epoch = checkpoint['epoch']
        print('Load epoch {} success'.format(start_epoch))
    else:
        start_epoch = 0
        #pre_path = "./model/model_segnet_weight.pth"
        #pretrained_model = torch.load(pre_path)
        #model.load_state_dict(pretrained_model['model'])
        print('No trained models, restart training')

    # multiple GPUs:
    if torch.cuda.device_count() > 1:
        print("Let's use all", torch.cuda.device_count(), "GPUs!")
        # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
        model = nn.DataParallel(model) #, device_ids=[0, 1])
        # move the wrapped model to the device:
        model.to(device)

    # tensorboard writer:
    writer = SummaryWriter('runs')

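    # the checkpoints written below have the form
    #   {'model': state_dict, 'optimizer': state_dict, 'epoch': int}
    # so they can be reloaded by the resume branch above (or by an
    # evaluation script) via torch.load + load_state_dict.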
    epoch_num = 0
    for epoch in range(start_epoch+1, epochs):
        # adjust learning rate:
        adjust_learning_rate(optimizer, epoch)
        ################################## Train #####################################
        # for each batch in increments of batch size
        #
        train_epoch_loss, train_kl_epoch_loss, train_ce_epoch_loss = train(
            model, train_dataloader, train_dataset, device, optimizer, ce_criterion, lovasz_criterion, class_weights, epoch, epochs
        )
        valid_epoch_loss, valid_kl_epoch_loss, valid_ce_epoch_loss = validate(
            model, dev_dataloader, dev_dataset, device, ce_criterion, lovasz_criterion, class_weights
        )

        # log the epoch losses:
        writer.add_scalar('training loss', train_epoch_loss, epoch)
        writer.add_scalar('training kl loss', train_kl_epoch_loss, epoch)
        writer.add_scalar('training ce loss', train_ce_epoch_loss, epoch)

        writer.add_scalar('validation loss', valid_epoch_loss, epoch)
        writer.add_scalar('validation kl loss', valid_kl_epoch_loss, epoch)
        writer.add_scalar('validation ce loss', valid_ce_epoch_loss, epoch)

        print('Train set: Average loss: {:.4f}'.format(train_epoch_loss))
        print('Validation set: Average loss: {:.4f}'.format(valid_epoch_loss))

        # save a periodic checkpoint:
        if(epoch % 2000 == 0):
            if torch.cuda.device_count() > 1: # multiple GPUs:
                state = {'model': model.module.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': epoch}
            else:
                state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': epoch}
            path = './model/model' + str(epoch) + '.pth'
            torch.save(state, path)

        epoch_num = epoch

    # save the final model
    if torch.cuda.device_count() > 1: # multiple GPUs:
        state = {'model': model.module.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': epoch_num}
    else:
        state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': epoch_num}
    torch.save(state, mdl_path)

    # exit gracefully
    #
    return True
#
# end of function


# begin gracefully
#
if __name__ == '__main__':
    main(sys.argv[1:])
#
# end of file
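
The class_weights constants in main are described as median frequency balancing. Below is a minimal sketch of how such weights could be recomputed from integer label maps; the function name and inputs are illustrative, not part of this repo, and only NumPy is assumed:

import numpy as np

def median_frequency_weights(label_maps, num_classes):
    # accumulate a pixel count per class over all label maps
    counts = np.zeros(num_classes, dtype=np.float64)
    for labels in label_maps:
        counts += np.bincount(labels.ravel(), minlength=num_classes)
    # per-class frequency over the whole training set
    freq = counts / counts.sum()
    # w_c = median(freq) / freq_c: classes rarer than the median get w_c > 1
    median = np.median(freq[freq > 0])
    return median / np.maximum(freq, 1e-12)

# example: random 4x4 label maps with 10 classes, matching NUM_OUTPUT_CHANNELS
maps = [np.random.randint(0, 10, (4, 4)) for _ in range(2)]
print(median_frequency_weights(maps, num_classes=10))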