# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899). Video demos are available in our [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY). The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav

## S³-Net: Stochastic Semantic Segmentation Network

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.
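As a rough illustration, the encoder–VAE–decoder design just described can be sketched in PyTorch as follows. Channel widths (3 → 32 → 64, decoder 64 → 32 → 10) follow the architecture diagram in this README, while the kernel sizes, residual-block internals, and 1080-beam scan length are assumptions for the sketch, not the exact `scripts/model.py` implementation:

```python
# Minimal PyTorch sketch of an S3-Net-style VAE segmentation model.
# Kernel sizes, residual-block internals, and the 1080-beam scan
# length are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection


class S3NetSketch(nn.Module):
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),
            ResidualBlock(64),
            ResidualBlock(64),
        )
        # VAE head: per-position mean and log-variance of the latent code
        self.to_mu = nn.Conv1d(64, 64, kernel_size=1)
        self.to_logvar = nn.Conv1d(64, 64, kernel_size=1)
        self.decoder = nn.Sequential(
            ResidualBlock(64),
            ResidualBlock(64),
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z ~ N(mu, sigma^2), differentiable in mu/sigma
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar


model = S3NetSketch()
x = torch.randn(2, 3, 1080)  # (batch, channels, beams)
logits, mu, logvar = model(x)

# Stochastic inference: the latent sampling makes every forward pass differ,
# so repeated passes can be combined by a per-point majority vote.
votes = torch.stack([model(x)[0].argmax(dim=1) for _ in range(32)])
pred = votes.mode(dim=0).values  # (batch, beams) class ids
```

The sketch returns raw logits; the softmax shown in the architecture diagram is folded into the `argmax` here (and would be folded into the cross-entropy loss during training).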
## Demo Results

**S³-Net Segmentation**

![S³-Net Segmentation](./demo/1.lobby_s3net_segmentation.gif)

**Semantic Mapping**

![Semantic Mapping](./demo/2.lobby_semantic_mapping.gif)

**Semantic Navigation**

![Semantic Navigation](./demo/3.lobby_semantic_navigation.gif)

## Model Architecture

S³-Net uses an encoder-decoder architecture with stochastic latent representations:

```
Input (3 channels: scan, intensity, angle of incidence)
                  │
                  ▼
┌─────────────────────────────────────┐
│ Encoder (Conv1D + Residual Blocks)  │
│  - Conv1D (3 → 32)  stride=2        │
│  - Conv1D (32 → 64) stride=2        │
│  - Residual Stack (2 layers)        │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ VAE Reparameterization              │
│  - μ (mean) and σ (std) estimation  │
│  - Latent sampling z ~ N(μ, σ²)     │
│  - Monte Carlo KL divergence        │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ Decoder (Residual + TransposeConv)  │
│  - Residual Stack (2 layers)        │
│  - TransposeConv1D (64 → 32)        │
│  - TransposeConv1D (32 → 10)        │
│  - Softmax (10 semantic classes)    │
└─────────────────────────────────────┘
                  │
                  ▼
Output (10 channels: semantic probabilities)
```

**Key Features:**

- **3 Input Channels:** Range scan, intensity, angle of incidence
- **10 Output Classes:** Background + 9 semantic classes
- **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
- **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence

## Semantic Classes

| ID | Class      | Description                    |
|----|------------|--------------------------------|
| 0  | Other      | Background/unknown             |
| 1  | Chair      | Office and lounge chairs       |
| 2  | Door       | Doors (open/closed)            |
| 3  | Elevator   | Elevator doors                 |
| 4  | Person     | Dynamic pedestrians            |
| 5  | Pillar     | Structural pillars/columns     |
| 6  | Sofa       | Sofas and couches              |
| 7  | Table      | Tables of all types            |
| 8  | Trash bin  | Waste receptacles              |
| 9  | Wall       | Walls and flat surfaces        |

## Requirements

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm

Install dependencies:

```bash
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```

## Dataset Structure

S³-Net expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                 # List of dataset folders
├── 2024-04-11-15-24-29/        # Dataset folder 1
│   ├── train.txt               # Training sample list
│   ├── dev.txt                 # Validation sample list
│   ├── scans_lidar/            # Range scans (.npy)
│   ├── intensities_lidar/      # Intensity data (.npy)
│   └── semantic_label/         # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/        # Dataset folder 2
│   └── ...
└── ...
```

**dataset.txt format:**

```
2024-04-11-15-24-29
2024-04-04-12-16-41
```

## Usage

### Training

Train S³-Net on your dataset:

```bash
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**

- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter       | Default | Description                    |
|-----------------|---------|--------------------------------|
| `NUM_EPOCHS`    | 20000   | Total training epochs          |
| `BATCH_SIZE`    | 1024    | Samples per batch              |
| `LEARNING_RATE` | 0.001   | Initial learning rate          |
| `BETA`          | 0.01    | β-VAE weight for KL divergence |

**Learning Rate Schedule:**

- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: Exponential decay

The model saves checkpoints every 2000 epochs to `./model/`.
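Under the dataset layout above, individual samples can be read with a small helper like the following sketch. The folder and file names come from the Dataset Structure section, but the one-`.npy`-file-per-sample-id naming scheme is an assumption, not a guarantee of how the released data is indexed:

```python
# Minimal sketch of a loader for the Semantic2D dataset layout above.
# The per-sample naming scheme (one .npy per sample id) is an assumption.
from pathlib import Path

import numpy as np


def list_folders(root):
    """Read dataset.txt to get the list of recording folders."""
    with open(Path(root) / "dataset.txt") as f:
        return [line.strip() for line in f if line.strip()]


def load_sample(root, folder, sample_id):
    """Load one (scan, intensity, label) triple for a given sample id."""
    base = Path(root) / folder
    scan = np.load(base / "scans_lidar" / f"{sample_id}.npy")
    intensity = np.load(base / "intensities_lidar" / f"{sample_id}.npy")
    label = np.load(base / "semantic_label" / f"{sample_id}.npy")
    # float32 inputs for the network, int64 class ids for the loss
    return (
        scan.astype(np.float32),
        intensity.astype(np.float32),
        label.astype(np.int64),
    )
```

A loader like this would typically be wrapped in a `torch.utils.data.Dataset` that iterates over the ids listed in `train.txt` or `dev.txt`.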
### Inference Demo

Run semantic segmentation on test data:

```bash
sh run_eval_demo.sh ~/semantic2d_data/
```

**Arguments:**

- `$1` - Test data directory (reads `dev.txt` for the sample list)

**Output:**

- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions

**Example Output:**

| Ground Truth | S³-Net Prediction |
|:------------:|:-----------------:|
| ![Ground Truth](./output/semantic_ground_truth_7000.png) | ![S³-Net Prediction](./output/semantic_s3net_7000.png) |

### Stochastic Inference

S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction. This provides:

- More robust predictions
- Implicit uncertainty estimation
- Reduced noise at segmentation boundaries

## File Structure

```
s3_net/
├── demo/                       # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth        # Pretrained model weights
├── output/                     # Inference output directory
├── scripts/
│   ├── model.py                # S³-Net model architecture
│   ├── train.py                # Training script
│   ├── decode_demo.py          # Inference/demo script
│   └── lovasz_losses.py        # Lovasz-Softmax loss function
├── run_train.sh                # Training driver script
├── run_eval_demo.sh            # Inference driver script
├── LICENSE                     # MIT License
└── README.md                   # This file
```

## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
tensorboard --logdir=runs
```

Monitored metrics:

- Training/validation loss
- Cross-entropy loss
- Lovasz-Softmax loss

## Pre-trained Model

A pre-trained model is included at `model/s3_net_model.pth`. This model was trained on the Semantic2D dataset collected with a Hokuyo UTM-30LX-EW LiDAR sensor.
To use the pre-trained model:

```bash
sh run_eval_demo.sh ~/semantic2d_data/
```

## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```