Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
S³-Net implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are available in the multimedia demonstrations. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
Related Resources
- Dataset Download: https://doi.org/10.5281/zenodo.18350696
- SALSA (Dataset and Labeling Framework): https://github.com/TempleRAIL/semantic2d
- S³-Net (Stochastic Semantic Segmentation): https://github.com/TempleRAIL/s3_net
- Semantic CNN Navigation: https://github.com/TempleRAIL/semantic_cnn_nav
S³-Net: Stochastic Semantic Segmentation Network
S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict a semantic label for each LiDAR point.
Demo Results
Model Architecture
S³-Net uses an encoder-decoder architecture with stochastic latent representations:

```
Input (3 channels: scan, intensity, angle of incidence)
                  │
                  ▼
┌───────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks)   │
│   - Conv1D (3 → 32), stride=2         │
│   - Conv1D (32 → 64), stride=2        │
│   - Residual Stack (2 layers)         │
└───────────────────────────────────────┘
                  │
                  ▼
┌───────────────────────────────────────┐
│  VAE Reparameterization               │
│   - μ (mean) and σ (std) estimation   │
│   - Latent sampling z ~ N(μ, σ²)      │
│   - Monte Carlo KL divergence         │
└───────────────────────────────────────┘
                  │
                  ▼
┌───────────────────────────────────────┐
│  Decoder (Residual + TransposeConv)   │
│   - Residual Stack (2 layers)         │
│   - TransposeConv1D (64 → 32)         │
│   - TransposeConv1D (32 → 10)         │
│   - Softmax (10 semantic classes)     │
└───────────────────────────────────────┘
                  │
                  ▼
Output (10 channels: semantic probabilities)
```
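The stochastic middle stage of the diagram can be sketched in NumPy. The function names below are illustrative, not the repo's actual API: the encoder predicts μ and log σ², a latent is sampled with the reparameterization trick, and the KL term is estimated by Monte Carlo at the sampled z.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    sigma = np.exp(0.5 * log_var)  # log-variance -> standard deviation
    return mu + sigma * rng.standard_normal(mu.shape)

def monte_carlo_kl(z, mu, log_var):
    """Single-sample Monte Carlo estimate of KL(q(z|x) || N(0, I)):
    log q(z|x) - log p(z), summed over latent dims, averaged over the batch."""
    log_q = -0.5 * (np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var))
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return np.mean(np.sum(log_q - log_p, axis=-1))

rng = np.random.default_rng(0)
mu, log_var = np.zeros((4, 64)), np.zeros((4, 64))  # posterior equal to the prior
z = reparameterize(mu, log_var, rng)
print(monte_carlo_kl(z, mu, log_var))  # 0.0: no divergence from N(0, I)
```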
Key Features:
- 3 Input Channels: Range scan, intensity, angle of incidence
- 10 Output Classes: Background + 9 semantic classes
- Stochastic Inference: Multiple forward passes enable uncertainty estimation via majority voting
- Loss Function: Cross-Entropy + Lovász-Softmax + β-VAE KL divergence
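Spelled out, the training objective combines the three loss terms above, with β weighting the KL divergence (0.01 by default, per the training configuration):

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mathcal{L}_{\mathrm{Lovasz}}
  + \beta \, D_{\mathrm{KL}}\!\left(q(z \mid x) \,\|\, \mathcal{N}(0, I)\right),
\qquad \beta = 0.01
```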
Semantic Classes
| ID | Class | Description |
|---|---|---|
| 0 | Other | Background/unknown |
| 1 | Chair | Office and lounge chairs |
| 2 | Door | Doors (open/closed) |
| 3 | Elevator | Elevator doors |
| 4 | Person | Dynamic pedestrians |
| 5 | Pillar | Structural pillars/columns |
| 6 | Sofa | Sofas and couches |
| 7 | Table | Tables of all types |
| 8 | Trash bin | Waste receptacles |
| 9 | Wall | Walls and flat surfaces |
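The class table maps directly to a small lookup, handy when post-processing predicted label arrays. This is a hypothetical helper mirroring the table above; the repo may define its own mapping.

```python
# Class IDs as defined in the Semantic2D label set (table above).
SEMANTIC_CLASSES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

def ids_to_names(label_ids):
    """Map per-point integer labels to human-readable class names."""
    return [SEMANTIC_CLASSES[i] for i in label_ids]

print(ids_to_names([4, 9, 0]))  # ['Person', 'Wall', 'Other']
```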
Requirements
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm
Install dependencies:
```sh
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```
Dataset Structure
S³-Net expects the Semantic2D dataset organized as follows:
```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...
```
dataset.txt format:
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```
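Walking this layout is straightforward: `dataset.txt` names the session folders, and each folder carries a split file listing sample names. The helper below is a sketch under those assumptions (`list_samples` is not part of the repo's API), resolving each sample to its scan, intensity, and label files.

```python
from pathlib import Path

def list_samples(root, split="train"):
    """Collect samples from the layout above: dataset.txt names folders, and
    each folder holds <split>.txt plus scans_lidar/, intensities_lidar/,
    semantic_label/ directories of .npy files. (Hypothetical helper.)"""
    root = Path(root)
    folders = [ln.strip() for ln in (root / "dataset.txt").read_text().splitlines()
               if ln.strip()]
    samples = []
    for folder in folders:
        for name in (root / folder / f"{split}.txt").read_text().split():
            samples.append({
                "scan": root / folder / "scans_lidar" / f"{name}.npy",
                "intensity": root / folder / "intensities_lidar" / f"{name}.npy",
                "label": root / folder / "semantic_label" / f"{name}.npy",
            })
    return samples
```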
Usage
Training
Train SΒ³-Net on your dataset:
```sh
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
Arguments:
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory
Training Configuration (in scripts/train.py):
| Parameter | Default | Description |
|---|---|---|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |
Learning Rate Schedule:
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: exponential decay
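The piecewise schedule can be expressed as a plain function of the epoch counter. This is a sketch of the schedule as listed, not the repo's implementation; the decay constants in the final branch are assumptions for illustration.

```python
def learning_rate(epoch):
    """Piecewise learning-rate schedule from the list above."""
    if epoch < 50_000:
        return 1e-4
    if epoch < 480_000:
        return 2e-5
    # Exponential decay past 480k epochs; the 0.99-per-2000-epochs rate
    # is an illustrative assumption, not taken from the paper or code.
    return 2e-5 * 0.99 ** ((epoch - 480_000) / 2_000)

print(learning_rate(0), learning_rate(100_000), learning_rate(500_000))
```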
The model saves checkpoints every 2000 epochs to ./model/.
Inference Demo
Run semantic segmentation on test data:
```sh
sh run_eval_demo.sh ~/semantic2d_data/
```
Arguments:
- `$1` - Test data directory (reads `dev.txt` for sample list)
Output:
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions
Example Output:
Stochastic Inference
S³-Net performs 32 stochastic forward passes per sample and uses majority voting to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
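The voting step itself is simple to sketch in NumPy (names and shapes here are illustrative, not the repo's API): take the argmax of each stochastic pass, then pick the most-voted class per LiDAR point.

```python
import numpy as np

def majority_vote(logits_stack):
    """Fuse T stochastic passes: per-pass argmax, then per-point majority vote.
    logits_stack: array of shape (T, num_points, num_classes)."""
    num_classes = logits_stack.shape[-1]
    votes = logits_stack.argmax(axis=-1)                  # (T, num_points)
    counts = np.apply_along_axis(                         # (num_classes, num_points)
        np.bincount, 0, votes, minlength=num_classes)
    return counts.argmax(axis=0)                          # winning class per point

# Toy example: 3 passes, 2 points, 3 classes, one-hot "logits".
logits = np.zeros((3, 2, 3))
logits[0, 0, 0] = logits[1, 0, 0] = logits[2, 0, 1] = 1   # point 0 votes: 0, 0, 1
logits[0, 1, 2] = logits[1, 1, 1] = logits[2, 1, 2] = 1   # point 1 votes: 2, 1, 2
print(majority_vote(logits))  # [0 2]
```

The per-point vote counts also give a crude confidence signal: points where the winning class gets fewer than, say, half of the 32 votes can be flagged as uncertain.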
File Structure
```
s3_net/
├── demo/                      # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth       # Pretrained model weights
├── output/                    # Inference output directory
├── scripts/
│   ├── model.py               # S³-Net model architecture
│   ├── train.py               # Training script
│   ├── decode_demo.py         # Inference/demo script
│   └── lovasz_losses.py       # Lovasz-Softmax loss function
├── run_train.sh               # Training driver script
├── run_eval_demo.sh           # Inference driver script
├── LICENSE                    # MIT License
└── README.md                  # This file
```
TensorBoard Monitoring
Training logs are saved to ./runs/. View training progress:
```sh
tensorboard --logdir=runs
```
Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss
Pre-trained Model
A pre-trained model is included at model/s3_net_model.pth. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.
To use the pre-trained model:
```sh
sh run_eval_demo.sh ~/semantic2d_data/
```
Citation
```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```




