# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be found and downloaded at: https://doi.org/10.5281/zenodo.18350696.
## Related Resources
- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
## S³-Net: Stochastic Semantic Segmentation Network
[MIT License](https://opensource.org/licenses/MIT)
S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.
## Demo Results
**S³-Net Segmentation**
![S³-Net segmentation demo](demo/1.lobby_s3net_segmentation.gif)
**Semantic Mapping**
![Semantic mapping demo](demo/2.lobby_semantic_mapping.gif)
**Semantic Navigation**
![Semantic navigation demo](demo/3.lobby_semantic_navigation.gif)
## Model Architecture
S³-Net uses an encoder-decoder architecture with stochastic latent representations:
```
Input (3 channels: scan, intensity, angle of incidence)
                    │
                    ▼
┌───────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks)   │
│   - Conv1D (3 → 32), stride=2         │
│   - Conv1D (32 → 64), stride=2        │
│   - Residual Stack (2 layers)         │
└───────────────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────┐
│  VAE Reparameterization               │
│   - μ (mean) and σ (std) estimation   │
│   - Latent sampling z ~ N(μ, σ²)      │
│   - Monte Carlo KL divergence         │
└───────────────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────┐
│  Decoder (Residual + TransposeConv)   │
│   - Residual Stack (2 layers)         │
│   - TransposeConv1D (64 → 32)         │
│   - TransposeConv1D (32 → 10)         │
│   - Softmax (10 semantic classes)     │
└───────────────────────────────────────┘
                    │
                    ▼
Output (10 channels: semantic probabilities)
```
**Key Features:**
- **3 Input Channels:** Range scan, intensity, angle of incidence
- **10 Output Classes:** Background + 9 semantic classes
- **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
- **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence
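The stochastic core of the model can be illustrated with a small NumPy sketch (this is not the repo's PyTorch implementation; names and shapes here are illustrative): the reparameterization trick draws z = μ + σ·ε with ε ~ N(0, I), and a single-sample Monte Carlo estimate of the KL term compares log q(z|x) against log p(z) under a standard-normal prior.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def mc_kl(mu, log_var, z):
    """Single-sample Monte Carlo estimate of KL(q(z|x) || N(0, I)):
    log q(z|x) - log p(z), summed over latent dims, averaged over the batch."""
    log_q = -0.5 * (np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var))
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return float(np.mean(np.sum(log_q - log_p, axis=-1)))

# Illustrative shapes: batch of 8 scans, 16-dimensional latent
rng = np.random.default_rng(0)
mu, log_var = np.zeros((8, 16)), np.zeros((8, 16))
z = reparameterize(mu, log_var, rng)
kl = mc_kl(mu, log_var, z)  # near zero when the posterior matches the prior
```

During training, this KL term is weighted by `BETA` (0.01 by default) and added to the cross-entropy and Lovasz-Softmax segmentation losses.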
## Semantic Classes
| ID | Class | Description |
|----|------------|--------------------------------|
| 0 | Other | Background/unknown |
| 1 | Chair | Office and lounge chairs |
| 2 | Door | Doors (open/closed) |
| 3 | Elevator | Elevator doors |
| 4 | Person | Dynamic pedestrians |
| 5 | Pillar | Structural pillars/columns |
| 6 | Sofa | Sofas and couches |
| 7 | Table | Tables of all types |
| 8 | Trash bin | Waste receptacles |
| 9 | Wall | Walls and flat surfaces |
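When inspecting predictions, the table above can be mirrored as a simple lookup (a sketch for convenience; the repo may define its own mapping):

```python
# Class-ID lookup mirroring the semantic-classes table above
SEMANTIC_CLASSES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

def labels_to_names(label_ids):
    """Map an iterable of predicted class IDs to human-readable names."""
    return [SEMANTIC_CLASSES[int(i)] for i in label_ids]
```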
## Requirements
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm
Install dependencies:
```bash
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```
## Dataset Structure
S³-Net expects the Semantic2D dataset organized as follows:
```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...
```
**dataset.txt format:**
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```
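A minimal sketch of how a loader might resolve this layout, assuming only the structure shown above (the helper name and return format are hypothetical, not part of the repo):

```python
import os

def list_sample_dirs(root, split="train.txt"):
    """Resolve per-folder sample lists from dataset.txt.

    Reads the folder names listed in <root>/dataset.txt and returns a list of
    (folder_path, split_file_path) pairs, one per dataset folder.
    """
    pairs = []
    with open(os.path.join(root, "dataset.txt")) as f:
        for line in f:
            folder = line.strip()
            if not folder:
                continue  # skip blank lines
            folder_path = os.path.join(root, folder)
            pairs.append((folder_path, os.path.join(folder_path, split)))
    return pairs
```

Per-sample `.npy` files would then be read from `scans_lidar/`, `intensities_lidar/`, and `semantic_label/` under each resolved folder.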
## Usage
### Training
Train S³-Net on your dataset:
```bash
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
**Arguments:**
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory
**Training Configuration** (in `scripts/train.py`):
| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |
**Learning Rate Schedule:**
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: Exponential decay
The model saves checkpoints every 2000 epochs to `./model/`.
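The schedule above can be expressed as a piecewise function; note that the decay factor past 480000 is an assumption for illustration, not a value taken from `scripts/train.py`:

```python
def learning_rate(epoch, base=1e-4, mid=2e-5, decay=0.999):
    """Piecewise learning-rate schedule matching the table above.

    The `decay` factor for the final exponential phase is assumed, not
    taken from the training script.
    """
    if epoch < 50000:
        return base          # initial plateau
    if epoch < 480000:
        return mid           # reduced plateau
    return mid * decay ** (epoch - 480000)  # exponential decay tail
```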
### Inference Demo
Run semantic segmentation on test data:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```
**Arguments:**
- `$1` - Test data directory (reads `dev.txt` for the sample list)
**Output:**
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions
**Example Output:**
| Ground Truth | S³-Net Prediction |
|:------------:|:-----------------:|
|  |  |
### Stochastic Inference
S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
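Majority voting over the stacked per-pass argmax labels can be sketched in NumPy (shapes and names are illustrative; the actual implementation in the repo may differ):

```python
import numpy as np

def majority_vote(predictions):
    """Combine stochastic forward passes by per-point majority vote.

    predictions: int array of shape (num_passes, num_points), each row one
    forward pass's argmax labels. Returns an array of shape (num_points,).
    """
    num_classes = predictions.max() + 1
    # Count votes per class for every point (columns), giving a
    # (num_classes, num_points) tally, then pick the most-voted class.
    counts = np.apply_along_axis(np.bincount, 0, predictions,
                                 minlength=num_classes)
    return counts.argmax(axis=0)
```

With 32 passes, points where the latent sampling flips the label across passes are resolved to the most frequent class, which is what smooths segmentation boundaries.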
## File Structure
```
s3_net/
├── demo/                      # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth       # Pretrained model weights
├── output/                    # Inference output directory
├── scripts/
│   ├── model.py               # S³-Net model architecture
│   ├── train.py               # Training script
│   ├── decode_demo.py         # Inference/demo script
│   └── lovasz_losses.py       # Lovasz-Softmax loss function
├── run_train.sh               # Training driver script
├── run_eval_demo.sh           # Inference driver script
├── LICENSE                    # MIT License
└── README.md                  # This file
```
## TensorBoard Monitoring
Training logs are saved to `./runs/`. View training progress:
```bash
tensorboard --logdir=runs
```
Monitored metrics:
- Training/validation loss
- Cross-entropy loss
- Lovasz-Softmax loss
## Pre-trained Model
A pre-trained model is included at `model/s3_net_model.pth`. It was trained on the Semantic2D dataset, which was collected with a Hokuyo UTM-30LX-EW LiDAR sensor.
To use the pre-trained model:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```
## Citation
```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```