# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be found and downloaded at: https://doi.org/10.5281/zenodo.18350696.
## Related Resources
- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
## S³-Net: Stochastic Semantic Segmentation Network
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict a semantic label for each LiDAR point.
## Demo Results
**S³-Net Segmentation**
![S³-Net Segmentation](./demo/1.lobby_s3net_segmentation.gif)
**Semantic Mapping**
![Semantic Mapping](./demo/2.lobby_semantic_mapping.gif)
**Semantic Navigation**
![Semantic Navigation](./demo/3.lobby_semantic_navigation.gif)
## Model Architecture
S³-Net uses an encoder-decoder architecture with stochastic latent representations:
```
Input (3 channels: scan, intensity, angle of incidence)
                   │
                   ▼
┌──────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks)  │
│  - Conv1D (3 → 32)   stride=2        │
│  - Conv1D (32 → 64)  stride=2        │
│  - Residual Stack (2 layers)         │
└──────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│  VAE Reparameterization              │
│  - μ (mean) and σ (std) estimation   │
│  - Latent sampling z ~ N(μ, σ²)      │
│  - Monte Carlo KL divergence         │
└──────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│  Decoder (Residual + TransposeConv)  │
│  - Residual Stack (2 layers)         │
│  - TransposeConv1D (64 → 32)         │
│  - TransposeConv1D (32 → 10)         │
│  - Softmax (10 semantic classes)     │
└──────────────────────────────────────┘
                   │
                   ▼
Output (10 channels: semantic probabilities)
```
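The diagram above maps naturally onto a PyTorch sketch. Channel counts and strides follow the diagram; the kernel sizes, padding, residual-block internals, and the 1×1-convolution heads for μ and log σ² are illustrative assumptions, not the exact layers in `scripts/model.py`.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block (the real one lives in scripts/model.py)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(), nn.Conv1d(ch, ch, kernel_size=3, padding=1),
            nn.ReLU(), nn.Conv1d(ch, ch, kernel_size=1))

    def forward(self, x):
        return x + self.body(x)  # skip connection

class S3NetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Encoder: two stride-2 convolutions (3 -> 32 -> 64) + residual stack
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),
            ResidualBlock(64), ResidualBlock(64))
        # Heads estimating mu and log(sigma^2) (1x1 convs are an assumption)
        self.to_mu = nn.Conv1d(64, 64, kernel_size=1)
        self.to_logvar = nn.Conv1d(64, 64, kernel_size=1)
        # Decoder: residual stack + two transpose convolutions (64 -> 32 -> 10)
        self.decoder = nn.Sequential(
            ResidualBlock(64), ResidualBlock(64),
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, num_classes, kernel_size=4, stride=2, padding=1))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + eps * sigma
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return torch.softmax(self.decoder(z), dim=1), mu, logvar
```

With stride-2 convolutions mirrored by stride-2 transpose convolutions, a scan of N points (N divisible by 4) yields per-point class probabilities of the same length.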
**Key Features:**
- **3 Input Channels:** Range scan, intensity, angle of incidence
- **10 Output Classes:** Background + 9 semantic classes
- **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
- **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence
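A minimal sketch of this objective, omitting the Lovasz-Softmax term (implemented in `scripts/lovasz_losses.py`). The diagram mentions a Monte Carlo KL estimate; the closed-form Gaussian KL below is a simplifying assumption.

```python
import torch
import torch.nn.functional as F

def s3net_loss_sketch(logits, target, mu, logvar, beta=0.01):
    """Sketch of the training objective: cross-entropy plus a
    beta-weighted KL divergence (Lovasz-Softmax term omitted here)."""
    ce = F.cross_entropy(logits, target)
    # KL( N(mu, sigma^2) || N(0, 1) ) in closed form, averaged over elements
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * kl
```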
## Semantic Classes
| ID | Class | Description |
|----|------------|--------------------------------|
| 0 | Other | Background/unknown |
| 1 | Chair | Office and lounge chairs |
| 2 | Door | Doors (open/closed) |
| 3 | Elevator | Elevator doors |
| 4 | Person | Dynamic pedestrians |
| 5 | Pillar | Structural pillars/columns |
| 6 | Sofa | Sofas and couches |
| 7 | Table | Tables of all types |
| 8 | Trash bin | Waste receptacles |
| 9 | Wall | Walls and flat surfaces |
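For decoding predictions into human-readable names, the table transcribes directly into a lookup; the dict and helper below are illustrative, not part of the repository's API.

```python
# Class-ID-to-name mapping, transcribed from the table above
SEMANTIC_CLASSES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

def decode_labels(label_ids):
    """Map a sequence of predicted class IDs to class names."""
    return [SEMANTIC_CLASSES[i] for i in label_ids]
```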
## Requirements
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm
Install dependencies:
```bash
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```
## Dataset Structure
S³-Net expects the Semantic2D dataset organized as follows:
```
~/semantic2d_data/
├── dataset.txt              # List of dataset folders
├── 2024-04-11-15-24-29/     # Dataset folder 1
│   ├── train.txt            # Training sample list
│   ├── dev.txt              # Validation sample list
│   ├── scans_lidar/         # Range scans (.npy)
│   ├── intensities_lidar/   # Intensity data (.npy)
│   └── semantic_label/      # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/     # Dataset folder 2
│   └── ...
└── ...
```
**dataset.txt format:**
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```
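Enumerating training samples from this layout can be sketched as below. The assumption that `train.txt` holds one sample ID per line is illustrative; the repository's loader in `scripts/train.py` is authoritative.

```python
from pathlib import Path

def list_samples(root, split="train.txt"):
    """Read dataset.txt, then each folder's sample list, and return
    (folder, sample_id) pairs. One-ID-per-line format is an assumption."""
    root = Path(root)
    pairs = []
    for folder in (root / "dataset.txt").read_text().split():
        for sample_id in (root / folder / split).read_text().split():
            pairs.append((folder, sample_id))
    return pairs
```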
## Usage
### Training
Train S³-Net on your dataset:
```bash
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
**Arguments:**
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory
**Training Configuration** (in `scripts/train.py`):
| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |
**Learning Rate Schedule:**
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: Exponential decay
The model saves checkpoints every 2000 epochs to `./model/`.
### Inference Demo
Run semantic segmentation on test data:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```
**Arguments:**
- `$1` - Test data directory (reads `dev.txt` for sample list)
**Output:**
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions
**Example Output:**
| Ground Truth | S³-Net Prediction |
|:------------:|:-----------------:|
| ![Ground Truth](./output/semantic_ground_truth_7000.png) | ![S³-Net Prediction](./output/semantic_s3net_7000.png) |
### Stochastic Inference
S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
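Assuming each pass yields per-point argmax labels, the voting step can be sketched with NumPy (the repository's inference code is in `scripts/decode_demo.py`):

```python
import numpy as np

def majority_vote(label_samples, num_classes=10):
    """Combine T stochastic passes into one prediction per LiDAR point.

    label_samples: (T, N) integer class labels from T forward passes.
    Returns: (N,) array with the most-voted class for each point.
    """
    # Count votes per class for each point -> (num_classes, N)
    votes = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=num_classes), 0, label_samples)
    return votes.argmax(axis=0)
```

The vote counts themselves double as an implicit uncertainty signal: points where the winning class receives a small share of the 32 votes are less certain.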
## File Structure
```
s3_net/
├── demo/                    # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth     # Pretrained model weights
├── output/                  # Inference output directory
├── scripts/
│   ├── model.py             # S³-Net model architecture
│   ├── train.py             # Training script
│   ├── decode_demo.py       # Inference/demo script
│   └── lovasz_losses.py     # Lovasz-Softmax loss function
├── run_train.sh             # Training driver script
├── run_eval_demo.sh         # Inference driver script
├── LICENSE                  # MIT License
└── README.md                # This file
```
## TensorBoard Monitoring
Training logs are saved to `./runs/`. View training progress:
```bash
tensorboard --logdir=runs
```
Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss
## Pre-trained Model
A pre-trained model is included at `model/s3_net_model.pth`. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.
To use the pre-trained model:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```
## Citation
```bibtex
@article{xie2026semantic2d,
title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
journal={arXiv preprint arXiv:2409.09899},
year={2026}
}
```