s3_net

File size: 8,157 Bytes

d9c5371

# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

 S³-Net implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be found and downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav

## S³-Net: Stochastic Semantic Segmentation Network

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.

## Demo Results

**S³-Net Segmentation**
![S³-Net Segmentation](./demo/1.lobby_s3net_segmentation.gif)

**Semantic Mapping**
![Semantic Mapping](./demo/2.lobby_semantic_mapping.gif)

**Semantic Navigation**
![Semantic Navigation](./demo/3.lobby_semantic_navigation.gif)

## Model Architecture

S³-Net uses an encoder-decoder architecture with stochastic latent representations:

```
Input (3 channels: scan, intensity, angle of incidence)
    │
    ▼
┌─────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks) │
│  - Conv1D (3 → 32) stride=2         │
│  - Conv1D (32 → 64) stride=2        │
│  - Residual Stack (2 layers)        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  VAE Reparameterization             │
│  - μ (mean) and σ (std) estimation  │
│  - Latent sampling z ~ N(μ, σ²)     │
│  - Monte Carlo KL divergence        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Decoder (Residual + TransposeConv) │
│  - Residual Stack (2 layers)        │
│  - TransposeConv1D (64 → 32)        │
│  - TransposeConv1D (32 → 10)        │
│  - Softmax (10 semantic classes)    │
└─────────────────────────────────────┘
    │
    ▼
Output (10 channels: semantic probabilities)
```

**Key Features:**
- **3 Input Channels:** Range scan, intensity, angle of incidence
- **10 Output Classes:** Background + 9 semantic classes
- **Stochastic Inference:** Multiple forward passes enable uncertainty estimation via majority voting
- **Loss Function:** Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence

## Semantic Classes

| ID | Class      | Description                    |
|----|------------|--------------------------------|
| 0  | Other      | Background/unknown             |
| 1  | Chair      | Office and lounge chairs       |
| 2  | Door       | Doors (open/closed)            |
| 3  | Elevator   | Elevator doors                 |
| 4  | Person     | Dynamic pedestrians            |
| 5  | Pillar     | Structural pillars/columns     |
| 6  | Sofa       | Sofas and couches              |
| 7  | Table      | Tables of all types            |
| 8  | Trash bin  | Waste receptacles              |
| 9  | Wall       | Walls and flat surfaces        |

## Requirements

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm

Install dependencies:
```bash
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```

## Dataset Structure

S³-Net expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...
```

**dataset.txt format:**
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```

## Usage

### Training

Train S³-Net on your dataset:

```bash
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |

**Learning Rate Schedule:**
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: Exponential decay

The model saves checkpoints every 2000 epochs to `./model/`.

### Inference Demo

Run semantic segmentation on test data:

```bash
sh run_eval_demo.sh ~/semantic2d_data/
```

**Arguments:**
- `$1` - Test data directory (reads `dev.txt` for sample list)

**Output:**
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions

**Example Output:**

| Ground Truth | S³-Net Prediction |
|:------------:|:-----------------:|
| ![Ground Truth](./output/semantic_ground_truth_7000.png) | ![S³-Net Prediction](./output/semantic_s3net_7000.png) |

### Stochastic Inference

S³-Net performs **32 stochastic forward passes** per sample and uses **majority voting** to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries

## File Structure

```
s3_net/
├── demo/                           # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth            # Pretrained model weights
├── output/                         # Inference output directory
├── scripts/
│   ├── model.py                    # S³-Net model architecture
│   ├── train.py                    # Training script
│   ├── decode_demo.py              # Inference/demo script
│   └── lovasz_losses.py            # Lovasz-Softmax loss function
├── run_train.sh                    # Training driver script
├── run_eval_demo.sh                # Inference driver script
├── LICENSE                         # MIT License
└── README.md                       # This file
```


## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
tensorboard --logdir=runs
```

Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss

## Pre-trained Model

A pre-trained model is included at `model/s3_net_model.pth`. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.

To use the pre-trained model:
```bash
sh run_eval_demo.sh ~/semantic2d_data/
```


## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```