Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
S³-Net implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are available in the multimedia demonstrations. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
Related Resources
- Dataset Download: https://doi.org/10.5281/zenodo.18350696
- SALSA (Dataset and Labeling Framework): https://github.com/TempleRAIL/semantic2d
- S³-Net (Stochastic Semantic Segmentation): https://github.com/TempleRAIL/s3_net
- Semantic CNN Navigation: https://github.com/TempleRAIL/semantic_cnn_nav
S³-Net: Stochastic Semantic Segmentation Network
S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict a semantic label for each LiDAR point.
Demo Results
Model Architecture
S³-Net uses an encoder-decoder architecture with stochastic latent representations:

```
Input (3 channels: scan, intensity, angle of incidence)
                  │
                  ▼
┌───────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks)   │
│   - Conv1D (3 → 32), stride=2         │
│   - Conv1D (32 → 64), stride=2        │
│   - Residual Stack (2 layers)         │
└───────────────────────────────────────┘
                  │
                  ▼
┌───────────────────────────────────────┐
│  VAE Reparameterization               │
│   - μ (mean) and σ (std) estimation   │
│   - Latent sampling z ~ N(μ, σ²)      │
│   - Monte Carlo KL divergence         │
└───────────────────────────────────────┘
                  │
                  ▼
┌───────────────────────────────────────┐
│  Decoder (Residual + TransposeConv)   │
│   - Residual Stack (2 layers)         │
│   - TransposeConv1D (64 → 32)         │
│   - TransposeConv1D (32 → 10)         │
│   - Softmax (10 semantic classes)     │
└───────────────────────────────────────┘
                  │
                  ▼
Output (10 channels: semantic probabilities)
```
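The stochastic middle stage of the diagram can be sketched in NumPy. The function names below are illustrative, not the repo's actual API: the encoder predicts μ and log σ², a latent is sampled with the reparameterization trick, and the KL term is estimated by Monte Carlo at the sampled z.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    sigma = np.exp(0.5 * log_var)  # log-variance -> standard deviation
    return mu + sigma * rng.standard_normal(mu.shape)

def monte_carlo_kl(z, mu, log_var):
    """Single-sample Monte Carlo estimate of KL(q(z|x) || N(0, I)):
    log q(z|x) - log p(z), summed over latent dims, averaged over the batch."""
    log_q = -0.5 * (np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var))
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return np.mean(np.sum(log_q - log_p, axis=-1))

rng = np.random.default_rng(0)
mu, log_var = np.zeros((4, 64)), np.zeros((4, 64))  # posterior equal to the prior
z = reparameterize(mu, log_var, rng)
print(monte_carlo_kl(z, mu, log_var))  # 0.0: no divergence from N(0, I)
```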
Key Features:
- 3 Input Channels: Range scan, intensity, angle of incidence
- 10 Output Classes: Background + 9 semantic classes
- Stochastic Inference: Multiple forward passes enable uncertainty estimation via majority voting
- Loss Function: Cross-Entropy + Lovász-Softmax + β-VAE KL divergence
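Spelled out, the training objective combines the three loss terms above, with β weighting the KL divergence (0.01 by default, per the training configuration):

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mathcal{L}_{\mathrm{Lovasz}}
  + \beta \, D_{\mathrm{KL}}\!\left(q(z \mid x) \,\|\, \mathcal{N}(0, I)\right),
\qquad \beta = 0.01
```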
Semantic Classes
| ID | Class | Description |
|---|---|---|
| 0 | Other | Background/unknown |
| 1 | Chair | Office and lounge chairs |
| 2 | Door | Doors (open/closed) |
| 3 | Elevator | Elevator doors |
| 4 | Person | Dynamic pedestrians |
| 5 | Pillar | Structural pillars/columns |
| 6 | Sofa | Sofas and couches |
| 7 | Table | Tables of all types |
| 8 | Trash bin | Waste receptacles |
| 9 | Wall | Walls and flat surfaces |
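The class table maps directly to a small lookup, handy when post-processing predicted label arrays. This is a hypothetical helper mirroring the table above; the repo may define its own mapping.

```python
# Class IDs as defined in the Semantic2D label set (table above).
SEMANTIC_CLASSES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

def ids_to_names(label_ids):
    """Map per-point integer labels to human-readable class names."""
    return [SEMANTIC_CLASSES[i] for i in label_ids]

print(ids_to_names([4, 9, 0]))  # ['Person', 'Wall', 'Other']
```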
Requirements
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm
Install dependencies:
```sh
pip install torch torchvision tensorboardX numpy matplotlib tqdm
```
Dataset Structure
S³-Net expects the Semantic2D dataset organized as follows:
```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...
```
dataset.txt format:
```
2024-04-11-15-24-29
2024-04-04-12-16-41
```
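Walking this layout is straightforward: `dataset.txt` names the session folders, and each folder carries a split file listing sample names. The helper below is a sketch under those assumptions (`list_samples` is not part of the repo's API), resolving each sample to its scan, intensity, and label files.

```python
from pathlib import Path

def list_samples(root, split="train"):
    """Collect samples from the layout above: dataset.txt names folders, and
    each folder holds <split>.txt plus scans_lidar/, intensities_lidar/,
    semantic_label/ directories of .npy files. (Hypothetical helper.)"""
    root = Path(root)
    folders = [ln.strip() for ln in (root / "dataset.txt").read_text().splitlines()
               if ln.strip()]
    samples = []
    for folder in folders:
        for name in (root / folder / f"{split}.txt").read_text().split():
            samples.append({
                "scan": root / folder / "scans_lidar" / f"{name}.npy",
                "intensity": root / folder / "intensities_lidar" / f"{name}.npy",
                "label": root / folder / "semantic_label" / f"{name}.npy",
            })
    return samples
```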
Usage
Training
Train SΒ³-Net on your dataset:
```sh
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
Arguments:
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory
Training Configuration (in scripts/train.py):
| Parameter | Default | Description |
|---|---|---|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |
Learning Rate Schedule:
- Epochs 0-50000: `1e-4`
- Epochs 50000-480000: `2e-5`
- Epochs 480000+: exponential decay
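The piecewise schedule can be expressed as a plain function of the epoch counter. This is a sketch of the schedule as listed, not the repo's implementation; the decay constants in the final branch are assumptions for illustration.

```python
def learning_rate(epoch):
    """Piecewise learning-rate schedule from the list above."""
    if epoch < 50_000:
        return 1e-4
    if epoch < 480_000:
        return 2e-5
    # Exponential decay past 480k epochs; the 0.99-per-2000-epochs rate
    # is an illustrative assumption, not taken from the paper or code.
    return 2e-5 * 0.99 ** ((epoch - 480_000) / 2_000)

print(learning_rate(0), learning_rate(100_000), learning_rate(500_000))
```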
The model saves checkpoints every 2000 epochs to ./model/.
Inference Demo
Run semantic segmentation on test data:
```sh
sh run_eval_demo.sh ~/semantic2d_data/
```
Arguments:
- `$1` - Test data directory (reads `dev.txt` for sample list)
Output:
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions
Example Output:
Stochastic Inference
S³-Net performs 32 stochastic forward passes per sample and uses majority voting to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
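The voting step itself is simple to sketch in NumPy (names and shapes here are illustrative, not the repo's API): take the argmax of each stochastic pass, then pick the most-voted class per LiDAR point.

```python
import numpy as np

def majority_vote(logits_stack):
    """Fuse T stochastic passes: per-pass argmax, then per-point majority vote.
    logits_stack: array of shape (T, num_points, num_classes)."""
    num_classes = logits_stack.shape[-1]
    votes = logits_stack.argmax(axis=-1)                  # (T, num_points)
    counts = np.apply_along_axis(                         # (num_classes, num_points)
        np.bincount, 0, votes, minlength=num_classes)
    return counts.argmax(axis=0)                          # winning class per point

# Toy example: 3 passes, 2 points, 3 classes, one-hot "logits".
logits = np.zeros((3, 2, 3))
logits[0, 0, 0] = logits[1, 0, 0] = logits[2, 0, 1] = 1   # point 0 votes: 0, 0, 1
logits[0, 1, 2] = logits[1, 1, 1] = logits[2, 1, 2] = 1   # point 1 votes: 2, 1, 2
print(majority_vote(logits))  # [0 2]
```

The per-point vote counts also give a crude confidence signal: points where the winning class gets fewer than, say, half of the 32 votes can be flagged as uncertain.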
File Structure
```
s3_net/
├── demo/                      # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth       # Pretrained model weights
├── output/                    # Inference output directory
├── scripts/
│   ├── model.py               # S³-Net model architecture
│   ├── train.py               # Training script
│   ├── decode_demo.py         # Inference/demo script
│   └── lovasz_losses.py       # Lovasz-Softmax loss function
├── run_train.sh               # Training driver script
├── run_eval_demo.sh           # Inference driver script
├── LICENSE                    # MIT License
└── README.md                  # This file
```
TensorBoard Monitoring
Training logs are saved to ./runs/. View training progress:
```sh
tensorboard --logdir=runs
```
Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss
Pre-trained Model
A pre-trained model is included at model/s3_net_model.pth. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.
To use the pre-trained model:
```sh
sh run_eval_demo.sh ~/semantic2d_data/
```
Citation
```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
```




