
Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

S³-Net implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos can be found in the multimedia demonstrations below. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

Related Resources

S³-Net: Stochastic Semantic Segmentation Network

License: MIT

S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.

Demo Results

S³-Net Segmentation

Semantic Mapping

Semantic Navigation

Model Architecture

S³-Net uses an encoder-decoder architecture with stochastic latent representations:

Input (3 channels: scan, intensity, angle of incidence)
    │
    ▼
┌─────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks) │
│  - Conv1D (3 → 32) stride=2         │
│  - Conv1D (32 → 64) stride=2        │
│  - Residual Stack (2 layers)        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  VAE Reparameterization             │
│  - μ (mean) and σ (std) estimation  │
│  - Latent sampling z ~ N(μ, σ²)     │
│  - Monte Carlo KL divergence        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Decoder (Residual + TransposeConv) │
│  - Residual Stack (2 layers)        │
│  - TransposeConv1D (64 → 32)        │
│  - TransposeConv1D (32 → 10)        │
│  - Softmax (10 semantic classes)    │
└─────────────────────────────────────┘
    │
    ▼
Output (10 channels: semantic probabilities)
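The diagram above can be sketched in PyTorch as follows. This is a minimal illustration, not the repo's exact code (`scripts/model.py`): the kernel sizes, latent width, and 1080-beam scan length are assumptions, and the residual stacks are omitted for brevity.

```python
import torch
import torch.nn as nn

class S3NetSketch(nn.Module):
    """Illustrative encoder-decoder with VAE reparameterization."""
    def __init__(self, num_classes=10, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=4, stride=2, padding=1),   # 3 -> 32, stride 2
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 64, stride 2
            nn.ReLU(),
        )
        self.mu_head = nn.Conv1d(64, latent, kernel_size=1)       # mean estimate
        self.logvar_head = nn.Conv1d(64, latent, kernel_size=1)   # log-variance estimate
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.decoder(z)  # per-point class logits; softmax applied in the loss
        return logits, mu, logvar

scan = torch.randn(2, 3, 1080)  # (batch, channels, beams); 1080 beams is an assumption
logits, mu, logvar = S3NetSketch()(scan)
```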

Key Features:

  • 3 Input Channels: Range scan, intensity, angle of incidence
  • 10 Output Classes: Background + 9 semantic classes
  • Stochastic Inference: Multiple forward passes enable uncertainty estimation via majority voting
  • Loss Function: Cross-Entropy + Lovász-Softmax + β-VAE KL divergence
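The combined loss can be sketched as cross-entropy plus a β-weighted KL term. This is illustrative: an analytic KL against N(0, I) is shown in place of the Monte Carlo estimate, and the Lovász-Softmax term (implemented in the repo's `scripts/lovasz_losses.py`) would be added analogously.

```python
import torch
import torch.nn.functional as F

def s3net_loss(logits, labels, mu, logvar, beta=0.01):
    """Sketch of the training loss: cross-entropy + beta * KL(N(mu, sigma^2) || N(0, I)).

    logits: (B, C, N) per-point class scores; labels: (B, N) integer class IDs.
    """
    ce = F.cross_entropy(logits, labels)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * kl
```

With `beta=0.01` (the BETA default below), the KL term acts as a light regularizer on the latent space.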

Semantic Classes

| ID | Class     | Description                |
|----|-----------|----------------------------|
| 0  | Other     | Background/unknown         |
| 1  | Chair     | Office and lounge chairs   |
| 2  | Door      | Doors (open/closed)        |
| 3  | Elevator  | Elevator doors             |
| 4  | Person    | Dynamic pedestrians        |
| 5  | Pillar    | Structural pillars/columns |
| 6  | Sofa      | Sofas and couches          |
| 7  | Table     | Tables of all types        |
| 8  | Trash bin | Waste receptacles          |
| 9  | Wall      | Walls and flat surfaces    |
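For decoding predictions into readable names, the table above maps directly to a Python dict (the dict itself is illustrative, not part of the repo):

```python
# Class ID -> name mapping from the Semantic2D label set
SEMANTIC_CLASSES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

# Example: turn a list of predicted IDs into names
names = [SEMANTIC_CLASSES[i] for i in [4, 9, 0]]  # ["Person", "Wall", "Other"]
```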

Requirements

  • Python 3.7+
  • PyTorch 1.7.1+
  • TensorBoard
  • NumPy
  • Matplotlib
  • tqdm

Install dependencies:

pip install torch torchvision tensorboardX numpy matplotlib tqdm

Dataset Structure

SΒ³-Net expects the Semantic2D dataset organized as follows:

~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...

dataset.txt format:

2024-04-11-15-24-29
2024-04-04-12-16-41
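A minimal loader for this layout might look as follows. It is a sketch under one assumption: that each line of `train.txt`/`dev.txt` names a sample whose `.npy` files share that name across the three subfolders (the repo's own loader may differ).

```python
import numpy as np
from pathlib import Path

def load_samples(root, split="train"):
    """Yield (scan, intensity, label) arrays for every sample in every dataset folder.

    Assumes sample names in {split}.txt match the .npy filenames in each subfolder.
    """
    root = Path(root)
    for folder in (root / "dataset.txt").read_text().split():
        for name in (root / folder / f"{split}.txt").read_text().split():
            scan = np.load(root / folder / "scans_lidar" / f"{name}.npy")
            intensity = np.load(root / folder / "intensities_lidar" / f"{name}.npy")
            label = np.load(root / folder / "semantic_label" / f"{name}.npy")
            yield scan, intensity, label
```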

Usage

Training

Train SΒ³-Net on your dataset:

sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:

  • $1 - Training data directory (contains dataset.txt and subfolders)
  • $2 - Validation data directory

Training Configuration (in scripts/train.py):

| Parameter     | Default | Description                    |
|---------------|---------|--------------------------------|
| NUM_EPOCHS    | 20000   | Total training epochs          |
| BATCH_SIZE    | 1024    | Samples per batch              |
| LEARNING_RATE | 0.001   | Initial learning rate          |
| BETA          | 0.01    | β-VAE weight for KL divergence |

Learning Rate Schedule:

  • Epochs 0-50000: 1e-4
  • Epochs 50000-480000: 2e-5
  • Epochs 480000+: Exponential decay
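The piecewise schedule above can be expressed with PyTorch's `LambdaLR`. The two plateau values come from the list; the exponential decay rate in the final phase is an assumption, not the repo's value.

```python
import torch

BASE_LR = 1e-4

def lr_lambda(step):
    """Multiplier on BASE_LR implementing the piecewise schedule."""
    if step < 50000:
        return 1.0                 # 1e-4
    elif step < 480000:
        return 2e-5 / BASE_LR      # 2e-5
    else:
        # Exponential decay from 2e-5; the 0.999995 rate is an assumption
        return (2e-5 / BASE_LR) * (0.999995 ** (step - 480000))

model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=BASE_LR)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```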

The model saves checkpoints every 2000 epochs to ./model/.

Inference Demo

Run semantic segmentation on test data:

sh run_eval_demo.sh ~/semantic2d_data/

Arguments:

  • $1 - Test data directory (reads dev.txt for sample list)

Output:

  • ./output/semantic_ground_truth_*.png - Ground truth visualizations
  • ./output/semantic_s3net_*.png - SΒ³-Net predictions

Example Output:

Ground Truth | S³-Net Prediction

Stochastic Inference

S³-Net performs 32 stochastic forward passes per sample and uses majority voting to determine the final prediction. This provides:

  • More robust predictions
  • Implicit uncertainty estimation
  • Reduced noise in segmentation boundaries
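The voting procedure can be sketched as below. It assumes the model returns `(logits, mu, logvar)` with logits of shape `(B, C, N)`, as in the architecture section; the actual inference code lives in `scripts/decode_demo.py`.

```python
import torch

@torch.no_grad()
def stochastic_predict(model, x, passes=32):
    """Run `passes` stochastic forward passes and majority-vote per LiDAR point.

    Each pass samples a fresh latent z, so the argmax labels can differ;
    torch.mode picks the most frequent label per point across passes.
    """
    votes = torch.stack([model(x)[0].argmax(dim=1) for _ in range(passes)])  # (passes, B, N)
    return votes.mode(dim=0).values  # (B, N) majority labels
```

Disagreement among the 32 votes at a given point can also serve as an implicit per-point uncertainty signal.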

File Structure

s3_net/
├── demo/                           # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth            # Pretrained model weights
├── output/                         # Inference output directory
├── scripts/
│   ├── model.py                    # S³-Net model architecture
│   ├── train.py                    # Training script
│   ├── decode_demo.py              # Inference/demo script
│   └── lovasz_losses.py            # Lovász-Softmax loss function
├── run_train.sh                    # Training driver script
├── run_eval_demo.sh                # Inference driver script
├── LICENSE                         # MIT License
└── README.md                       # This file

TensorBoard Monitoring

Training logs are saved to ./runs/. View training progress:

tensorboard --logdir=runs

Monitored metrics:

  • Training/Validation loss
  • Cross-Entropy loss
  • Lovasz-Softmax loss

Pre-trained Model

A pre-trained model is included at model/s3_net_model.pth. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.

To use the pre-trained model:

sh run_eval_demo.sh ~/semantic2d_data/

Citation

@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}