# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
Semantic CNN Navigation implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos can be found in the Demo Results section below. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
## Related Resources
- Dataset Download: https://doi.org/10.5281/zenodo.18350696
- SALSA (Dataset and Labeling Framework): https://github.com/TempleRAIL/semantic2d
- S³-Net (Stochastic Semantic Segmentation): https://github.com/TempleRAIL/s3_net
- Semantic CNN Navigation: https://github.com/TempleRAIL/semantic_cnn_nav
## Overview
This repository contains two main components:
- Training: CNN-based control policy training using the Semantic2D dataset
- ROS Deployment: Real-time semantic-aware navigation for mobile robots
The Semantic CNN Navigation system combines:
- S³-Net: Real-time semantic segmentation of 2D LiDAR scans
- SemanticCNN: ResNet-based control policy that uses semantic information for navigation
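At run time the two components form a simple pipeline. The sketch below is purely illustrative; the function names and signatures are placeholders, not the repository's actual API:

```python
def navigate_step(scan, sub_goal, s3_net, semantic_cnn):
    """One control step: segment the scan, then run the policy (illustrative).

    s3_net and semantic_cnn are hypothetical callables standing in for the
    two trained networks.
    """
    semantic_labels = s3_net(scan)  # per-point semantic class labels
    linear_x, angular_z = semantic_cnn(scan, semantic_labels, sub_goal)
    return linear_x, angular_z
```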
## Demo Results

### Engineering Lobby Semantic Navigation

### Engineering 4th Floor Semantic Navigation

### CYC 4th Floor Semantic Navigation
## System Architecture

```
                      Semantic CNN Navigation

+-------------+     +--------------+     +------------------------+
| LiDAR Scan  |---->|    S³-Net    |---->|  Semantic Labels (10)  |
| + Intensity |     | Segmentation |     |    per LiDAR point     |
+-------------+     +--------------+     +-----------+------------+
                                                     |
                                                     v
+-------------+                    +------------------------+
|  Sub-Goal   |------------------->|      SemanticCNN       |
|   (x, y)    |                    | (ResNet + Bottleneck)  |
+-------------+                    |                        |
                                   | Input: 80x80 scan map  |
+-------------+                    |   + semantic map       |
|  Scan Map   |------------------->|   + sub-goal           |
|  (history)  |                    +-----------+------------+
+-------------+                                |
                                               v
                                   +------------------------+
                                   |    Velocity Command    |
                                   | (linear_x, angular_z)  |
                                   +------------------------+
```
## Requirements

### Training

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm

### ROS Deployment

- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1

Install training dependencies:

```bash
pip install torch torchvision tensorboardX numpy tqdm
```
## Part 1: Training

### Dataset Structure

The training expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                  # List of dataset folders
├── 2024-04-11-15-24-29/         # Dataset folder 1
│   ├── train.txt                # Training sample list
│   ├── dev.txt                  # Validation sample list
│   ├── scans_lidar/             # Range scans (.npy)
│   ├── semantic_label/          # Semantic labels (.npy)
│   ├── sub_goals_local/         # Local sub-goals (.npy)
│   └── velocities/              # Ground truth velocities (.npy)
└── ...
```
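A minimal sketch of how one training sample might be read from this layout. The per-sample file naming is an assumption for illustration; check `train.txt`/`dev.txt` for the naming the dataset actually uses:

```python
import numpy as np
from pathlib import Path

def load_sample(seq_dir, stamp):
    """Load one (scan, label, sub-goal, velocity) tuple from a dataset folder.

    `stamp` is a hypothetical per-sample file stem, e.g. "000001".
    """
    seq_dir = Path(seq_dir)
    scan = np.load(seq_dir / "scans_lidar" / f"{stamp}.npy")
    label = np.load(seq_dir / "semantic_label" / f"{stamp}.npy")
    goal = np.load(seq_dir / "sub_goals_local" / f"{stamp}.npy")
    vel = np.load(seq_dir / "velocities" / f"{stamp}.npy")
    return scan, label, goal, vel
```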
### Model Architecture

SemanticCNN uses a ResNet-style architecture with Bottleneck blocks:
| Component | Details |
|---|---|
| Input | 2 channels: scan map (80x80) + semantic map (80x80) |
| Backbone | ResNet with Bottleneck blocks [2, 1, 1] |
| Goal Input | 2D sub-goal (x, y) concatenated after pooling |
| Output | 2D velocity (linear_x, angular_z) |
| Loss | MSE Loss |
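The input/output contract in the table can be read as the following toy PyTorch module. This is a simplified stand-in (a small conv stack instead of the actual ResNet Bottleneck backbone), not the repository's SemanticCNN:

```python
import torch
import torch.nn as nn

class SemanticCNNSketch(nn.Module):
    """Illustrative stand-in: 2-channel 80x80 maps + 2D goal -> 2D velocity."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),   # 80 -> 40
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 40 -> 20
            nn.AdaptiveAvgPool2d(1),                               # global pool
        )
        # Goal (x, y) is concatenated after pooling, as in the table above.
        self.head = nn.Sequential(
            nn.Linear(64 + 2, 64), nn.ReLU(),
            nn.Linear(64, 2),  # (linear_x, angular_z)
        )

    def forward(self, maps, goal):
        feat = self.backbone(maps).flatten(1)            # (B, 64)
        return self.head(torch.cat([feat, goal], dim=1))  # (B, 2)

maps = torch.randn(4, 2, 80, 80)  # scan map + semantic map channels
goal = torch.randn(4, 2)          # sub-goal (x, y)
vel = SemanticCNNSketch()(maps, goal)
print(vel.shape)  # torch.Size([4, 2])
```

Training such a policy against the ground-truth velocities with `nn.MSELoss()` matches the loss listed in the table.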
Key Parameters:

- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
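The point-count reduction above can be sketched with NumPy. The exact trim indices used by the code may differ slightly (1081 − 720 = 361 points removed in total, so the two edges cannot be trimmed perfectly symmetrically):

```python
import numpy as np

scan = np.random.rand(1081)  # raw 2D LiDAR scan with 1081 range returns
# Drop roughly 180 points from each edge of the field of view
# (180 + 181 here so the result is exactly 720 points).
trimmed = scan[180:900]
print(trimmed.shape)  # (720,)
```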
### Training

Train the Semantic CNN model:

```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
Arguments:

- `$1` - Training data directory
- `$2` - Validation data directory
Training Configuration (in `scripts/train.py`):

| Parameter | Default | Description |
|---|---|---|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
Learning Rate Schedule:

- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`
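The schedule above amounts to a piecewise-constant helper like the following (a minimal sketch; epoch boundaries copied from the list as given):

```python
def learning_rate(epoch):
    """Piecewise-constant learning rate matching the milestones listed above."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5

print(learning_rate(0), learning_rate(100))  # 0.001 0.0002
```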
Model checkpoints are saved every 50 epochs to `./model/`.
### Evaluation

Evaluate the trained model:

```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```

Output: Results are saved to `./output/`
### Training File Structure

```
training/
├── model/
│   └── semantic_cnn_model.pth   # Pretrained model weights
├── scripts/
│   ├── model.py                 # SemanticCNN architecture + NavDataset
│   ├── train.py                 # Training script
│   └── decode_demo.py           # Evaluation/demo script
├── run_train.sh                 # Training driver script
└── run_eval.sh                  # Evaluation driver script
```
### TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
cd training
tensorboard --logdir=runs
```
Monitored metrics:
- Training loss
- Validation loss
## Part 2: ROS Deployment

### Prerequisites

Install the following ROS packages:

```bash
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
### Installation

1. Copy the ROS package to your catkin workspace:

   ```bash
   cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
   ```

2. Build the workspace:

   ```bash
   cd ~/catkin_ws
   catkin_make
   source devel/setup.bash
   ```
### Usage

#### Launch Gazebo Simulation

```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```

This launch file starts:

- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization
#### Launch Configuration

Key parameters in `semantic_cnn_nav_gazebo.launch`:

| Parameter | Default | Description |
|---|---|---|
| `s3_net_model_file` | `model/s3_net_model.pth` | S³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |
#### Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
### ROS Nodes

#### cnn_data_pub

Publishes processed LiDAR data for the CNN.

Subscriptions:

- `/scan` (`sensor_msgs/LaserScan`)

Publications:

- `/cnn_data` (`cnn_msgs/CNN_data`)

#### semantic_cnn_nav_inference

Main inference node combining S³-Net and SemanticCNN.

Subscriptions:

- `/cnn_data` (`cnn_msgs/CNN_data`)

Publications:

- `/navigation_velocity_smoother/raw_cmd_vel` (`geometry_msgs/Twist`)

Parameters:

- `~s3_net_model_file`: Path to the S³-Net model
- `~semantic_cnn_model_file`: Path to the SemanticCNN model
### ROS Deployment File Structure

```
ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        ├── cnn_msgs/
        │   └── msg/
        │       └── CNN_data.msg                  # Custom message definition
        └── semantic_cnn/
            ├── launch/
            │   ├── cnn_data_pub.launch
            │   ├── semantic_cnn_inference.launch
            │   └── semantic_cnn_nav_gazebo.launch
            └── src/
                ├── model/
                │   ├── s3_net_model.pth          # S³-Net pretrained weights
                │   └── semantic_cnn_model.pth    # SemanticCNN weights
                ├── cnn_data_pub.py               # Data preprocessing node
                ├── cnn_model.py                  # Model definitions
                ├── pure_pursuit.py               # Pure pursuit controller
                ├── goal_visualize.py             # Goal visualization
                └── semantic_cnn_nav_inference.py # Main inference node
```
## Pre-trained Models

Pre-trained models are included:

| Model | Location | Description |
|---|---|---|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | S³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |
## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}
```