# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

Semantic CNN Navigation implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899). Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY). The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav

## Overview

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This repository contains two main components:

1. **Training**: CNN-based control policy training using the Semantic2D dataset
2. **ROS Deployment**: Real-time semantic-aware navigation for mobile robots

The Semantic CNN Navigation system combines:

- **S³-Net**: Real-time semantic segmentation of 2D LiDAR scans
- **SemanticCNN**: ResNet-based control policy that uses semantic information for navigation

## Demo Results

**Engineering Lobby Semantic Navigation**

![Engineering Lobby Semantic Navigation](./demo/1.lobby_semantic_navigation.gif)

**Engineering 4th Floor Semantic Navigation**

![Engineering 4th Floor Semantic Navigation](./demo/1.eng4th_semantic_navigation.gif)

**CYC 4th Floor Semantic Navigation**

![CYC 4th Floor Semantic Navigation](./demo/3.cyc4th_semantic_navigation.gif)

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                       Semantic CNN Navigation                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐  │
│  │ LiDAR Scan  │───▶│   S³-Net    │───▶│  Semantic Labels (10)   │  │
│  │ + Intensity │    │ Segmentation│    │    per LiDAR point      │  │
│  └─────────────┘    └─────────────┘    └───────────┬─────────────┘  │
│                                                    │                │
│  ┌─────────────┐                                   ▼                │
│  │  Sub-Goal   │──────────────────────▶┌─────────────────────────┐  │
│  │   (x, y)    │                       │       SemanticCNN       │  │
│  └─────────────┘                       │  (ResNet + Bottleneck)  │  │
│                                        │                         │  │
│  ┌─────────────┐                       │ Input: 80x80 scan map   │  │
│  │  Scan Map   │──────────────────────▶│  + semantic map         │  │
│  │  (history)  │                       │  + sub-goal             │  │
│  └─────────────┘                       └───────────┬─────────────┘  │
│                                                    │                │
│                                                    ▼                │
│                                        ┌─────────────────────────┐  │
│                                        │    Velocity Command     │  │
│                                        │ (linear_x, angular_z)   │  │
│                                        └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
```

## Requirements

### Training

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm

### ROS Deployment

- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1

Install training dependencies:

```bash
pip install torch torchvision tensorboardX numpy tqdm
```

---

# Part 1: Training

## Dataset Structure

The training expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── semantic_label/        # Semantic labels (.npy)
│   ├── sub_goals_local/       # Local sub-goals (.npy)
│   └── velocities/            # Ground truth velocities (.npy)
└── ...
```
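As a quick sanity check after downloading, the sketch below loads one raw sample from this layout with plain NumPy. The sample identifier `000000` is a hypothetical placeholder (the real names are listed in `train.txt` and `dev.txt`); during training, the provided `NavDataset` in `scripts/model.py` wraps this kind of loading for you.

```python
import os
import numpy as np

# Minimal sketch: read one raw sample from the layout above.
# The sample id below is a hypothetical placeholder; the real
# identifiers are listed in train.txt / dev.txt.
root = os.path.expanduser("~/semantic2d_data/2024-04-11-15-24-29")
sample = "000000"  # hypothetical sample id from train.txt

scan   = np.load(os.path.join(root, "scans_lidar", sample + ".npy"))      # range scan
labels = np.load(os.path.join(root, "semantic_label", sample + ".npy"))   # per-point semantic class
goal   = np.load(os.path.join(root, "sub_goals_local", sample + ".npy"))  # local (x, y) sub-goal
vel    = np.load(os.path.join(root, "velocities", sample + ".npy"))       # (linear_x, angular_z)

print(scan.shape, labels.shape, goal.shape, vel.shape)
```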
## Model Architecture

**SemanticCNN** uses a ResNet-style architecture with Bottleneck blocks:

| Component | Details |
|-----------|---------|
| **Input** | 2 channels: scan map (80x80) + semantic map (80x80) |
| **Backbone** | ResNet with Bottleneck blocks [2, 1, 1] |
| **Goal Input** | 2D sub-goal (x, y) concatenated after pooling |
| **Output** | 2D velocity (linear_x, angular_z) |
| **Loss** | MSE Loss |

**Key Parameters:**

- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)

## Training

Train the Semantic CNN model:

```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**

- `$1` - Training data directory
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |

**Learning Rate Schedule:**

- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`

Model checkpoints are saved every 50 epochs to `./model/`.

## Evaluation

Evaluate the trained model:

```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```

**Output:** Results are saved to `./output/`.

## Training File Structure

```
training/
├── model/
│   └── semantic_cnn_model.pth   # Pretrained model weights
├── scripts/
│   ├── model.py                 # SemanticCNN architecture + NavDataset
│   ├── train.py                 # Training script
│   └── decode_demo.py           # Evaluation/demo script
├── run_train.sh                 # Training driver script
└── run_eval.sh                  # Evaluation driver script
```

---

## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
cd training
tensorboard --logdir=runs
```

Monitored metrics:

- Training loss
- Validation loss

---

# Part 2: ROS Deployment

## Prerequisites

Install the following ROS packages:

```bash
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

## Installation

1. Copy the ROS workspace to your catkin workspace:

   ```bash
   cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
   ```

2. Build the workspace:

   ```bash
   cd ~/catkin_ws
   catkin_make
   source devel/setup.bash
   ```

## Usage

### Launch Gazebo Simulation

```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```

This launch file starts:

- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization

### Launch Configuration

Key parameters in `semantic_cnn_nav_gazebo.launch`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `s3_net_model_file` | `model/s3_net_model.pth` | S³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |

### Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot, or publish a goal programmatically as sketched below.
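The following minimal rospy sketch sends a goal from a script instead of RViz. It assumes the RViz "2D Nav Goal" tool's default topic `/move_base_simple/goal` (`geometry_msgs/PoseStamped`); this launch setup may remap it, so verify with `rostopic list` first.

```python
#!/usr/bin/env python3
# Minimal sketch: publish a navigation goal programmatically.
# Assumes the default RViz "2D Nav Goal" topic /move_base_simple/goal;
# verify with `rostopic list` in case the launch files remap it.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("send_goal_demo")
pub = rospy.Publisher("/move_base_simple/goal", PoseStamped,
                      queue_size=1, latch=True)

goal = PoseStamped()
goal.header.frame_id = "map"
goal.header.stamp = rospy.Time.now()
goal.pose.position.x = 3.0     # hypothetical target in the map frame
goal.pose.position.y = 1.0
goal.pose.orientation.w = 1.0  # identity orientation (facing +x)

rospy.sleep(1.0)  # let the latched connection establish
pub.publish(goal)
rospy.loginfo("Goal published to /move_base_simple/goal")
```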
## ROS Nodes

### cnn_data_pub

Publishes processed LiDAR data for the CNN.

**Subscriptions:**

- `/scan` (sensor_msgs/LaserScan)

**Publications:**

- `/cnn_data` (cnn_msgs/CNN_data)

### semantic_cnn_nav_inference

Main inference node combining S³-Net and SemanticCNN.

**Subscriptions:**

- `/cnn_data` (cnn_msgs/CNN_data)

**Publications:**

- `/navigation_velocity_smoother/raw_cmd_vel` (geometry_msgs/Twist)

**Parameters:**

- `~s3_net_model_file`: Path to S³-Net model
- `~semantic_cnn_model_file`: Path to SemanticCNN model

## ROS Deployment File Structure

```
ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        ├── cnn_msgs/
        │   └── msg/
        │       └── CNN_data.msg               # Custom message definition
        └── semantic_cnn/
            ├── launch/
            │   ├── cnn_data_pub.launch
            │   ├── semantic_cnn_inference.launch
            │   └── semantic_cnn_nav_gazebo.launch
            └── src/
                ├── model/
                │   ├── s3_net_model.pth       # S³-Net pretrained weights
                │   └── semantic_cnn_model.pth # SemanticCNN weights
                ├── cnn_data_pub.py            # Data preprocessing node
                ├── cnn_model.py               # Model definitions
                ├── pure_pursuit.py            # Pure pursuit controller
                ├── goal_visualize.py          # Goal visualization
                └── semantic_cnn_nav_inference.py  # Main inference node
```

---

## Pre-trained Models

Pre-trained models are included:

| Model | Location | Description |
|-------|----------|-------------|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | S³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |

---

## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}
```