# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

Semantic CNN Navigation implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).

Video demos can be found at [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).

The Semantic2D dataset can be found and downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
## Overview

[License: MIT](https://opensource.org/licenses/MIT)

This repository contains two main components:

1. **Training**: CNN-based control policy training using the Semantic2D dataset
2. **ROS Deployment**: Real-time semantic-aware navigation for mobile robots

The Semantic CNN Navigation system combines:

- **S³-Net**: Real-time semantic segmentation of 2D LiDAR scans
- **SemanticCNN**: ResNet-based control policy that uses semantic information for navigation
## Demo Results

**Engineering Lobby Semantic Navigation**

**Engineering 4th Floor Semantic Navigation**

**CYC 4th Floor Semantic Navigation**
## System Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                       Semantic CNN Navigation                        │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │ LiDAR Scan  │───▶│    S³-Net    │───▶│  Semantic Labels (10)   │  │
│  │ + Intensity │    │ Segmentation │    │    per LiDAR point      │  │
│  └─────────────┘    └──────────────┘    └────────────┬────────────┘  │
│                                                      │               │
│  ┌─────────────┐                                     ▼               │
│  │  Sub-Goal   │──────────────────────▶┌─────────────────────────┐   │
│  │   (x, y)    │                       │       SemanticCNN       │   │
│  └─────────────┘                       │  (ResNet + Bottleneck)  │   │
│                                        │                         │   │
│  ┌─────────────┐                       │  Input: 80x80 scan map  │   │
│  │  Scan Map   │──────────────────────▶│     + semantic map      │   │
│  │  (history)  │                       │     + sub-goal          │   │
│  └─────────────┘                       └────────────┬────────────┘   │
│                                                     │                │
│                                                     ▼                │
│                                        ┌─────────────────────────┐   │
│                                        │    Velocity Command     │   │
│                                        │  (linear_x, angular_z)  │   │
│                                        └─────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
```
## Requirements

### Training

- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm

### ROS Deployment

- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1

Install the training dependencies:

```bash
pip install torch torchvision tensorboardX numpy tqdm
```
---

# Part 1: Training

## Dataset Structure

The training expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt              # List of dataset folders
├── 2024-04-11-15-24-29/     # Dataset folder 1
│   ├── train.txt            # Training sample list
│   ├── dev.txt              # Validation sample list
│   ├── scans_lidar/         # Range scans (.npy)
│   ├── semantic_label/      # Semantic labels (.npy)
│   ├── sub_goals_local/     # Local sub-goals (.npy)
│   └── velocities/          # Ground truth velocities (.npy)
└── ...
```
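As a rough illustration of how one training sample might be assembled from these folders, the sketch below loads the four `.npy` arrays for a single sample ID. The per-file naming scheme (`<stamp>.npy`) and the helper name are assumptions for illustration; the real sample IDs come from each folder's `train.txt`.

```python
from pathlib import Path

import numpy as np


def load_sample(root, folder, stamp):
    """Load one sample (scan, semantic label, sub-goal, velocity).

    NOTE: `<stamp>.npy` file naming is an assumption; consult
    train.txt in each dataset folder for the real sample list.
    """
    base = Path(root) / folder
    scan = np.load(base / "scans_lidar" / f"{stamp}.npy")
    label = np.load(base / "semantic_label" / f"{stamp}.npy")
    goal = np.load(base / "sub_goals_local" / f"{stamp}.npy")
    vel = np.load(base / "velocities" / f"{stamp}.npy")
    return scan, label, goal, vel
```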
## Model Architecture

**SemanticCNN** uses a ResNet-style architecture with Bottleneck blocks:

| Component | Details |
|-----------|---------|
| **Input** | 2 channels: scan map (80x80) + semantic map (80x80) |
| **Backbone** | ResNet with Bottleneck blocks [2, 1, 1] |
| **Goal Input** | 2D sub-goal (x, y) concatenated after pooling |
| **Output** | 2D velocity (linear_x, angular_z) |
| **Loss** | MSE Loss |
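The I/O wiring in the table can be sketched in PyTorch as follows. This is only a minimal stand-in: the plain conv stack below substitutes for the actual ResNet Bottleneck [2, 1, 1] backbone, and all layer widths are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn


class SemanticCNNSketch(nn.Module):
    """Sketch of the wiring: 2-channel 80x80 input, conv backbone
    (stand-in for the Bottleneck stack), sub-goal concatenated
    after pooling, 2D velocity output."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder for ResNet Bottleneck [2, 1, 1]
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64 + 2, 2)  # +2 for the (x, y) sub-goal

    def forward(self, maps, goal):
        feat = self.backbone(maps).flatten(1)             # (B, 64)
        return self.head(torch.cat([feat, goal], dim=1))  # (B, 2) velocity
```

Training would then minimize `nn.MSELoss()` between this 2D output and the ground-truth velocities.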
**Key Parameters:**

- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
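The 1081 → 720 downsampling could look like the sketch below. Note that trimming exactly 180 points from both ends of 1081 leaves 721, so one side here drops 181 points to reach 720; how the real code resolves that off-by-one is an assumption.

```python
import numpy as np


def downsample_scan(scan):
    """Trim a 1081-point scan to 720 points by cutting roughly ±180
    points from the ends (one end loses 181 — an assumption about
    how the off-by-one is resolved)."""
    assert scan.shape[0] == 1081
    return scan[180:900]  # indices 180..899 -> 720 points


print(downsample_scan(np.arange(1081.0)).shape)  # (720,)
```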
## Training

Train the Semantic CNN model:

```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**

- `$1` - Training data directory
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |

**Learning Rate Schedule:**

- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`
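This step schedule can be written as a small helper, presumably similar to what `train.py` does internally. The function name is illustrative, and the behavior exactly at the switch epochs (40, 2000, 21000) is a guess:

```python
def learning_rate(epoch):
    """Piecewise-constant learning rate from the schedule above."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5
```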
Model checkpoints are saved every 50 epochs to `./model/`.
## Evaluation

Evaluate the trained model:

```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```

**Output:** Results are saved to `./output/`

## Training File Structure

```
training/
├── model/
│   └── semantic_cnn_model.pth    # Pretrained model weights
├── scripts/
│   ├── model.py                  # SemanticCNN architecture + NavDataset
│   ├── train.py                  # Training script
│   └── decode_demo.py            # Evaluation/demo script
├── run_train.sh                  # Training driver script
└── run_eval.sh                   # Evaluation driver script
```

---

## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress with:

```bash
cd training
tensorboard --logdir=runs
```

Monitored metrics:

- Training loss
- Validation loss
---

# Part 2: ROS Deployment

## Prerequisites

Install the following ROS packages:

```bash
# Create a catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone the required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

## Installation

1. Copy the ROS package into your catkin workspace:

```bash
cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
```

2. Build the workspace:

```bash
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

## Usage

### Launch Gazebo Simulation

```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```

This launch file starts:

- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization

### Launch Configuration

Key parameters in `semantic_cnn_nav_gazebo.launch`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `s3_net_model_file` | `model/s3_net_model.pth` | S³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |
### Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
## ROS Nodes

### cnn_data_pub

Publishes processed LiDAR data for the CNN.

**Subscriptions:**

- `/scan` (sensor_msgs/LaserScan)

**Publications:**

- `/cnn_data` (cnn_msgs/CNN_data)
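Part of this preprocessing is presumably rasterizing the laser scan into the 80x80 scan map the policy consumes. The numpy sketch below illustrates that idea only; the 0.1 m/cell resolution is an assumed value, and the real node certainly does more (history stacking, intensity handling, etc.):

```python
import numpy as np


def scan_to_grid(ranges, angle_min, angle_increment, size=80, resolution=0.1):
    """Rasterize a LaserScan into a robot-centered occupancy image.

    The 80x80 size matches the policy input; the 0.1 m/cell
    resolution is an illustrative assumption.
    """
    ranges = np.asarray(ranges, dtype=np.float32)
    angles = angle_min + angle_increment * np.arange(len(ranges))
    ok = np.isfinite(ranges)  # drop inf/nan returns
    xs = ranges[ok] * np.cos(angles[ok])
    ys = ranges[ok] * np.sin(angles[ok])
    # Robot at the grid center; mark cells hit by a return.
    cols = np.floor(xs / resolution + size / 2).astype(int)
    rows = np.floor(ys / resolution + size / 2).astype(int)
    grid = np.zeros((size, size), dtype=np.float32)
    inb = (rows >= 0) & (rows < size) & (cols >= 0) & (cols < size)
    grid[rows[inb], cols[inb]] = 1.0
    return grid
```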
### semantic_cnn_nav_inference

Main inference node combining S³-Net and SemanticCNN.

**Subscriptions:**

- `/cnn_data` (cnn_msgs/CNN_data)

**Publications:**

- `/navigation_velocity_smoother/raw_cmd_vel` (geometry_msgs/Twist)

**Parameters:**

- `~s3_net_model_file`: Path to the S³-Net model
- `~semantic_cnn_model_file`: Path to the SemanticCNN model
## ROS Deployment File Structure

```
ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        ├── cnn_msgs/
        │   └── msg/
        │       └── CNN_data.msg              # Custom message definition
        └── semantic_cnn/
            ├── launch/
            │   ├── cnn_data_pub.launch
            │   ├── semantic_cnn_inference.launch
            │   └── semantic_cnn_nav_gazebo.launch
            └── src/
                ├── model/
                │   ├── s3_net_model.pth      # S³-Net pretrained weights
                │   └── semantic_cnn_model.pth # SemanticCNN weights
                ├── cnn_data_pub.py           # Data preprocessing node
                ├── cnn_model.py              # Model definitions
                ├── pure_pursuit.py           # Pure pursuit controller
                ├── goal_visualize.py         # Goal visualization
                └── semantic_cnn_nav_inference.py # Main inference node
```
---

## Pre-trained Models

Pre-trained models are included:

| Model | Location | Description |
|-------|----------|-------------|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | S³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |

---
## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}
```