# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
This repository provides the Semantic CNN Navigation implementation for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos are available in the [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
## Related Resources
- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
## Overview
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
This repository contains two main components:
1. **Training**: CNN-based control policy training using the Semantic2D dataset
2. **ROS Deployment**: Real-time semantic-aware navigation for mobile robots
The Semantic CNN Navigation system combines:
- **S³-Net**: Real-time semantic segmentation of 2D LiDAR scans
- **SemanticCNN**: ResNet-based control policy that uses semantic information for navigation
## Demo Results
**Engineering Lobby Semantic Navigation**
![Engineering Lobby Semantic Navigation](./demo/1.lobby_semantic_navigation.gif)
**Engineering 4th Floor Semantic Navigation**
![Engineering 4th Floor Semantic Navigation](./demo/1.eng4th_semantic_navigation.gif)
**CYC 4th Floor Semantic Navigation**
![CYC 4th Floor Semantic Navigation](./demo/3.cyc4th_semantic_navigation.gif)
## System Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│                         Semantic CNN Navigation                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐       │
│  │ LiDAR Scan  │───▶│   S³-Net    │───▶│  Semantic Labels (10)   │       │
│  │ + Intensity │    │ Segmentation│    │    per LiDAR point      │       │
│  └─────────────┘    └─────────────┘    └───────────┬─────────────┘       │
│                                                    │                     │
│  ┌─────────────┐                                   ▼                     │
│  │  Sub-Goal   │──────────────────────▶┌─────────────────────────┐       │
│  │   (x, y)    │                       │       SemanticCNN       │       │
│  └─────────────┘                       │  (ResNet + Bottleneck)  │       │
│                                        │                         │       │
│  ┌─────────────┐                       │  Input: 80x80 scan map  │       │
│  │  Scan Map   │──────────────────────▶│      + semantic map     │       │
│  │  (history)  │                       │       + sub-goal        │       │
│  └─────────────┘                       └───────────┬─────────────┘       │
│                                                    │                     │
│                                                    ▼                     │
│                             ┌─────────────────────────┐                  │
│                             │    Velocity Command     │                  │
│                             │  (linear_x, angular_z)  │                  │
│                             └─────────────────────────┘                  │
└──────────────────────────────────────────────────────────────────────────┘
```
## Requirements
### Training
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm
### ROS Deployment
- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1
Install training dependencies:
```bash
pip install torch torchvision tensorboardX numpy tqdm
```
---
# Part 1: Training
## Dataset Structure
Training expects the Semantic2D dataset to be organized as follows:
```
~/semantic2d_data/
β”œβ”€β”€ dataset.txt # List of dataset folders
β”œβ”€β”€ 2024-04-11-15-24-29/ # Dataset folder 1
β”‚ β”œβ”€β”€ train.txt # Training sample list
β”‚ β”œβ”€β”€ dev.txt # Validation sample list
β”‚ β”œβ”€β”€ scans_lidar/ # Range scans (.npy)
β”‚ β”œβ”€β”€ semantic_label/ # Semantic labels (.npy)
β”‚ β”œβ”€β”€ sub_goals_local/ # Local sub-goals (.npy)
β”‚ └── velocities/ # Ground truth velocities (.npy)
└── ...
```
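As a rough illustration of how this layout can be indexed, the sketch below pairs identical file stems across the four modality folders. The function name, the stem-pairing convention, and the assumption that `dataset.txt` and the split files list one entry per line are all illustrative; the actual loader is the `NavDataset` class in `scripts/model.py`.

```python
import pathlib

# Modalities recorded per sample, following the layout above.
MODALITIES = ("scans_lidar", "semantic_label", "sub_goals_local", "velocities")

def index_split(root, split="train"):
    """Pair .npy files across modalities for every folder listed in dataset.txt.

    Returns a list of dicts mapping modality -> file path. Assumes each line
    of {split}.txt is a sample stem shared by all four modality folders.
    """
    root = pathlib.Path(root).expanduser()
    samples = []
    for folder in (root / "dataset.txt").read_text().split():
        for stem in (root / folder / f"{split}.txt").read_text().split():
            samples.append({m: root / folder / m / f"{stem}.npy" for m in MODALITIES})
    return samples
```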
## Model Architecture
**SemanticCNN** uses a ResNet-style architecture with Bottleneck blocks:
| Component | Details |
|-----------|---------|
| **Input** | 2 channels: scan map (80x80) + semantic map (80x80) |
| **Backbone** | ResNet with Bottleneck blocks [2, 1, 1] |
| **Goal Input** | 2D sub-goal (x, y) concatenated after pooling |
| **Output** | 2D velocity (linear_x, angular_z) |
| **Loss** | MSE Loss |
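To make the table concrete, here is a minimal PyTorch sketch of such a network. Only the facts in the table are taken from the repository (2-channel 80x80 input, Bottleneck blocks [2, 1, 1], sub-goal concatenated after pooling, 2D velocity output); the stem, channel widths, and head size are assumptions, and the actual architecture is defined in `scripts/model.py`.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Standard ResNet bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, with skip."""
    expansion = 4

    def __init__(self, in_ch, ch, stride=1):
        super().__init__()
        out_ch = ch * self.expansion
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, ch, 1, bias=False), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.down = None
        if stride != 1 or in_ch != out_ch:  # project the skip path when shapes differ
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        return self.relu(self.conv(x) + identity)

class SemanticCNNSketch(nn.Module):
    """Illustrative ResNet-style policy: (scan map + semantic map) + goal -> velocity."""

    def __init__(self, blocks=(2, 1, 1)):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(2, 64, 7, stride=2, padding=3, bias=False),  # 2 input channels
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        layers, in_ch = [], 64
        for i, n in enumerate(blocks):  # Bottleneck stages [2, 1, 1]
            ch, stride = 64 * 2 ** i, (1 if i == 0 else 2)
            for j in range(n):
                layers.append(Bottleneck(in_ch, ch, stride if j == 0 else 1))
                in_ch = ch * Bottleneck.expansion
        self.body = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 2D sub-goal (x, y) is concatenated with pooled features before the head.
        self.head = nn.Sequential(nn.Linear(in_ch + 2, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, 2))  # (linear_x, angular_z)

    def forward(self, maps, goal):
        feat = self.pool(self.body(self.stem(maps))).flatten(1)
        return self.head(torch.cat([feat, goal], dim=1))
```

A forward pass with a batch of 80x80 map pairs and 2D goals yields a batch of 2D velocity commands, which the MSE loss compares against the recorded ground-truth velocities.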
**Key Parameters:**
- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
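The scan preprocessing implied by these parameters can be sketched as follows. Only the 1081 → 720 reduction and the 80x80 grid come from the list above; the exact slice indices, the 270° field of view, the 10 m range cap, and the binary occupancy encoding are assumptions for illustration.

```python
import numpy as np

def preprocess_scan(ranges, fov_deg=270.0, size=80, max_range=10.0):
    """Downsample a 1081-point scan to 720 points and rasterize an 80x80 scan map.

    Assumed: ~270-degree FOV, 10 m cap, robot at grid center, hits marked 1.0.
    """
    ranges = np.asarray(ranges, dtype=np.float32)
    assert ranges.shape == (1081,)
    half = np.radians(fov_deg) / 2
    angles = np.linspace(-half, half, 1081)
    ranges, angles = ranges[180:900], angles[180:900]  # keep the central 720 points
    x, y = ranges * np.cos(angles), ranges * np.sin(angles)
    # Map metric (x, y) in [-max_range, max_range] to cell indices.
    cols = ((x + max_range) / (2 * max_range) * size).astype(int)
    rows = ((y + max_range) / (2 * max_range) * size).astype(int)
    valid = (ranges < max_range) & (0 <= cols) & (cols < size) & (0 <= rows) & (rows < size)
    grid = np.zeros((size, size), dtype=np.float32)
    grid[rows[valid], cols[valid]] = 1.0  # mark occupied cells
    return ranges, grid
```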
## Training
Train the Semantic CNN model:
```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
**Arguments:**
- `$1` - Training data directory
- `$2` - Validation data directory
**Training Configuration** (in `scripts/train.py`):
| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
**Learning Rate Schedule:**
- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`
Model checkpoints are saved to `./model/` every 50 epochs.
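In code, the schedule above amounts to a piecewise-constant function of the epoch, sketched below with the breakpoints taken at face value from the table; the actual logic lives in `scripts/train.py`.

```python
def learning_rate(epoch):
    """Piecewise-constant learning rate mirroring the schedule table."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5
```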
## Evaluation
Evaluate the trained model:
```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```
**Output:** Results are saved to `./output/`.
## Training File Structure
```
training/
β”œβ”€β”€ model/
β”‚ └── semantic_cnn_model.pth # Pretrained model weights
β”œβ”€β”€ scripts/
β”‚ β”œβ”€β”€ model.py # SemanticCNN architecture + NavDataset
β”‚ β”œβ”€β”€ train.py # Training script
β”‚ └── decode_demo.py # Evaluation/demo script
β”œβ”€β”€ run_train.sh # Training driver script
└── run_eval.sh # Evaluation driver script
```
---
## TensorBoard Monitoring
Training logs are saved to `./runs/`. View training progress:
```bash
cd training
tensorboard --logdir=runs
```
Monitored metrics:
- Training loss
- Validation loss
---
# Part 2: ROS Deployment
## Prerequisites
Install the following ROS packages:
```bash
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git
# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
## Installation
1. Copy the ROS workspace to your catkin workspace:
```bash
cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
```
2. Build the workspace:
```bash
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
## Usage
### Launch Gazebo Simulation
```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```
This launch file starts:
- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization
### Launch Configuration
Key parameters in `semantic_cnn_nav_gazebo.launch`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `s3_net_model_file` | `model/s3_net_model.pth` | SΒ³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |
### Send Navigation Goals
Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
## ROS Nodes
### cnn_data_pub
Publishes processed LiDAR data for the CNN.
**Subscriptions:**
- `/scan` (sensor_msgs/LaserScan)
**Publications:**
- `/cnn_data` (cnn_msgs/CNN_data)
### semantic_cnn_nav_inference
Main inference node combining SΒ³-Net and SemanticCNN.
**Subscriptions:**
- `/cnn_data` (cnn_msgs/CNN_data)
**Publications:**
- `/navigation_velocity_smoother/raw_cmd_vel` (geometry_msgs/Twist)
**Parameters:**
- `~s3_net_model_file`: Path to SΒ³-Net model
- `~semantic_cnn_model_file`: Path to SemanticCNN model
## ROS Deployment File Structure
```
ros_deployment_ws/
└── src/
└── semantic_cnn_nav/
β”œβ”€β”€ cnn_msgs/
β”‚ └── msg/
β”‚ └── CNN_data.msg # Custom message definition
└── semantic_cnn/
β”œβ”€β”€ launch/
β”‚ β”œβ”€β”€ cnn_data_pub.launch
β”‚ β”œβ”€β”€ semantic_cnn_inference.launch
β”‚ └── semantic_cnn_nav_gazebo.launch
└── src/
β”œβ”€β”€ model/
β”‚ β”œβ”€β”€ s3_net_model.pth # SΒ³-Net pretrained weights
β”‚ └── semantic_cnn_model.pth # SemanticCNN weights
β”œβ”€β”€ cnn_data_pub.py # Data preprocessing node
β”œβ”€β”€ cnn_model.py # Model definitions
β”œβ”€β”€ pure_pursuit.py # Pure pursuit controller
β”œβ”€β”€ goal_visualize.py # Goal visualization
└── semantic_cnn_nav_inference.py # Main inference node
```
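The `pure_pursuit.py` controller listed above follows the classic pure-pursuit idea: pick the first path point at least one lookahead distance ahead of the robot and express it in the robot frame as the sub-goal. The sketch below is a generic illustration of that idea with assumed names and a hypothetical 1.5 m lookahead, not the repository's exact implementation.

```python
import math

def lookahead_subgoal(path, pose, lookahead=1.5):
    """Return the first waypoint >= `lookahead` meters away, in the robot frame.

    `path` is a list of (x, y) world-frame waypoints; `pose` is (x, y, yaw).
    Returns None when no waypoint is beyond the lookahead distance.
    """
    px, py, yaw = pose
    for wx, wy in path:
        if math.hypot(wx - px, wy - py) >= lookahead:
            dx, dy = wx - px, wy - py
            # Rotate the world-frame displacement into the robot frame.
            return (math.cos(-yaw) * dx - math.sin(-yaw) * dy,
                    math.sin(-yaw) * dx + math.cos(-yaw) * dy)
    return None  # goal is closer than the lookahead; handle as final approach
```

The returned robot-frame (x, y) point matches the sub-goal input of the SemanticCNN described in Part 1.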
---
## Pre-trained Models
Pre-trained models are included:
| Model | Location | Description |
|-------|----------|-------------|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | SΒ³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |
---
## Citation
```bibtex
@article{xie2026semantic2d,
title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
journal={arXiv preprint arXiv:2409.09899},
year={2026}
}
@inproceedings{xie2021towards,
title={Towards Safe Navigation Through Crowded Dynamic Environments},
author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021},
doi={10.1109/IROS51168.2021.9636102}
}
```