# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
This repository provides the Semantic CNN Navigation implementation for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos are available in the [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
## Related Resources
- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav
## Overview
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
This repository contains two main components:
1. **Training**: CNN-based control policy training using the Semantic2D dataset
2. **ROS Deployment**: Real-time semantic-aware navigation for mobile robots
The Semantic CNN Navigation system combines:
- **S³-Net**: Real-time semantic segmentation of 2D LiDAR scans
- **SemanticCNN**: ResNet-based control policy that uses semantic information for navigation
## Demo Results
**Engineering Lobby Semantic Navigation**
![Engineering Lobby Semantic Navigation](./demo/1.lobby_semantic_navigation.gif)
**Engineering 4th Floor Semantic Navigation**
![Engineering 4th Floor Semantic Navigation](./demo/1.eng4th_semantic_navigation.gif)
**CYC 4th Floor Semantic Navigation**
![CYC 4th Floor Semantic Navigation](./demo/3.cyc4th_semantic_navigation.gif)
## System Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│                         Semantic CNN Navigation                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐       │
│  │ LiDAR Scan  │───▶│   S³-Net    │───▶│  Semantic Labels (10)   │       │
│  │ + Intensity │    │ Segmentation│    │    per LiDAR point      │       │
│  └─────────────┘    └─────────────┘    └───────────┬─────────────┘       │
│                                                    │                     │
│  ┌─────────────┐                                   ▼                     │
│  │  Sub-Goal   │──────────────────────▶┌─────────────────────────┐       │
│  │   (x, y)    │                       │       SemanticCNN       │       │
│  └─────────────┘                       │  (ResNet + Bottleneck)  │       │
│                                        │                         │       │
│  ┌─────────────┐                       │  Input: 80x80 scan map  │       │
│  │  Scan Map   │──────────────────────▶│      + semantic map     │       │
│  │  (history)  │                       │       + sub-goal        │       │
│  └─────────────┘                       └───────────┬─────────────┘       │
│                                                    │                     │
│                                                    ▼                     │
│                             ┌─────────────────────────┐                  │
│                             │    Velocity Command     │                  │
│                             │  (linear_x, angular_z)  │                  │
│                             └─────────────────────────┘                  │
└──────────────────────────────────────────────────────────────────────────┘
```
## Requirements
### Training
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm
### ROS Deployment
- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1
Install training dependencies:
```bash
pip install torch torchvision tensorboardX numpy tqdm
```
---
# Part 1: Training
## Dataset Structure
Training expects the Semantic2D dataset to be organized as follows:
```
~/semantic2d_data/
β”œβ”€β”€ dataset.txt # List of dataset folders
β”œβ”€β”€ 2024-04-11-15-24-29/ # Dataset folder 1
β”‚ β”œβ”€β”€ train.txt # Training sample list
β”‚ β”œβ”€β”€ dev.txt # Validation sample list
β”‚ β”œβ”€β”€ scans_lidar/ # Range scans (.npy)
β”‚ β”œβ”€β”€ semantic_label/ # Semantic labels (.npy)
β”‚ β”œβ”€β”€ sub_goals_local/ # Local sub-goals (.npy)
β”‚ └── velocities/ # Ground truth velocities (.npy)
└── ...
```
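As a rough illustration of how this layout can be indexed, the sketch below pairs identical file stems across the four modality folders. The function name, the stem-pairing convention, and the assumption that `dataset.txt` and the split files list one entry per line are all illustrative; the actual loader is the `NavDataset` class in `scripts/model.py`.

```python
import pathlib

# Modalities recorded per sample, following the layout above.
MODALITIES = ("scans_lidar", "semantic_label", "sub_goals_local", "velocities")

def index_split(root, split="train"):
    """Pair .npy files across modalities for every folder listed in dataset.txt.

    Returns a list of dicts mapping modality -> file path. Assumes each line
    of {split}.txt is a sample stem shared by all four modality folders.
    """
    root = pathlib.Path(root).expanduser()
    samples = []
    for folder in (root / "dataset.txt").read_text().split():
        for stem in (root / folder / f"{split}.txt").read_text().split():
            samples.append({m: root / folder / m / f"{stem}.npy" for m in MODALITIES})
    return samples
```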
## Model Architecture
**SemanticCNN** uses a ResNet-style architecture with Bottleneck blocks:
| Component | Details |
|-----------|---------|
| **Input** | 2 channels: scan map (80x80) + semantic map (80x80) |
| **Backbone** | ResNet with Bottleneck blocks [2, 1, 1] |
| **Goal Input** | 2D sub-goal (x, y) concatenated after pooling |
| **Output** | 2D velocity (linear_x, angular_z) |
| **Loss** | MSE Loss |
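To make the table concrete, here is a minimal PyTorch sketch of such a network. Only the facts in the table are taken from the repository (2-channel 80x80 input, Bottleneck blocks [2, 1, 1], sub-goal concatenated after pooling, 2D velocity output); the stem, channel widths, and head size are assumptions, and the actual architecture is defined in `scripts/model.py`.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Standard ResNet bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, with skip."""
    expansion = 4

    def __init__(self, in_ch, ch, stride=1):
        super().__init__()
        out_ch = ch * self.expansion
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, ch, 1, bias=False), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.down = None
        if stride != 1 or in_ch != out_ch:  # project the skip path when shapes differ
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        return self.relu(self.conv(x) + identity)

class SemanticCNNSketch(nn.Module):
    """Illustrative ResNet-style policy: (scan map + semantic map) + goal -> velocity."""

    def __init__(self, blocks=(2, 1, 1)):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(2, 64, 7, stride=2, padding=3, bias=False),  # 2 input channels
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        layers, in_ch = [], 64
        for i, n in enumerate(blocks):  # Bottleneck stages [2, 1, 1]
            ch, stride = 64 * 2 ** i, (1 if i == 0 else 2)
            for j in range(n):
                layers.append(Bottleneck(in_ch, ch, stride if j == 0 else 1))
                in_ch = ch * Bottleneck.expansion
        self.body = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 2D sub-goal (x, y) is concatenated with pooled features before the head.
        self.head = nn.Sequential(nn.Linear(in_ch + 2, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, 2))  # (linear_x, angular_z)

    def forward(self, maps, goal):
        feat = self.pool(self.body(self.stem(maps))).flatten(1)
        return self.head(torch.cat([feat, goal], dim=1))
```

A forward pass with a batch of 80x80 map pairs and 2D goals yields a batch of 2D velocity commands, which the MSE loss compares against the recorded ground-truth velocities.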
**Key Parameters:**
- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
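The scan preprocessing implied by these parameters can be sketched as follows. Only the 1081 → 720 reduction and the 80x80 grid come from the list above; the exact slice indices, the 270° field of view, the 10 m range cap, and the binary occupancy encoding are assumptions for illustration.

```python
import numpy as np

def preprocess_scan(ranges, fov_deg=270.0, size=80, max_range=10.0):
    """Downsample a 1081-point scan to 720 points and rasterize an 80x80 scan map.

    Assumed: ~270-degree FOV, 10 m cap, robot at grid center, hits marked 1.0.
    """
    ranges = np.asarray(ranges, dtype=np.float32)
    assert ranges.shape == (1081,)
    half = np.radians(fov_deg) / 2
    angles = np.linspace(-half, half, 1081)
    ranges, angles = ranges[180:900], angles[180:900]  # keep the central 720 points
    x, y = ranges * np.cos(angles), ranges * np.sin(angles)
    # Map metric (x, y) in [-max_range, max_range] to cell indices.
    cols = ((x + max_range) / (2 * max_range) * size).astype(int)
    rows = ((y + max_range) / (2 * max_range) * size).astype(int)
    valid = (ranges < max_range) & (0 <= cols) & (cols < size) & (0 <= rows) & (rows < size)
    grid = np.zeros((size, size), dtype=np.float32)
    grid[rows[valid], cols[valid]] = 1.0  # mark occupied cells
    return ranges, grid
```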
## Training
Train the Semantic CNN model:
```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```
**Arguments:**
- `$1` - Training data directory
- `$2` - Validation data directory
**Training Configuration** (in `scripts/train.py`):
| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
**Learning Rate Schedule:**
- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`
Model checkpoints are saved to `./model/` every 50 epochs.
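In code, the schedule above amounts to a piecewise-constant function of the epoch, sketched below with the breakpoints taken at face value from the table; the actual logic lives in `scripts/train.py`.

```python
def learning_rate(epoch):
    """Piecewise-constant learning rate mirroring the schedule table."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5
```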
## Evaluation
Evaluate the trained model:
```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```
**Output:** Results are saved to `./output/`.
## Training File Structure
```
training/
β”œβ”€β”€ model/
β”‚ └── semantic_cnn_model.pth # Pretrained model weights
β”œβ”€β”€ scripts/
β”‚ β”œβ”€β”€ model.py # SemanticCNN architecture + NavDataset
β”‚ β”œβ”€β”€ train.py # Training script
β”‚ └── decode_demo.py # Evaluation/demo script
β”œβ”€β”€ run_train.sh # Training driver script
└── run_eval.sh # Evaluation driver script
```
---
## TensorBoard Monitoring
Training logs are saved to `./runs/`. View training progress:
```bash
cd training
tensorboard --logdir=runs
```
Monitored metrics:
- Training loss
- Validation loss
---
# Part 2: ROS Deployment
## Prerequisites
Install the following ROS packages:
```bash
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git
# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
## Installation
1. Copy the ROS workspace to your catkin workspace:
```bash
cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
```
2. Build the workspace:
```bash
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
## Usage
### Launch Gazebo Simulation
```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```
This launch file starts:
- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization
### Launch Configuration
Key parameters in `semantic_cnn_nav_gazebo.launch`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `s3_net_model_file` | `model/s3_net_model.pth` | SΒ³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |
### Send Navigation Goals
Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
## ROS Nodes
### cnn_data_pub
Publishes processed LiDAR data for the CNN.
**Subscriptions:**
- `/scan` (sensor_msgs/LaserScan)
**Publications:**
- `/cnn_data` (cnn_msgs/CNN_data)
### semantic_cnn_nav_inference
Main inference node combining SΒ³-Net and SemanticCNN.
**Subscriptions:**
- `/cnn_data` (cnn_msgs/CNN_data)
**Publications:**
- `/navigation_velocity_smoother/raw_cmd_vel` (geometry_msgs/Twist)
**Parameters:**
- `~s3_net_model_file`: Path to SΒ³-Net model
- `~semantic_cnn_model_file`: Path to SemanticCNN model
## ROS Deployment File Structure
```
ros_deployment_ws/
└── src/
└── semantic_cnn_nav/
β”œβ”€β”€ cnn_msgs/
β”‚ └── msg/
β”‚ └── CNN_data.msg # Custom message definition
└── semantic_cnn/
β”œβ”€β”€ launch/
β”‚ β”œβ”€β”€ cnn_data_pub.launch
β”‚ β”œβ”€β”€ semantic_cnn_inference.launch
β”‚ └── semantic_cnn_nav_gazebo.launch
└── src/
β”œβ”€β”€ model/
β”‚ β”œβ”€β”€ s3_net_model.pth # SΒ³-Net pretrained weights
β”‚ └── semantic_cnn_model.pth # SemanticCNN weights
β”œβ”€β”€ cnn_data_pub.py # Data preprocessing node
β”œβ”€β”€ cnn_model.py # Model definitions
β”œβ”€β”€ pure_pursuit.py # Pure pursuit controller
β”œβ”€β”€ goal_visualize.py # Goal visualization
└── semantic_cnn_nav_inference.py # Main inference node
```
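The `pure_pursuit.py` controller listed above follows the classic pure-pursuit idea: pick the first path point at least one lookahead distance ahead of the robot and express it in the robot frame as the sub-goal. The sketch below is a generic illustration of that idea with assumed names and a hypothetical 1.5 m lookahead, not the repository's exact implementation.

```python
import math

def lookahead_subgoal(path, pose, lookahead=1.5):
    """Return the first waypoint >= `lookahead` meters away, in the robot frame.

    `path` is a list of (x, y) world-frame waypoints; `pose` is (x, y, yaw).
    Returns None when no waypoint is beyond the lookahead distance.
    """
    px, py, yaw = pose
    for wx, wy in path:
        if math.hypot(wx - px, wy - py) >= lookahead:
            dx, dy = wx - px, wy - py
            # Rotate the world-frame displacement into the robot frame.
            return (math.cos(-yaw) * dx - math.sin(-yaw) * dy,
                    math.sin(-yaw) * dx + math.cos(-yaw) * dy)
    return None  # goal is closer than the lookahead; handle as final approach
```

The returned robot-frame (x, y) point matches the sub-goal input of the SemanticCNN described in Part 1.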
---
## Pre-trained Models
Pre-trained models are included:
| Model | Location | Description |
|-------|----------|-------------|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | SΒ³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |
---
## Citation
```bibtex
@article{xie2026semantic2d,
title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
journal={arXiv preprint arXiv:2409.09899},
year={2026}
}
@inproceedings{xie2021towards,
title={Towards Safe Navigation Through Crowded Dynamic Environments},
author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021},
doi={10.1109/IROS51168.2021.9636102}
}
```