
Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

This repository contains the Semantic CNN Navigation implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are available on the multimedia demonstrations page. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

Overview

License: MIT

This repository contains two main components:

  1. Training: CNN-based control policy training using the Semantic2D dataset
  2. ROS Deployment: Real-time semantic-aware navigation for mobile robots

The Semantic CNN Navigation system combines:

  • S³-Net: Real-time semantic segmentation of 2D LiDAR scans
  • SemanticCNN: ResNet-based control policy that uses semantic information for navigation

Demo Results

Engineering Lobby Semantic Navigation

Engineering 4th Floor Semantic Navigation

CYC 4th Floor Semantic Navigation

System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                       Semantic CNN Navigation                        │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │  LiDAR Scan │───▶│   S³-Net     │───▶│  Semantic Labels (10)   │  │
│  │  + Intensity│    │  Segmentation│    │  per LiDAR point        │  │
│  └─────────────┘    └──────────────┘    └───────────┬─────────────┘  │
│                                                     │                │
│  ┌─────────────┐                                    ▼                │
│  │  Sub-Goal   │───────────────────────▶┌─────────────────────────┐  │
│  │  (x, y)     │                        │     SemanticCNN         │  │
│  └─────────────┘                        │  (ResNet + Bottleneck)  │  │
│                                         │                         │  │
│  ┌─────────────┐                        │  Input: 80x80 scan map  │  │
│  │  Scan Map   │───────────────────────▶│       + semantic map    │  │
│  │  (history)  │                        │       + sub-goal        │  │
│  └─────────────┘                        └───────────┬─────────────┘  │
│                                                     │                │
│                                                     ▼                │
│                                         ┌─────────────────────────┐  │
│                                         │  Velocity Command       │  │
│                                         │  (linear_x, angular_z)  │  │
│                                         └─────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘

Requirements

Training

  • Python 3.7+
  • PyTorch 1.7.1+
  • TensorBoard
  • NumPy
  • tqdm

ROS Deployment

  • Ubuntu 20.04
  • ROS Noetic
  • Python 3.8.5
  • PyTorch 1.7.1

Install training dependencies:

pip install torch torchvision tensorboardX numpy tqdm

Part 1: Training

Dataset Structure

The training expects the Semantic2D dataset organized as follows:

~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── semantic_label/        # Semantic labels (.npy)
│   ├── sub_goals_local/       # Local sub-goals (.npy)
│   └── velocities/            # Ground truth velocities (.npy)
└── ...
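
For orientation, the sketch below loads one sample from this layout. The per-sample file naming (one .npy per sample ID listed in train.txt) is an assumption for illustration; the actual loader is the NavDataset class in training/scripts/model.py.

import os
import numpy as np

def load_sample(folder, name):
    # File naming is an assumption; see NavDataset in
    # training/scripts/model.py for the actual loading logic.
    def load(sub):
        return np.load(os.path.join(folder, sub, name + ".npy"))
    return (load("scans_lidar"), load("semantic_label"),
            load("sub_goals_local"), load("velocities"))

folder = os.path.expanduser("~/semantic2d_data/2024-04-11-15-24-29")
with open(os.path.join(folder, "train.txt")) as f:
    names = [line.strip() for line in f if line.strip()]

scan, label, goal, vel = load_sample(folder, names[0])
print(scan.shape, label.shape, goal.shape, vel.shape)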

Model Architecture

SemanticCNN uses a ResNet-style architecture with Bottleneck blocks:

Component    Details
Input        2 channels: scan map (80x80) + semantic map (80x80)
Backbone     ResNet with Bottleneck blocks [2, 1, 1]
Goal Input   2D sub-goal (x, y) concatenated after pooling
Output       2D velocity (linear_x, angular_z)
Loss         MSE Loss

Key Parameters:

  • Sequence length: 10 frames
  • Image size: 80x80
  • LiDAR points: 1081 → downsampled to 720 (removing ±180 points; see the sketch below)
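
As a minimal sketch of those parameters (the exact trim indices and map rasterization are assumptions; the real preprocessing lives in training/scripts/model.py):

import numpy as np
import torch

raw_scan = np.random.rand(1081).astype(np.float32)  # stand-in for one scan

# Drop roughly 180 points from each end of the 1081-point scan to get 720.
# The exact slice is an assumption for illustration.
scan_720 = raw_scan[180:900]
assert scan_720.shape == (720,)

# The policy consumes a 2-channel 80x80 image plus the 2D sub-goal.
scan_map = torch.rand(1, 1, 80, 80)      # rasterized scan history
semantic_map = torch.rand(1, 1, 80, 80)  # rasterized semantic labels
sub_goal = torch.rand(1, 2)              # (x, y) in the robot frame
net_input = torch.cat([scan_map, semantic_map], dim=1)  # (1, 2, 80, 80)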

Training

Train the Semantic CNN model:

cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:

  • $1 - Training data directory
  • $2 - Validation data directory

Training Configuration (in scripts/train.py):

Parameter       Default   Description
NUM_EPOCHS      4000      Total training epochs
BATCH_SIZE      64        Samples per batch
LEARNING_RATE   0.001     Initial learning rate

Learning Rate Schedule:

  • Epochs 0-40: 1e-3
  • Epochs 40-2000: 2e-4
  • Epochs 2000-21000: 2e-5
  • Epochs 21000+: 1e-5

Model checkpoints are saved every 50 epochs to ./model/.
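
One way to express that schedule is a piecewise-constant lookup on the epoch, as sketched below (how train.py actually applies it to the optimizer is an assumption):

def learning_rate(epoch):
    # Piecewise-constant schedule from the list above.
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5

# Assumed usage: reset the optimizer's learning rate once per epoch.
# for group in optimizer.param_groups:
#     group["lr"] = learning_rate(epoch)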

Evaluation

Evaluate the trained model:

cd training
sh run_eval.sh ~/semantic2d_data/

Output: Results saved to ./output/

Training File Structure

training/
├── model/
│   └── semantic_cnn_model.pth    # Pretrained model weights
├── scripts/
│   ├── model.py                  # SemanticCNN architecture + NavDataset
│   ├── train.py                  # Training script
│   └── decode_demo.py            # Evaluation/demo script
├── run_train.sh                  # Training driver script
└── run_eval.sh                   # Evaluation driver script

TensorBoard Monitoring

Training logs are saved to ./runs/. View training progress:

cd training
tensorboard --logdir=runs

Monitored metrics:

  • Training loss
  • Validation loss
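
The scalar tags below are assumptions; see training/scripts/train.py for the actual logging calls. A minimal sketch with tensorboardX:

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="./runs/example")
for epoch in range(3):                      # stand-in for the training loop
    train_loss = 1.0 / (epoch + 1)          # placeholder values
    val_loss = 1.2 / (epoch + 1)
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/val", val_loss, epoch)
writer.close()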

Part 2: ROS Deployment

Prerequisites

Install the following ROS packages:

# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash

Installation

  1. Copy the ROS workspace to your catkin workspace:

cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/

  2. Build the workspace:

cd ~/catkin_ws
catkin_make
source devel/setup.bash

Usage

Launch Gazebo Simulation

roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch

This launch file starts:

  • Gazebo simulator with pedestrians (pedsim)
  • AMCL localization
  • CNN data publisher
  • Semantic CNN inference node
  • RViz visualization

Launch Configuration

Key parameters in semantic_cnn_nav_gazebo.launch:

Parameter                 Default                        Description
s3_net_model_file         model/s3_net_model.pth         S³-Net model path
semantic_cnn_model_file   model/semantic_cnn_model.pth   SemanticCNN model path
scene_file                eng_hall_5.xml                 Pedsim scenario file
world_name                eng_hall.world                 Gazebo world file
map_file                  gazebo_eng_lobby.yaml          Navigation map
initial_pose_x/y/a        1.0, 0.0, 0.13                 Robot initial pose

Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
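
By default, RViz's 2D Nav Goal tool publishes a geometry_msgs/PoseStamped on /move_base_simple/goal. Assuming this stack listens on that standard topic (not verified here), a goal can also be sent from a script:

#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("send_goal")
pub = rospy.Publisher("/move_base_simple/goal", PoseStamped, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect

goal = PoseStamped()
goal.header.frame_id = "map"
goal.header.stamp = rospy.Time.now()
goal.pose.position.x = 2.0   # example coordinates in the map frame
goal.pose.position.y = 1.0
goal.pose.orientation.w = 1.0
pub.publish(goal)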

ROS Nodes

cnn_data_pub

Publishes processed LiDAR data for the CNN.

Subscriptions:

  • /scan (sensor_msgs/LaserScan)

Publications:

  • /cnn_data (cnn_msgs/CNN_data)
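
A minimal skeleton of that pattern (the CNN_data fields and the actual preprocessing are omitted; see cnn_data_pub.py for the real node):

#!/usr/bin/env python3
import rospy
from sensor_msgs.msg import LaserScan
from cnn_msgs.msg import CNN_data

class CnnDataPub:
    def __init__(self):
        self.pub = rospy.Publisher("/cnn_data", CNN_data, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.scan_callback, queue_size=1)

    def scan_callback(self, scan):
        msg = CNN_data()
        # ... populate the fields defined in cnn_msgs/CNN_data.msg
        # from scan.ranges / scan.intensities (preprocessing omitted)
        self.pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("cnn_data_pub")
    CnnDataPub()
    rospy.spin()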

semantic_cnn_nav_inference

Main inference node combining S³-Net and SemanticCNN.

Subscriptions:

  • /cnn_data (cnn_msgs/CNN_data)

Publications:

  • /navigation_velocity_smoother/raw_cmd_vel (geometry_msgs/Twist)

Parameters:

  • ~s3_net_model_file: Path to S³-Net model
  • ~semantic_cnn_model_file: Path to SemanticCNN model
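
A sketch of how these pieces might be wired together (the model-loading calls are assumptions; the real node is semantic_cnn_nav_inference.py):

#!/usr/bin/env python3
import rospy
import torch
from geometry_msgs.msg import Twist
from cnn_msgs.msg import CNN_data

class SemanticCnnNav:
    def __init__(self):
        s3_path = rospy.get_param("~s3_net_model_file")
        cnn_path = rospy.get_param("~semantic_cnn_model_file")
        # Loading via torch.load is an assumption; the real node may
        # instantiate the networks and call load_state_dict instead.
        self.s3_net = torch.load(s3_path, map_location="cpu").eval()
        self.policy = torch.load(cnn_path, map_location="cpu").eval()
        self.pub = rospy.Publisher(
            "/navigation_velocity_smoother/raw_cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/cnn_data", CNN_data, self.callback, queue_size=1)

    def callback(self, data):
        cmd = Twist()
        # ... run S³-Net on the scan, rasterize the maps, run SemanticCNN,
        # then fill cmd.linear.x and cmd.angular.z (omitted)
        self.pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("semantic_cnn_nav_inference")
    SemanticCnnNav()
    rospy.spin()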

ROS Deployment File Structure

ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        ├── cnn_msgs/
        │   └── msg/
        │       └── CNN_data.msg          # Custom message definition
        └── semantic_cnn/
            ├── launch/
            │   ├── cnn_data_pub.launch
            │   ├── semantic_cnn_inference.launch
            │   └── semantic_cnn_nav_gazebo.launch
            └── src/
                ├── model/
                │   ├── s3_net_model.pth       # S³-Net pretrained weights
                │   └── semantic_cnn_model.pth # SemanticCNN weights
                ├── cnn_data_pub.py            # Data preprocessing node
                ├── cnn_model.py               # Model definitions
                ├── pure_pursuit.py            # Pure pursuit controller
                ├── goal_visualize.py          # Goal visualization
                └── semantic_cnn_nav_inference.py  # Main inference node

Pre-trained Models

Pre-trained models are included:

Model                    Location                       Description
s3_net_model.pth         ros_deployment_ws/.../model/   S³-Net semantic segmentation
semantic_cnn_model.pth   training/model/                SemanticCNN navigation policy

Citation

@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}