
Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

This repository contains the Semantic CNN Navigation implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are available on the multimedia demonstrations page. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

Overview

License: MIT

This repository contains two main components:

  1. Training: CNN-based control policy training using the Semantic2D dataset
  2. ROS Deployment: Real-time semantic-aware navigation for mobile robots

The Semantic CNN Navigation system combines:

  • S³-Net: Real-time semantic segmentation of 2D LiDAR scans
  • SemanticCNN: ResNet-based control policy that uses semantic information for navigation

Demo Results

Engineering Lobby Semantic Navigation

Engineering 4th Floor Semantic Navigation

CYC 4th Floor Semantic Navigation

System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                       Semantic CNN Navigation                        │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │  LiDAR Scan │───▶│   S³-Net     │───▶│  Semantic Labels (10)   │  │
│  │  + Intensity│    │  Segmentation│    │  per LiDAR point        │  │
│  └─────────────┘    └──────────────┘    └───────────┬─────────────┘  │
│                                                     │                │
│  ┌─────────────┐                                    ▼                │
│  │  Sub-Goal   │───────────────────────▶┌─────────────────────────┐  │
│  │  (x, y)     │                        │     SemanticCNN         │  │
│  └─────────────┘                        │  (ResNet + Bottleneck)  │  │
│                                         │                         │  │
│  ┌─────────────┐                        │  Input: 80x80 scan map  │  │
│  │  Scan Map   │───────────────────────▶│       + semantic map    │  │
│  │  (history)  │                        │       + sub-goal        │  │
│  └─────────────┘                        └───────────┬─────────────┘  │
│                                                     │                │
│                                                     ▼                │
│                                         ┌─────────────────────────┐  │
│                                         │  Velocity Command       │  │
│                                         │  (linear_x, angular_z)  │  │
│                                         └─────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘

Requirements

Training

  • Python 3.7+
  • PyTorch 1.7.1+
  • TensorBoard
  • NumPy
  • tqdm

ROS Deployment

  • Ubuntu 20.04
  • ROS Noetic
  • Python 3.8.5
  • PyTorch 1.7.1

Install training dependencies:

pip install torch torchvision tensorboardX numpy tqdm

Part 1: Training

Dataset Structure

The training expects the Semantic2D dataset organized as follows:

~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── semantic_label/        # Semantic labels (.npy)
│   ├── sub_goals_local/       # Local sub-goals (.npy)
│   └── velocities/            # Ground truth velocities (.npy)
└── ...
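
For orientation, the sketch below loads one sample from this layout. The per-sample file naming (one .npy per sample ID listed in train.txt) is an assumption for illustration; the actual loader is the NavDataset class in training/scripts/model.py.

import os
import numpy as np

def load_sample(folder, name):
    # File naming is an assumption; see NavDataset in
    # training/scripts/model.py for the actual loading logic.
    def load(sub):
        return np.load(os.path.join(folder, sub, name + ".npy"))
    return (load("scans_lidar"), load("semantic_label"),
            load("sub_goals_local"), load("velocities"))

folder = os.path.expanduser("~/semantic2d_data/2024-04-11-15-24-29")
with open(os.path.join(folder, "train.txt")) as f:
    names = [line.strip() for line in f if line.strip()]

scan, label, goal, vel = load_sample(folder, names[0])
print(scan.shape, label.shape, goal.shape, vel.shape)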

Model Architecture

SemanticCNN uses a ResNet-style architecture with Bottleneck blocks:

Component    Details
Input        2 channels: scan map (80x80) + semantic map (80x80)
Backbone     ResNet with Bottleneck blocks [2, 1, 1]
Goal Input   2D sub-goal (x, y) concatenated after pooling
Output       2D velocity (linear_x, angular_z)
Loss         MSE Loss

Key Parameters:

  • Sequence length: 10 frames
  • Image size: 80x80
  • LiDAR points: 1081 → downsampled to 720 (removing ±180 points; see the sketch below)
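
As a minimal sketch of those parameters (the exact trim indices and map rasterization are assumptions; the real preprocessing lives in training/scripts/model.py):

import numpy as np
import torch

raw_scan = np.random.rand(1081).astype(np.float32)  # stand-in for one scan

# Drop roughly 180 points from each end of the 1081-point scan to get 720.
# The exact slice is an assumption for illustration.
scan_720 = raw_scan[180:900]
assert scan_720.shape == (720,)

# The policy consumes a 2-channel 80x80 image plus the 2D sub-goal.
scan_map = torch.rand(1, 1, 80, 80)      # rasterized scan history
semantic_map = torch.rand(1, 1, 80, 80)  # rasterized semantic labels
sub_goal = torch.rand(1, 2)              # (x, y) in the robot frame
net_input = torch.cat([scan_map, semantic_map], dim=1)  # (1, 2, 80, 80)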

Training

Train the Semantic CNN model:

cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:

  • $1 - Training data directory
  • $2 - Validation data directory

Training Configuration (in scripts/train.py):

Parameter       Default   Description
NUM_EPOCHS      4000      Total training epochs
BATCH_SIZE      64        Samples per batch
LEARNING_RATE   0.001     Initial learning rate

Learning Rate Schedule:

  • Epochs 0-40: 1e-3
  • Epochs 40-2000: 2e-4
  • Epochs 2000-21000: 2e-5
  • Epochs 21000+: 1e-5

Model checkpoints are saved every 50 epochs to ./model/.
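
One way to express that schedule is a piecewise-constant lookup on the epoch, as sketched below (how train.py actually applies it to the optimizer is an assumption):

def learning_rate(epoch):
    # Piecewise-constant schedule from the list above.
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5

# Assumed usage: reset the optimizer's learning rate once per epoch.
# for group in optimizer.param_groups:
#     group["lr"] = learning_rate(epoch)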

Evaluation

Evaluate the trained model:

cd training
sh run_eval.sh ~/semantic2d_data/

Output: Results saved to ./output/

Training File Structure

training/
├── model/
│   └── semantic_cnn_model.pth    # Pretrained model weights
├── scripts/
│   ├── model.py                  # SemanticCNN architecture + NavDataset
│   ├── train.py                  # Training script
│   └── decode_demo.py            # Evaluation/demo script
├── run_train.sh                  # Training driver script
└── run_eval.sh                   # Evaluation driver script

TensorBoard Monitoring

Training logs are saved to ./runs/. View training progress:

cd training
tensorboard --logdir=runs

Monitored metrics:

  • Training loss
  • Validation loss
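
The scalar tags below are assumptions; see training/scripts/train.py for the actual logging calls. A minimal sketch with tensorboardX:

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="./runs/example")
for epoch in range(3):                      # stand-in for the training loop
    train_loss = 1.0 / (epoch + 1)          # placeholder values
    val_loss = 1.2 / (epoch + 1)
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/val", val_loss, epoch)
writer.close()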

Part 2: ROS Deployment

Prerequisites

Install the following ROS packages:

# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash

Installation

  1. Copy the ROS workspace to your catkin workspace:

cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/

  2. Build the workspace:

cd ~/catkin_ws
catkin_make
source devel/setup.bash

Usage

Launch Gazebo Simulation

roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch

This launch file starts:

  • Gazebo simulator with pedestrians (pedsim)
  • AMCL localization
  • CNN data publisher
  • Semantic CNN inference node
  • RViz visualization

Launch Configuration

Key parameters in semantic_cnn_nav_gazebo.launch:

Parameter                 Default                        Description
s3_net_model_file         model/s3_net_model.pth         S³-Net model path
semantic_cnn_model_file   model/semantic_cnn_model.pth   SemanticCNN model path
scene_file                eng_hall_5.xml                 Pedsim scenario file
world_name                eng_hall.world                 Gazebo world file
map_file                  gazebo_eng_lobby.yaml          Navigation map
initial_pose_x/y/a        1.0, 0.0, 0.13                 Robot initial pose

Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot.
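
By default, RViz's 2D Nav Goal tool publishes a geometry_msgs/PoseStamped on /move_base_simple/goal. Assuming this stack listens on that standard topic (not verified here), a goal can also be sent from a script:

#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("send_goal")
pub = rospy.Publisher("/move_base_simple/goal", PoseStamped, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect

goal = PoseStamped()
goal.header.frame_id = "map"
goal.header.stamp = rospy.Time.now()
goal.pose.position.x = 2.0   # example coordinates in the map frame
goal.pose.position.y = 1.0
goal.pose.orientation.w = 1.0
pub.publish(goal)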

ROS Nodes

cnn_data_pub

Publishes processed LiDAR data for the CNN.

Subscriptions:

  • /scan (sensor_msgs/LaserScan)

Publications:

  • /cnn_data (cnn_msgs/CNN_data)
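
A minimal skeleton of that pattern (the CNN_data fields and the actual preprocessing are omitted; see cnn_data_pub.py for the real node):

#!/usr/bin/env python3
import rospy
from sensor_msgs.msg import LaserScan
from cnn_msgs.msg import CNN_data

class CnnDataPub:
    def __init__(self):
        self.pub = rospy.Publisher("/cnn_data", CNN_data, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.scan_callback, queue_size=1)

    def scan_callback(self, scan):
        msg = CNN_data()
        # ... populate the fields defined in cnn_msgs/CNN_data.msg
        # from scan.ranges / scan.intensities (preprocessing omitted)
        self.pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("cnn_data_pub")
    CnnDataPub()
    rospy.spin()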

semantic_cnn_nav_inference

Main inference node combining S³-Net and SemanticCNN.

Subscriptions:

  • /cnn_data (cnn_msgs/CNN_data)

Publications:

  • /navigation_velocity_smoother/raw_cmd_vel (geometry_msgs/Twist)

Parameters:

  • ~s3_net_model_file: Path to S³-Net model
  • ~semantic_cnn_model_file: Path to SemanticCNN model
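
A sketch of how these pieces might be wired together (the model-loading calls are assumptions; the real node is semantic_cnn_nav_inference.py):

#!/usr/bin/env python3
import rospy
import torch
from geometry_msgs.msg import Twist
from cnn_msgs.msg import CNN_data

class SemanticCnnNav:
    def __init__(self):
        s3_path = rospy.get_param("~s3_net_model_file")
        cnn_path = rospy.get_param("~semantic_cnn_model_file")
        # Loading via torch.load is an assumption; the real node may
        # instantiate the networks and call load_state_dict instead.
        self.s3_net = torch.load(s3_path, map_location="cpu").eval()
        self.policy = torch.load(cnn_path, map_location="cpu").eval()
        self.pub = rospy.Publisher(
            "/navigation_velocity_smoother/raw_cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/cnn_data", CNN_data, self.callback, queue_size=1)

    def callback(self, data):
        cmd = Twist()
        # ... run S³-Net on the scan, rasterize the maps, run SemanticCNN,
        # then fill cmd.linear.x and cmd.angular.z (omitted)
        self.pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("semantic_cnn_nav_inference")
    SemanticCnnNav()
    rospy.spin()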

ROS Deployment File Structure

ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        ├── cnn_msgs/
        │   └── msg/
        │       └── CNN_data.msg          # Custom message definition
        └── semantic_cnn/
            ├── launch/
            │   ├── cnn_data_pub.launch
            │   ├── semantic_cnn_inference.launch
            │   └── semantic_cnn_nav_gazebo.launch
            └── src/
                ├── model/
                │   ├── s3_net_model.pth       # S³-Net pretrained weights
                │   └── semantic_cnn_model.pth # SemanticCNN weights
                ├── cnn_data_pub.py            # Data preprocessing node
                ├── cnn_model.py               # Model definitions
                ├── pure_pursuit.py            # Pure pursuit controller
                ├── goal_visualize.py          # Goal visualization
                └── semantic_cnn_nav_inference.py  # Main inference node

Pre-trained Models

Pre-trained models are included:

Model                    Location                       Description
s3_net_model.pth         ros_deployment_ws/.../model/   S³-Net semantic segmentation
semantic_cnn_model.pth   training/model/                SemanticCNN navigation policy

Citation

@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}