# Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

This repository provides the Semantic CNN Navigation implementation code for our paper ["Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone"](https://arxiv.org/pdf/2409.09899).
Video demos are available as [multimedia demonstrations](https://youtu.be/P1Hsvj6WUSY).
The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

## Related Resources

- **Dataset Download:** https://doi.org/10.5281/zenodo.18350696
- **SALSA (Dataset and Labeling Framework):** https://github.com/TempleRAIL/semantic2d
- **S³-Net (Stochastic Semantic Segmentation):** https://github.com/TempleRAIL/s3_net
- **Semantic CNN Navigation:** https://github.com/TempleRAIL/semantic_cnn_nav

## Overview

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This repository contains two main components:
1. **Training**: CNN-based control policy training using the Semantic2D dataset
2. **ROS Deployment**: Real-time semantic-aware navigation for mobile robots

The Semantic CNN Navigation system combines:
- **S³-Net**: Real-time semantic segmentation of 2D LiDAR scans
- **SemanticCNN**: ResNet-based control policy that uses semantic information for navigation

## Demo Results

**Engineering Lobby Semantic Navigation**
![Engineering Lobby Semantic Navigation](./demo/1.lobby_semantic_navigation.gif)

**Engineering 4th Floor Semantic Navigation**
![Engineering 4th Floor Semantic Navigation](./demo/1.eng4th_semantic_navigation.gif)

**CYC 4th Floor Semantic Navigation**
![CYC 4th Floor Semantic Navigation](./demo/3.cyc4th_semantic_navigation.gif)

## System Architecture

```
┌───────────────────────────────────────────────────────────────────────┐
│                        Semantic CNN Navigation                        │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌──────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │  LiDAR Scan  │───▶│    S³-Net    │───▶│  Semantic Labels (10)   │  │
│  │  + Intensity │    │ Segmentation │    │  per LiDAR point        │  │
│  └──────────────┘    └──────────────┘    └────────────┬────────────┘  │
│                                                       │               │
│  ┌──────────────┐                                     ▼               │
│  │  Sub-Goal    │───────────────────────▶┌─────────────────────────┐  │
│  │  (x, y)      │                        │     SemanticCNN         │  │
│  └──────────────┘                        │  (ResNet + Bottleneck)  │  │
│                                          │                         │  │
│  ┌──────────────┐                        │  Input: 80x80 scan map  │  │
│  │  Scan Map    │───────────────────────▶│       + semantic map    │  │
│  │  (history)   │                        │       + sub-goal        │  │
│  └──────────────┘                        └────────────┬────────────┘  │
│                                                       │               │
│                                                       ▼               │
│                                          ┌─────────────────────────┐  │
│                                          │  Velocity Command       │  │
│                                          │  (linear_x, angular_z)  │  │
│                                          └─────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────┘
```

## Requirements

### Training
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm

### ROS Deployment
- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1

Install training dependencies:
```bash
pip install torch torchvision tensorboardX numpy tqdm
```

---

# Part 1: Training

## Dataset Structure

The training expects the Semantic2D dataset organized as follows:

```
~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── semantic_label/        # Semantic labels (.npy)
│   ├── sub_goals_local/       # Local sub-goals (.npy)
│   └── velocities/            # Ground truth velocities (.npy)
└── ...
```
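
For orientation, here is a minimal loading sketch that follows this layout. It assumes one `.npy` file per sample, with matching basenames listed in `train.txt`/`dev.txt`; the actual `NavDataset` class in `training/scripts/model.py` may organize samples differently (e.g., as sequences of frames).

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class Semantic2DSamples(Dataset):
    """Illustrative loader for the folder layout above (not the repo's NavDataset)."""

    def __init__(self, root, split_file="train.txt"):
        self.root = root
        with open(os.path.join(root, split_file)) as f:
            self.names = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]

        def load(sub):
            return np.load(os.path.join(self.root, sub, name + ".npy"))

        scan = torch.from_numpy(load("scans_lidar")).float()      # range scan
        label = torch.from_numpy(load("semantic_label")).long()   # per-point labels
        goal = torch.from_numpy(load("sub_goals_local")).float()  # local sub-goal (x, y)
        vel = torch.from_numpy(load("velocities")).float()        # ground-truth velocity
        return scan, label, goal, vel
```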

## Model Architecture

**SemanticCNN** uses a ResNet-style architecture with Bottleneck blocks:

| Component | Details |
|-----------|---------|
| **Input** | 2 channels: scan map (80x80) + semantic map (80x80) |
| **Backbone** | ResNet with Bottleneck blocks [2, 1, 1] |
| **Goal Input** | 2D sub-goal (x, y) concatenated after pooling |
| **Output** | 2D velocity (linear_x, angular_z) |
| **Loss** | MSE Loss |

**Key Parameters:**
- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
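
A small sketch makes these shapes concrete. The exact downsampling slice is an assumption here; any slice that keeps 720 of the 1081 points matches the description above:

```python
import numpy as np

raw_scan = np.zeros(1081, dtype=np.float32)  # one raw 2D LiDAR scan
# Assumed slice: drop roughly 180 points from each end of the field of
# view (1081 - 361 = 720); the repo's exact indices may differ.
scan_720 = raw_scan[180:900]
assert scan_720.shape == (720,)

# The policy consumes two 80x80 images plus a 2D sub-goal and regresses
# a 2D velocity command:
scan_map = np.zeros((80, 80), dtype=np.float32)      # scan history map
semantic_map = np.zeros((80, 80), dtype=np.float32)  # semantic label map
sub_goal = np.zeros(2, dtype=np.float32)             # (x, y) in robot frame
velocity = np.zeros(2, dtype=np.float32)             # (linear_x, angular_z)
```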

## Training

Train the Semantic CNN model:

```bash
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/
```

**Arguments:**
- `$1` - Training data directory
- `$2` - Validation data directory

**Training Configuration** (in `scripts/train.py`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |

**Learning Rate Schedule:**
- Epochs 0-40: `1e-3`
- Epochs 40-2000: `2e-4`
- Epochs 2000-21000: `2e-5`
- Epochs 21000+: `1e-5`
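
As code, the schedule is a piecewise-constant function of the epoch (a sketch; `scripts/train.py` may set the rate differently, e.g. directly on the optimizer's parameter groups):

```python
def learning_rate(epoch: int) -> float:
    """Piecewise-constant schedule matching the table above."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5

# Applied once per epoch, e.g.:
# for group in optimizer.param_groups:
#     group["lr"] = learning_rate(epoch)
```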

Model checkpoints are saved every 50 epochs to `./model/`.

## Evaluation

Evaluate the trained model:

```bash
cd training
sh run_eval.sh ~/semantic2d_data/
```

**Output:** Results are saved to `./output/`.

## Training File Structure

```
training/
├── model/
│   └── semantic_cnn_model.pth    # Pretrained model weights
├── scripts/
│   ├── model.py                  # SemanticCNN architecture + NavDataset
│   ├── train.py                  # Training script
│   └── decode_demo.py            # Evaluation/demo script
├── run_train.sh                  # Training driver script
└── run_eval.sh                   # Evaluation driver script
```

---

## TensorBoard Monitoring

Training logs are saved to `./runs/`. View training progress:

```bash
cd training
tensorboard --logdir=runs
```

Monitored metrics:
- Training loss
- Validation loss

---

# Part 2: ROS Deployment

## Prerequisites

Install the following ROS packages:

```bash
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git

# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

## Installation

1. Copy the ROS workspace to your catkin workspace:
```bash
cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/
```

2. Build the workspace:
```bash
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

## Usage

### Launch Gazebo Simulation

```bash
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch
```

This launch file starts:
- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization

### Launch Configuration

Key parameters in `semantic_cnn_nav_gazebo.launch`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `s3_net_model_file` | `model/s3_net_model.pth` | S³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |

### Send Navigation Goals

Use the RViz "2D Nav Goal" tool to send navigation goals to the robot, or publish one programmatically as sketched below.
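
The RViz tool publishes a `geometry_msgs/PoseStamped` on `/move_base_simple/goal` by default; assuming the launch files keep that default topic (worth verifying), a goal can also be sent from a short script:

```python
#!/usr/bin/env python3
# Sends a single navigation goal, mimicking RViz's "2D Nav Goal" tool.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("send_goal")
pub = rospy.Publisher("/move_base_simple/goal", PoseStamped,
                      queue_size=1, latch=True)

goal = PoseStamped()
goal.header.frame_id = "map"
goal.header.stamp = rospy.Time.now()
goal.pose.position.x = 5.0    # example goal in the map frame
goal.pose.position.y = 2.0
goal.pose.orientation.w = 1.0  # facing along +x

pub.publish(goal)
rospy.sleep(1.0)  # give the latched message time to reach subscribers
```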

## ROS Nodes

### cnn_data_pub
Publishes processed LiDAR data for the CNN.

**Subscriptions:**
- `/scan` (sensor_msgs/LaserScan)

**Publications:**
- `/cnn_data` (cnn_msgs/CNN_data)
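
A rough skeleton of the node is sketched below. The 10-frame buffer mirrors the model's sequence length; the actual preprocessing and the `CNN_data` message fields live in `cnn_data_pub.py` and `cnn_msgs/msg/CNN_data.msg` and are not reproduced here.

```python
#!/usr/bin/env python3
# Skeleton only: buffers recent scans for the inference node.
from collections import deque
import rospy
from sensor_msgs.msg import LaserScan

class CnnDataPub:
    def __init__(self):
        self.scans = deque(maxlen=10)  # sequence length used by the model
        rospy.Subscriber("/scan", LaserScan, self.on_scan, queue_size=1)
        # The real node assembles and publishes a cnn_msgs/CNN_data message
        # from this buffer; its fields are omitted here.

    def on_scan(self, msg):
        self.scans.append((msg.ranges, msg.intensities))

if __name__ == "__main__":
    rospy.init_node("cnn_data_pub")
    CnnDataPub()
    rospy.spin()
```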

### semantic_cnn_nav_inference
Main inference node combining S³-Net and SemanticCNN.

**Subscriptions:**
- `/cnn_data` (cnn_msgs/CNN_data)

**Publications:**
- `/navigation_velocity_smoother/raw_cmd_vel` (geometry_msgs/Twist)

**Parameters:**
- `~s3_net_model_file`: Path to S³-Net model
- `~semantic_cnn_model_file`: Path to SemanticCNN model
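
Structurally, the node looks roughly like this (a simplified skeleton, not the actual `semantic_cnn_nav_inference.py`; model loading and the scan-to-map preprocessing are elided):

```python
#!/usr/bin/env python3
# Skeleton of the inference node: CNN_data in, Twist out.
import rospy
import torch
from geometry_msgs.msg import Twist
from cnn_msgs.msg import CNN_data  # custom message from this repo

class SemanticCnnNavInference:
    def __init__(self):
        s3_path = rospy.get_param("~s3_net_model_file")
        cnn_path = rospy.get_param("~semantic_cnn_model_file")
        # Loading is simplified; the repo restores weights into the model
        # classes defined in cnn_model.py.
        self.s3_net = torch.load(s3_path, map_location="cpu")
        self.policy = torch.load(cnn_path, map_location="cpu")
        self.cmd_pub = rospy.Publisher(
            "/navigation_velocity_smoother/raw_cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/cnn_data", CNN_data, self.on_data, queue_size=1)

    def on_data(self, msg):
        # 1) S³-Net labels each LiDAR point from range + intensity,
        # 2) scan + labels are rasterized into the two 80x80 input maps,
        # 3) SemanticCNN maps (maps, sub-goal) -> (linear_x, angular_z).
        cmd = Twist()
        # cmd.linear.x, cmd.angular.z = ...  (network output, elided)
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("semantic_cnn_nav_inference")
    SemanticCnnNavInference()
    rospy.spin()
```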

## ROS Deployment File Structure

```
ros_deployment_ws/
└── src/
    └── semantic_cnn_nav/
        β”œβ”€β”€ cnn_msgs/
        β”‚   └── msg/
        β”‚       └── CNN_data.msg          # Custom message definition
        └── semantic_cnn/
            β”œβ”€β”€ launch/
            β”‚   β”œβ”€β”€ cnn_data_pub.launch
            β”‚   β”œβ”€β”€ semantic_cnn_inference.launch
            β”‚   └── semantic_cnn_nav_gazebo.launch
            └── src/
                β”œβ”€β”€ model/
                β”‚   β”œβ”€β”€ s3_net_model.pth      # SΒ³-Net pretrained weights
                β”‚   └── semantic_cnn_model.pth # SemanticCNN weights
                β”œβ”€β”€ cnn_data_pub.py           # Data preprocessing node
                β”œβ”€β”€ cnn_model.py              # Model definitions
                β”œβ”€β”€ pure_pursuit.py           # Pure pursuit controller
                β”œβ”€β”€ goal_visualize.py         # Goal visualization
                └── semantic_cnn_nav_inference.py  # Main inference node
```

---

## Pre-trained Models

Pre-trained models are included:

| Model | Location | Description |
|-------|----------|-------------|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | S³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |

---

## Citation

```bibtex
@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}

@inproceedings{xie2021towards,
  title={Towards Safe Navigation Through Crowded Dynamic Environments},
  author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  doi={10.1109/IROS51168.2021.9636102}
}
```