---
license: mit
library_name: stable-baselines3
tags:
  - reinforcement-learning
  - robotics
  - autonomous-navigation
  - ros2
  - gazebo
  - sac
  - lidar
  - camera
  - multi-input
pipeline_tag: reinforcement-learning
---

# RC Car Autonomous Navigation β€” SAC (Camera + LiDAR)

A **Soft Actor-Critic (SAC)** agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both **camera images** and **LiDAR sensor data** as observations. The agent learns to reach target positions while avoiding obstacles.

---

## Model Description

This model uses a **MultiInputPolicy** with a hybrid perception backbone:

- **Visual stream** β€” RGB camera frames processed by a CNN (NatureCNN)
- **Sensor stream** β€” LiDAR point cloud + navigation state processed by an MLP

Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning.

| Property | Value |
|---|---|
| Algorithm | Soft Actor-Critic (SAC) |
| Policy | `MultiInputPolicy` |
| Observation | `Dict` β€” image `(64Γ—64Γ—3)` + sensor vector `(184,)` |
| Action Space | `Box([-1, -1], [1, 1])` β€” speed & steering |
| Simulator | Gazebo (Ignition/Harmonic) via ROS 2 |
| Framework | Stable-Baselines3 |
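
Conceptually, the fusion step concatenates the CNN's image features with the sensor features before the shared actor/critic heads. A framework-free numpy sketch of that step (the feature dimensions here are illustrative, not taken from the actual extractor):

```python
import numpy as np

rng = np.random.default_rng(0)
cnn_features = rng.standard_normal(256).astype(np.float32)  # illustrative CNN feature size
mlp_features = rng.standard_normal(184).astype(np.float32)  # sensor vector features

# Fused feature vector fed to the pi/qf heads
fused = np.concatenate([cnn_features, mlp_features])
```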

---

## Environments

Two training environments are available:

### `RcCarTargetEnv`
The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles.

### `RcCarComplexEnv`
Same goal-reaching task but with **6 randomly placed box obstacles** that are reshuffled every episode, requiring active collision avoidance.
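
Per-episode reshuffling can be approximated by rejection-sampling six positions with a minimum separation. This is a hypothetical helper, not the actual environment code; the arena size and spacing are illustrative:

```python
import random

def sample_obstacles(n=6, arena=5.0, min_sep=1.0, seed=None):
    """Sample n obstacle (x, y) positions, each at least min_sep apart."""
    rng = random.Random(seed)
    positions = []
    while len(positions) < n:
        x, y = rng.uniform(-arena, arena), rng.uniform(-arena, arena)
        # Keep the candidate only if it is far enough from all accepted positions
        if all((x - px) ** 2 + (y - py) ** 2 >= min_sep ** 2
               for px, py in positions):
            positions.append((x, y))
    return positions
```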

---

## Observation Space

```python
from gymnasium import spaces
import numpy as np

spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
    "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32)
})
```

The `sensor` vector contains:
- **[0:180]** β€” Normalised LiDAR ranges (180 beams, max range 10 m)
- **[180]** β€” Normalised linear speed
- **[181]** β€” Normalised steering angle
- **[182]** β€” Normalised distance to target (clipped at 10 m)
- **[183]** β€” Normalised relative angle to target
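
A minimal sketch of how such a vector could be assembled. The constants and the exact normalisation of steering and relative angle are assumptions for illustration, not taken from the environment code:

```python
import numpy as np

# Assumed constants matching the card's description
N_BEAMS = 180
MAX_RANGE = 10.0        # m, LiDAR max range
MAX_SPEED = 1.0         # m/s
MAX_STEER = 0.6         # rad
MAX_TARGET_DIST = 10.0  # m, distance clip

def build_sensor_vector(ranges, speed, steer, dist, angle):
    """Assemble the 184-dim normalised sensor vector in [0, 1]."""
    lidar = np.clip(np.asarray(ranges, dtype=np.float32), 0.0, MAX_RANGE) / MAX_RANGE
    extras = np.array([
        speed / MAX_SPEED,
        (steer / MAX_STEER + 1.0) / 2.0,   # map [-0.6, 0.6] rad -> [0, 1]
        min(dist, MAX_TARGET_DIST) / MAX_TARGET_DIST,
        (angle / np.pi + 1.0) / 2.0,       # map [-pi, pi] -> [0, 1]
    ], dtype=np.float32)
    return np.concatenate([lidar, extras])
```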

---

## Action Space

```python
import numpy as np
from gymnasium import spaces

spaces.Box(low=np.array([-1.0, -1.0]), high=np.array([1.0, 1.0]), dtype=np.float32)
```

| Index | Meaning | Scale |
|---|---|---|
| `action[0]` | Linear speed | Γ— 1.0 m/s |
| `action[1]` | Steering angle | × 0.6 rad |

Steering is smoothed with a low-pass filter: `steer = 0.6 Γ— prev + 0.4 Γ— target`.
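
The smoothing step is a one-line exponential filter; repeated application converges geometrically to the commanded value:

```python
def smooth_steer(prev, target, alpha=0.6):
    """Low-pass filter from the card: keep 60% of the previous command."""
    return alpha * prev + (1.0 - alpha) * target

# Starting from zero, repeated commands converge toward the target angle
s = 0.0
for _ in range(20):
    s = smooth_steer(s, 0.6)
```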

---

## Reward Function

### `RcCarTargetEnv`
| Event | Reward |
|---|---|
| Progress toward target | `Ξ”distance Γ— 40.0` |
| Reached target (< 0.6 m) | `+100.0` |
| Collision (LiDAR < 0.22 m) | `βˆ’50.0` |
| Per-step penalty | `βˆ’0.05` |

### `RcCarComplexEnv`
| Event | Reward |
|---|---|
| Progress toward target | `Ξ”distance Γ— 40.0` |
| Forward speed bonus (on progress) | `+speed Γ— 0.5` |
| Proximity warning (LiDAR < 0.5 m) | `βˆ’0.5` |
| Collision | `βˆ’50.0` |
| Reached target | `+100.0` |
| Per-step penalty | `βˆ’0.1` |
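
The table above can be turned into a step-reward sketch. The thresholds follow the card; treating target reach and collision as terminal, and their precedence order, are assumptions:

```python
def complex_reward(prev_dist, dist, min_lidar, speed,
                   target_radius=0.6, collision_thresh=0.22):
    """Shaped reward for RcCarComplexEnv; returns (reward, terminated)."""
    if min_lidar < collision_thresh:
        return -50.0, True             # collision ends the episode
    if dist < target_radius:
        return 100.0, True             # reached the target
    r = (prev_dist - dist) * 40.0      # progress shaping
    if prev_dist - dist > 0:
        r += speed * 0.5               # forward-speed bonus on progress
    if min_lidar < 0.5:
        r -= 0.5                       # proximity warning
    r -= 0.1                           # per-step time penalty
    return r, False
```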

---

## Training Setup

```python
from stable_baselines3 import SAC

model = SAC(
    "MultiInputPolicy",
    env,
    learning_rate=3e-4,
    buffer_size=50000,
    policy_kwargs=dict(
        net_arch=dict(pi=[256, 256], qf=[256, 256])
    ),
    device="auto"
)
```

- **Action repeat:** 4 steps per agent decision
- **Frame stacking:** configurable via Hydra config (`n_stack`)
- **Vectorised env:** `DummyVecEnv` + `VecFrameStack` (channels_order=`"last"`)
- **Experiment tracking:** Weights & Biases (W&B) with SB3 callback
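
Action repeat can be implemented as a thin wrapper that replays each policy decision for four simulator steps, accumulating reward. This is a framework-free sketch of the idea, not the actual environment code:

```python
class ActionRepeat:
    """Repeat each agent action for n env steps, summing the rewards."""
    def __init__(self, env, n=4):
        self.env, self.n = env, n

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total = 0.0
        for _ in range(self.n):
            obs, r, terminated, truncated, info = self.env.step(action)
            total += r
            if terminated or truncated:
                break  # stop early if the episode ends mid-repeat
        return obs, total, terminated, truncated, info
```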

---

## Hardware & Software Requirements

| Component | Requirement |
|---|---|
| ROS 2 | Humble or newer |
| Gazebo | Ignition Fortress / Harmonic |
| Python | 3.10+ |
| PyTorch | 2.0+ |
| stable-baselines3 | β‰₯ 2.0 |
| gymnasium | β‰₯ 0.29 |
| opencv-python | any recent |
| cv_bridge | ROS 2 package |

---

## How to Use

### 1. Install dependencies
```bash
pip install stable-baselines3 wandb hydra-core gymnasium opencv-python
```

### 2. Launch the simulator
```bash
ros2 launch my_bot_pkg sim.launch.py
```

### 3. Run training
```bash
python train.py experiment.mode=target experiment.total_timesteps=500000
```

### 4. Load and run inference
```python
from stable_baselines3 import SAC
from rc_car_envs_camera import RcCarTargetEnv

env = RcCarTargetEnv()
model = SAC.load("sac_target_camera_final", env=env)

obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```

---

## Project Structure

```
β”œβ”€β”€ rc_car_envs_camera.py   # Gym environments (Base, Target, Complex)
β”œβ”€β”€ train.py                # Hydra-based training entry point
β”œβ”€β”€ configs/
β”‚   └── config.yaml         # Hydra config (mode, timesteps, wandb, etc.)
└── models/                 # Saved checkpoints (W&B)
```

---

## Limitations & Known Issues

- Training requires a live ROS 2 + Gazebo session; no offline/headless mode currently.
- `DummyVecEnv` runs a single environment β€” parallelisation would require `SubprocVecEnv` with careful ROS node naming.
- Camera latency under heavy load may cause the `scan_received` / `cam_received` wait loop to time out, potentially delivering stale observations.
- The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries.

---

## Citation

If you use this environment or training code in your research, please cite:

```bibtex
@misc{rccar_sac_nav,
  title  = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)},
  year   = {2025},
  url    = {https://huggingface.co/Hajorda/SAC_Complex_Camera}
}
```

---

## License

MIT License