| | --- |
| | license: mit |
| | library_name: stable-baselines3 |
| | tags: |
| | - reinforcement-learning |
| | - robotics |
| | - autonomous-navigation |
| | - ros2 |
| | - gazebo |
| | - sac |
| | - lidar |
| | - camera |
| | - multi-input |
| | pipeline_tag: reinforcement-learning |
| | --- |
| | |
| | # RC Car Autonomous Navigation β SAC (Camera + LiDAR) |
| |
|
| | A **Soft Actor-Critic (SAC)** agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both **camera images** and **LiDAR sensor data** as observations. The agent learns to reach target positions while avoiding obstacles. |
| |
|
| | --- |
| |
|
| | ## Model Description |
| |
|
| | This model uses a **MultiInputPolicy** with a hybrid perception backbone: |
| |
|
| | - **Visual stream** β RGB camera frames processed by a CNN (NatureCNN) |
| | - **Sensor stream** β LiDAR point cloud + navigation state processed by an MLP |
| |
|
| | Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning. |
| |
|
| | | Property | Value | |
| | |---|---| |
| | | Algorithm | Soft Actor-Critic (SAC) | |
| | | Policy | `MultiInputPolicy` | |
| | | Observation | `Dict` β image `(64Γ64Γ3)` + sensor vector `(184,)` | |
| | | Action Space | `Box([-1, -1], [1, 1])` β speed & steering | |
| | | Simulator | Gazebo (Ignition/Harmonic) via ROS 2 | |
| | | Framework | Stable-Baselines3 | |
| |
|
| | --- |
| |
|
| | ## Environments |
| |
|
| | Two training environments are available: |
| |
|
| | ### `RcCarTargetEnv` |
| | The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles. |
| |
|
| | ### `RcCarComplexEnv` |
| | Same goal-reaching task but with **6 randomly placed box obstacles** that are reshuffled every episode, requiring active collision avoidance. |
| |
|
| | --- |
| |
|
| | ## Observation Space |
| |
|
| | ```python |
| | spaces.Dict({ |
| | "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8), |
| | "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32) |
| | }) |
| | ``` |
| |
|
| | The `sensor` vector contains: |
| | - **[0:180]** β Normalised LiDAR ranges (180 beams, max range 10 m) |
| | - **[180]** β Normalised linear speed |
| | - **[181]** β Normalised steering angle |
| | - **[182]** β Normalised distance to target (clipped at 10 m) |
| | - **[183]** β Normalised relative angle to target |
| |
|
| | --- |
| |
|
| | ## Action Space |
| |
|
| | ```python |
| | spaces.Box(low=[-1.0, -1.0], high=[1.0, 1.0], dtype=np.float32) |
| | ``` |
| |
|
| | | Index | Meaning | Scale | |
| | |---|---|---| |
| | | `action[0]` | Linear speed | Γ 1.0 m/s | |
| | | `action[1]` | Steering angle | Γ 0.6 rad/s | |
| |
|
| | Steering is smoothed with a low-pass filter: `steer = 0.6 Γ prev + 0.4 Γ target`. |
| |
|
| | --- |
| |
|
| | ## Reward Function |
| |
|
| | ### `RcCarTargetEnv` |
| | | Event | Reward | |
| | |---|---| |
| | | Progress toward target | `Ξdistance Γ 40.0` | |
| | | Reached target (< 0.6 m) | `+100.0` | |
| | | Collision (LiDAR < 0.22 m) | `β50.0` | |
| | | Per-step penalty | `β0.05` | |
| |
|
| | ### `RcCarComplexEnv` |
| | | Event | Reward | |
| | |---|---| |
| | | Progress toward target | `Ξdistance Γ 40.0` | |
| | | Forward speed bonus (on progress) | `+speed Γ 0.5` | |
| | | Proximity warning (LiDAR < 0.5 m) | `β0.5` | |
| | | Collision | `β50.0` | |
| | | Reached target | `+100.0` | |
| | | Per-step penalty | `β0.1` | |
| |
|
| | --- |
| |
|
| | ## Training Setup |
| |
|
| | ```python |
| | model = SAC( |
| | "MultiInputPolicy", |
| | env, |
| | learning_rate=3e-4, |
| | buffer_size=50000, |
| | policy_kwargs=dict( |
| | net_arch=dict(pi=[256, 256], qf=[256, 256]) |
| | ), |
| | device="auto" |
| | ) |
| | ``` |
| |
|
| | - **Action repeat:** 4 steps per agent decision |
| | - **Frame stacking:** configurable via Hydra config (`n_stack`) |
| | - **Vectorised env:** `DummyVecEnv` + `VecFrameStack` (channels_order=`"last"`) |
| | - **Experiment tracking:** Weights & Biases (W&B) with SB3 callback |
| | |
| | --- |
| | |
| | ## Hardware & Software Requirements |
| | |
| | | Component | Requirement | |
| | |---|---| |
| | | ROS 2 | Humble or newer | |
| | | Gazebo | Ignition Fortress / Harmonic | |
| | | Python | 3.10+ | |
| | | PyTorch | 2.0+ | |
| | | stable-baselines3 | β₯ 2.0 | |
| | | gymnasium | β₯ 0.29 | |
| | | opencv-python | any recent | |
| | | cv_bridge | ROS 2 package | |
| |
|
| | --- |
| |
|
| | ## How to Use |
| |
|
| | ### 1. Install dependencies |
| | ```bash |
| | pip install stable-baselines3 wandb hydra-core gymnasium opencv-python |
| | ``` |
| |
|
| | ### 2. Launch the simulator |
| | ```bash |
| | ros2 launch my_bot_pkg sim.launch.py |
| | ``` |
| |
|
| | ### 3. Run training |
| | ```bash |
| | python train.py experiment.mode=target experiment.total_timesteps=500000 |
| | ``` |
| |
|
| | ### 4. Load and run inference |
| | ```python |
| | from stable_baselines3 import SAC |
| | from rc_car_envs_camera import RcCarTargetEnv |
| | |
| | env = RcCarTargetEnv() |
| | model = SAC.load("sac_target_camera_final", env=env) |
| | |
| | obs, _ = env.reset() |
| | while True: |
| | action, _ = model.predict(obs, deterministic=True) |
| | obs, reward, terminated, truncated, info = env.step(action) |
| | if terminated or truncated: |
| | obs, _ = env.reset() |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Project Structure |
| |
|
| | ``` |
| | βββ rc_car_envs_camera.py # Gym environments (Base, Target, Complex) |
| | βββ train.py # Hydra-based training entry point |
| | βββ configs/ |
| | β βββ config.yaml # Hydra config (mode, timesteps, wandb, etc.) |
| | βββ models/ # Saved checkpoints (W&B) |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Limitations & Known Issues |
| |
|
| | - Training requires a live ROS 2 + Gazebo session; no offline/headless mode currently. |
| | - `DummyVecEnv` runs a single environment β parallelisation would require `SubprocVecEnv` with careful ROS node naming. |
| | - Camera latency under heavy load may cause the `scan_received` / `cam_received` wait loop to time out, potentially delivering stale observations. |
| | - The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries. |
| |
|
| | --- |
| |
|
| | ## Citation |
| |
|
| | If you use this environment or training code in your research, please cite: |
| |
|
| | ```bibtex |
| | @misc{rccar_sac_nav, |
| | title = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)}, |
| | year = {2025}, |
| | url = {https://huggingface.co/Hajorda/SAC_Complex_Camera} |
| | } |
| | ``` |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | MIT License |