+---
+license: mit
+library_name: stable-baselines3
+tags:
+  - reinforcement-learning
+  - robotics
+  - autonomous-navigation
+  - ros2
+  - gazebo
+  - sac
+  - lidar
+  - camera
+  - multi-input
+pipeline_tag: reinforcement-learning
+---
+# RC Car Autonomous Navigation — SAC (Camera + LiDAR)
+A **Soft Actor-Critic (SAC)** agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both **camera images** and **LiDAR sensor data** as observations. The agent learns to reach target positions while avoiding obstacles.
+---
+## Model Description
+This model uses a **MultiInputPolicy** with a hybrid perception backbone:
+- **Visual stream:** RGB camera frames processed by a CNN (`NatureCNN`)
+- **Sensor stream:** normalised LiDAR ranges plus navigation state, processed by an MLP
+Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning.
+| Property | Value |
+|---|---|
+| Algorithm | Soft Actor-Critic (SAC) |
+| Policy | `MultiInputPolicy` |
+| Observation | `Dict` — image `(64×64×3)` + sensor vector `(184,)` |
+| Action Space | `Box([-1, -1], [1, 1])` — speed & steering |
+| Simulator | Gazebo (Ignition Fortress / Harmonic) via ROS 2 |
+| Framework | Stable-Baselines3 |
+---
+## Environments
+Two training environments are available:
+### `RcCarTargetEnv`
+The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles.
+### `RcCarComplexEnv`
+Same goal-reaching task but with **6 randomly placed box obstacles** that are reshuffled every episode, requiring active collision avoidance.
+---
+## Observation Space
+```python
+spaces.Dict({
+    "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
+    "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32)
+})
+```
+The `sensor` vector contains:
+- **[0:180]** — Normalised LiDAR ranges (180 beams, max range 10 m)
+- **[180]** — Normalised linear speed
+- **[181]** — Normalised steering angle
+- **[182]** — Normalised distance to target (clipped at 10 m)
+- **[183]** — Normalised relative angle to target
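The index layout above can be unpacked with a small helper. This is a minimal sketch, not code from `rc_car_envs_camera.py`: `split_sensor` and `lidar_to_metres` are hypothetical names, and converting back to metres assumes the ranges were normalised by dividing by the 10 m maximum range.

```python
SENSOR_SIZE = 184      # 180 LiDAR beams + 4 navigation values
N_BEAMS = 180
MAX_RANGE_M = 10.0     # maximum LiDAR range stated in the layout


def split_sensor(sensor):
    """Split a flat sensor vector into named parts using the documented indices."""
    if len(sensor) != SENSOR_SIZE:
        raise ValueError(f"expected {SENSOR_SIZE} values, got {len(sensor)}")
    return {
        "lidar": sensor[0:N_BEAMS],
        "speed": sensor[180],
        "steering": sensor[181],
        "target_distance": sensor[182],
        "target_angle": sensor[183],
    }


def lidar_to_metres(lidar, max_range=MAX_RANGE_M):
    """Undo the normalisation, ASSUMING it was range / max_range."""
    return [float(r) * max_range for r in lidar]


# Example with a synthetic vector (the values are made up for illustration).
example = [0.5] * N_BEAMS + [0.2, 0.5, 0.3, 0.5]
parts = split_sensor(example)
nearest_m = min(lidar_to_metres(parts["lidar"]))
```

Because the helper only uses `len`, slicing, and indexing, it accepts a plain list or a one-dimensional NumPy array such as a single, unbatched `obs["sensor"]`.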
+---
+## Action Space
+```python
+# Gymnasium infers the shape only from NumPy arrays or scalars, so pass it explicitly.
+spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
+```
+| Index | Meaning | Scale |
+|---|---|---|
+| `action[0]` | Linear speed | × 1.0 m/s |
+| `action[1]` | Steering angle | × 0.6 rad/s |
+Steering is smoothed with a low-pass filter: `steer = 0.6 × prev + 0.4 × target`.
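The scaling and smoothing described above can be written out as follows. This is a sketch under assumptions: the function names are hypothetical, and the order of operations (scale the action first, then filter) is a guess that the source does not confirm.

```python
MAX_SPEED = 1.0      # scale for action[0], from the table above
MAX_STEER = 0.6      # scale for action[1], from the table above
ALPHA = 0.4          # weight of the new target in the low-pass filter


def smooth_steering(prev_steer, target_steer, alpha=ALPHA):
    """First-order low-pass filter: steer = (1 - alpha) * prev + alpha * target."""
    return (1.0 - alpha) * prev_steer + alpha * target_steer


def action_to_command(action, prev_steer):
    """Map a normalised action in [-1, 1]^2 to (speed, smoothed steering)."""
    speed = float(action[0]) * MAX_SPEED
    target_steer = float(action[1]) * MAX_STEER   # ASSUMPTION: scale before filtering
    return speed, smooth_steering(prev_steer, target_steer)


# Holding the maximum steering action: the command approaches 0.6 gradually.
steer = 0.0
history = []
for _ in range(5):
    _, steer = action_to_command([1.0, 1.0], steer)
    history.append(round(steer, 4))
```

With these coefficients the gap between the current command and the target shrinks by a factor of 0.6 on every step, so a step change in the policy output takes several control steps to reach the wheels.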
+---
+## Reward Function
+### `RcCarTargetEnv`
+| Event | Reward |
+|---|---|
+| Progress toward target | `Δdistance × 40.0` |
+| Reached target (< 0.6 m) | `+100.0` |
+| Collision (LiDAR < 0.22 m) | `−50.0` |
+| Per-step penalty | `−0.05` |
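Read as code, the `RcCarTargetEnv` table might look like the sketch below. It is not the environment's actual implementation: the function name is hypothetical, `Δdistance` is assumed to mean previous distance minus current distance, the terms are assumed to add up, both terminal events are assumed to end the episode, and checking the collision before the goal is an arbitrary choice.

```python
PROGRESS_SCALE = 40.0
GOAL_RADIUS_M = 0.6
GOAL_REWARD = 100.0
COLLISION_DIST_M = 0.22
COLLISION_PENALTY = -50.0
STEP_PENALTY = -0.05


def target_env_reward(prev_dist, curr_dist, min_lidar):
    """Return (reward, terminated) for one step, following the table above."""
    # ASSUMPTION: progress is positive when the car moves closer to the target.
    reward = (prev_dist - curr_dist) * PROGRESS_SCALE + STEP_PENALTY
    if min_lidar < COLLISION_DIST_M:      # ASSUMPTION: collision is checked first
        return reward + COLLISION_PENALTY, True
    if curr_dist < GOAL_RADIUS_M:
        return reward + GOAL_REWARD, True
    return reward, False


# One step that closes 0.1 m of distance with no obstacle nearby.
step_reward, terminated = target_env_reward(prev_dist=2.0, curr_dist=1.9, min_lidar=1.5)
```

Under these assumptions, moving 0.1 m closer in open space yields 0.1 × 40 − 0.05 = 3.95, so the progress term dominates the per-step penalty whenever the car advances by more than about 1.25 mm per step.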
+### `RcCarComplexEnv`
+| Event | Reward |
+|---|---|
+| Progress toward target | `Δdistance × 40.0` |
+| Forward speed bonus (on progress) | `+speed × 0.5` |
+| Proximity warning (LiDAR < 0.5 m) | `−0.5` |
+| Collision | `−50.0` |
+| Reached target | `+100.0` |
+| Per-step penalty | `−0.1` |
+---
+## Training Setup
+```python
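+from stable_baselines3 import SAC
+# `env` is the vectorised, frame-stacked environment described below.
+model = SAC(
+    "MultiInputPolicy",      # handles the Dict observation (image + sensor)
+    env,
+    learning_rate=3e-4,
+    buffer_size=50000,       # replay buffer capacity, in transitions
+    policy_kwargs=dict(
+        # two 256-unit hidden layers for the actor (pi) and the critics (qf)
+        net_arch=dict(pi=[256, 256], qf=[256, 256])
+    ),
+    device="auto"            # use a GPU when available, otherwise the CPU
+)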
+```
+- **Action repeat:** 4 steps per agent decision
+- **Frame stacking:** configurable via Hydra config (`n_stack`)
+- **Vectorised env:** `DummyVecEnv` + `VecFrameStack` (channels_order=`"last"`)
+- **Experiment tracking:** Weights & Biases (W&B) with SB3 callback
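Action repeat can be pictured as a thin wrapper that applies each agent action for several simulator steps. The sketch below is illustrative only: the source does not say whether the repeat lives in a wrapper or inside the environment, the class names are hypothetical, and summing the intermediate rewards is an assumption.

```python
class ActionRepeat:
    """Apply each action `repeat` times to an env with a Gymnasium-style step()."""

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward          # ASSUMPTION: rewards are summed
            if terminated or truncated:     # stop early when the episode ends
                break
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Tiny stand-in env (not the RC car): terminates after `limit` inner steps."""

    def __init__(self, limit=6):
        self.limit = limit
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.limit, False, {}


env = ActionRepeat(CountingEnv(limit=6), repeat=4)
env.reset()
first = env.step(0)    # runs 4 inner steps
second = env.step(0)   # runs only 2 inner steps before termination
```

Breaking out of the loop on `terminated` or `truncated` matters here: without it the wrapper would keep stepping an episode that has already ended.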
+---
+## Hardware & Software Requirements
+| Component | Requirement |
+|---|---|
+| ROS 2 | Humble or newer |
+| Gazebo | Ignition Fortress / Harmonic |
+| Python | 3.10+ |
+| PyTorch | 2.0+ |
+| stable-baselines3 | ≥ 2.0 |
+| gymnasium | ≥ 0.29 |
+| opencv-python | any recent |
+| cv_bridge | ROS 2 package |
+---
+## How to Use
+### 1. Install dependencies
+```bash
+pip install stable-baselines3 wandb hydra-core gymnasium opencv-python
+```
+### 2. Launch the simulator
+```bash
+ros2 launch my_bot_pkg sim.launch.py
+```
+### 3. Run training
+```bash
+python train.py experiment.mode=target experiment.total_timesteps=500000
+```
+### 4. Load and run inference
+```python
+from stable_baselines3 import SAC
+from rc_car_envs_camera import RcCarTargetEnv
+# Requires the simulator from step 2 to be running.
+env = RcCarTargetEnv()
+model = SAC.load("sac_target_camera_final", env=env)
+obs, _ = env.reset()
+while True:  # runs until interrupted (Ctrl+C)
+    # deterministic=True uses the policy mean instead of sampling an action
+    action, _ = model.predict(obs, deterministic=True)
+    obs, reward, terminated, truncated, info = env.step(action)
+    if terminated or truncated:
+        obs, _ = env.reset()
+```
+---
+## Project Structure
+```
+├── rc_car_envs_camera.py   # Gym environments (Base, Target, Complex)
+├── train.py                # Hydra-based training entry point
+├── configs/
+│   └── config.yaml         # Hydra config (mode, timesteps, wandb, etc.)
+└── models/                 # Saved checkpoints (W&B)
+```
+---
+## Limitations & Known Issues
+- Training requires a live ROS 2 + Gazebo session; there is currently no offline or headless mode.
+- Training uses a single environment wrapped in `DummyVecEnv`; parallelisation would require `SubprocVecEnv` with careful ROS node naming.
+- Camera latency under heavy load may cause the `scan_received` / `cam_received` wait loop to time out, potentially delivering stale observations.
+- The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries.
+---
+## Citation
+If you use this environment or training code in your research, please cite:
+```bibtex
+@misc{rccar_sac_nav,
+  title  = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)},
+  year   = {2025},
+  url    = {https://huggingface.co/Hajorda/SAC_Complex_Camera}
+}
+```
+---
+## License
+MIT License