	---
	license: mit
	library_name: stable-baselines3
	tags:
	- reinforcement-learning
	- robotics
	- autonomous-navigation
	- ros2
	- gazebo
	- sac
	- lidar
	- camera
	- multi-input
	pipeline_tag: reinforcement-learning
	---

	# RC Car Autonomous Navigation — SAC (Camera + LiDAR)

	A Soft Actor-Critic (SAC) agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both camera images and LiDAR sensor data as observations. The agent learns to reach target positions while avoiding obstacles.

	---

	## Model Description

	This model uses a MultiInputPolicy with a hybrid perception backbone:

	- Visual stream: RGB camera frames processed by a CNN (NatureCNN)
	- Sensor stream: LiDAR range scan + navigation state processed by an MLP

	Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning.

	| Property | Value |
	|---|---|
	| Algorithm | Soft Actor-Critic (SAC) |
	| Policy | `MultiInputPolicy` |
	| Observation | `Dict`: image `(64×64×3)` + sensor vector `(184,)` |
	| Action Space | `Box([-1, -1], [1, 1])`: speed and steering |
	| Simulator | Gazebo (Ignition/Harmonic) via ROS 2 |
	| Framework | Stable-Baselines3 |

	---

	## Environments

	Two training environments are available:

	### `RcCarTargetEnv`
	The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles.

	### `RcCarComplexEnv`
	The same goal-reaching task, but with six randomly placed box obstacles that are reshuffled every episode, so the agent must actively avoid collisions.

	---

	## Observation Space

	```python
	import numpy as np
	from gymnasium import spaces

	observation_space = spaces.Dict({
	    "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
	    "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32),
	})
	```

	The `sensor` vector contains:
	- [0:180] — Normalised LiDAR ranges (180 beams, max range 10 m)
	- [180] — Normalised linear speed
	- [181] — Normalised steering angle
	- [182] — Normalised distance to target (clipped at 10 m)
	- [183] — Normalised relative angle to target
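The layout above can be unpacked with plain slicing. The following is a minimal sketch rather than code from the repository: the helper name `split_sensor` and the dictionary keys are hypothetical, and only the index ranges come from the list above.

```python
from typing import Dict, List, Sequence, Union

N_BEAMS = 180     # LiDAR beams occupy indices 0..179
SENSOR_DIM = 184  # 180 beams + 4 navigation values


def split_sensor(sensor: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Split a flat 184-element sensor vector into named fields (hypothetical helper)."""
    if len(sensor) != SENSOR_DIM:
        raise ValueError(f"expected {SENSOR_DIM} values, got {len(sensor)}")
    return {
        "lidar": [float(x) for x in sensor[:N_BEAMS]],  # normalised ranges
        "speed": float(sensor[180]),                    # normalised linear speed
        "steering": float(sensor[181]),                 # normalised steering angle
        "target_distance": float(sensor[182]),          # normalised, clipped at 10 m
        "target_angle": float(sensor[183]),             # normalised relative angle
    }
```

Because the helper only relies on `len`, indexing, and slicing, it should accept either a Python list or the NumPy array returned by the environment.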

	---

	## Action Space

	```python
	# Scalar bounds are broadcast to both action dimensions (speed, steering)
	action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
	```

	| Index | Meaning | Scale |
	|---|---|---|
	| `action[0]` | Linear speed | × 1.0 m/s |
	| `action[1]` | Steering angle | × 0.6 rad/s |

	Steering is smoothed with a low-pass filter: `steer = 0.6 × prev + 0.4 × target`.
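As a rough illustration of how a normalised action might become a drive command, the sketch below applies the scales from the table and then the low-pass filter. The function names are hypothetical, and applying the scale before the filter is an assumption; the environment code may order these steps differently.

```python
from typing import Sequence, Tuple

MAX_SPEED = 1.0  # scale for action[0], from the table above
MAX_STEER = 0.6  # scale for action[1], from the table above
SMOOTHING = 0.6  # weight on the previous steering command


def smooth_steering(prev_steer: float, target_steer: float) -> float:
    """Low-pass filter: steer = 0.6 * prev + 0.4 * target."""
    return SMOOTHING * prev_steer + (1.0 - SMOOTHING) * target_steer


def action_to_command(action: Sequence[float], prev_steer: float) -> Tuple[float, float]:
    """Map a normalised action in [-1, 1]^2 to (speed, steering)."""
    speed = float(action[0]) * MAX_SPEED
    # Assumption: the raw steering action is scaled first, then filtered.
    target_steer = float(action[1]) * MAX_STEER
    return speed, smooth_steering(prev_steer, target_steer)
```

With these weights, a steering target that is held constant is reached gradually: starting from zero, the command covers 40% of the target after one filter update and about 92% after five.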

	---

	## Reward Function

	### `RcCarTargetEnv`
	| Event | Reward |
	|---|---|
	| Progress toward target | `Δdistance × 40.0` |
	| Reached target (< 0.6 m) | `+100.0` |
	| Collision (LiDAR < 0.22 m) | `−50.0` |
	| Per-step penalty | `−0.05` |
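The terms in this table could be combined as in the sketch below. It is not the repository's implementation, and it rests on three assumptions: `Δdistance` is the previous distance minus the current distance, terminal rewards are added on top of the shaping terms, and a collision is checked before goal arrival. The function name `target_reward` is hypothetical.

```python
from typing import Tuple

PROGRESS_GAIN = 40.0
GOAL_RADIUS = 0.6         # metres
GOAL_REWARD = 100.0
COLLISION_DIST = 0.22     # metres, minimum LiDAR range
COLLISION_PENALTY = -50.0
STEP_PENALTY = -0.05


def target_reward(prev_dist: float, dist: float, min_lidar: float) -> Tuple[float, bool]:
    """Return (reward, terminated) for one step of the goal-reaching task."""
    # Assumption: moving closer to the target yields a positive progress term.
    reward = (prev_dist - dist) * PROGRESS_GAIN + STEP_PENALTY
    if min_lidar < COLLISION_DIST:
        return reward + COLLISION_PENALTY, True
    if dist < GOAL_RADIUS:
        return reward + GOAL_REWARD, True
    return reward, False
```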

	### `RcCarComplexEnv`
	| Event | Reward |
	|---|---|
	| Progress toward target | `Δdistance × 40.0` |
	| Forward speed bonus (on progress) | `+speed × 0.5` |
	| Proximity warning (LiDAR < 0.5 m) | `−0.5` |
	| Collision | `−50.0` |
	| Reached target | `+100.0` |
	| Per-step penalty | `−0.1` |
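A sketch of how the extra shaping terms might fit together follows. Since this table does not state the collision and goal thresholds, the defaults of 0.22 m and 0.6 m are borrowed from `RcCarTargetEnv` as an assumption. It is also assumed that the speed bonus applies only to positive speed on steps that reduce the distance, and that all terms are summed. The function name `complex_reward` is hypothetical.

```python
from typing import Tuple


def complex_reward(
    prev_dist: float,
    dist: float,
    speed: float,
    min_lidar: float,
    goal_radius: float = 0.6,      # assumed, taken from RcCarTargetEnv
    collision_dist: float = 0.22,  # assumed, taken from RcCarTargetEnv
) -> Tuple[float, bool]:
    """Return (reward, terminated) for one step with obstacle shaping terms."""
    progress = prev_dist - dist    # positive when the car moved closer
    reward = progress * 40.0 - 0.1  # progress term plus per-step penalty
    if progress > 0.0 and speed > 0.0:
        reward += speed * 0.5      # forward speed bonus, only while progressing
    if min_lidar < 0.5:
        reward -= 0.5              # proximity warning
    if min_lidar < collision_dist:
        return reward - 50.0, True
    if dist < goal_radius:
        return reward + 100.0, True
    return reward, False
```

Tying the speed bonus to progress means the agent gains nothing from driving fast in a direction that does not bring it closer to the target.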

	---

	## Training Setup

	```python
	from stable_baselines3 import SAC

	# `env` is the vectorised environment described below.
	model = SAC(
	    "MultiInputPolicy",
	    env,
	    learning_rate=3e-4,
	    buffer_size=50000,
	    policy_kwargs=dict(
	        net_arch=dict(pi=[256, 256], qf=[256, 256])
	    ),
	    device="auto",
	)
	```

	- Action repeat: 4 steps per agent decision
	- Frame stacking: configurable via Hydra config (`n_stack`)
	- Vectorised env: `DummyVecEnv` + `VecFrameStack` (channels_order=`"last"`)
	- Experiment tracking: Weights & Biases (W&B) with SB3 callback
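Action repeat can be pictured as holding one policy output for several simulator steps. The sketch below is illustrative only: `step_with_repeat` and `CountingEnv` are hypothetical names, and summing the rewards over the repeated steps is an assumption about how the environment aggregates them.

```python
from typing import Any, Dict, Tuple


class CountingEnv:
    """Tiny stand-in environment used only to exercise the helper below."""

    def __init__(self, horizon: int = 6) -> None:
        self.t = 0
        self.horizon = horizon

    def step(self, action: Any) -> Tuple[int, float, bool, bool, Dict]:
        self.t += 1
        truncated = self.t >= self.horizon
        return self.t, 1.0, False, truncated, {}


def step_with_repeat(env: Any, action: Any, repeat: int = 4):
    """Apply one agent action for up to `repeat` steps, summing the reward."""
    total_reward = 0.0
    for _ in range(repeat):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break  # stop early so the episode boundary is not stepped past
    return obs, total_reward, terminated, truncated, info
```

Breaking out of the loop on `terminated` or `truncated` keeps the helper from stepping an environment that has already finished its episode.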

	---

	## Hardware & Software Requirements

	| Component | Requirement |
	|---|---|
	| ROS 2 | Humble or newer |
	| Gazebo | Ignition Fortress / Harmonic |
	| Python | 3.10+ |
	| PyTorch | 2.0+ |
	| stable-baselines3 | ≥ 2.0 |
	| gymnasium | ≥ 0.29 |
	| opencv-python | any recent |
	| cv_bridge | ROS 2 package |

	---

	## How to Use

	### 1. Install dependencies
	```bash
	pip install stable-baselines3 wandb hydra-core gymnasium opencv-python
	```

	### 2. Launch the simulator
	```bash
	ros2 launch my_bot_pkg sim.launch.py
	```

	### 3. Run training
	```bash
	python train.py experiment.mode=target experiment.total_timesteps=500000
	```

	### 4. Load and run inference
	```python
	from stable_baselines3 import SAC
	from rc_car_envs_camera import RcCarTargetEnv

	env = RcCarTargetEnv()
	model = SAC.load("sac_target_camera_final", env=env)

	obs, _ = env.reset()
	while True:  # runs until interrupted (Ctrl+C)
	    # Deterministic actions: use the policy mean instead of sampling
	    action, _ = model.predict(obs, deterministic=True)
	    obs, reward, terminated, truncated, info = env.step(action)
	    if terminated or truncated:
	        obs, _ = env.reset()
	```

	---

	## Project Structure

	```
	├── rc_car_envs_camera.py   # Gym environments (Base, Target, Complex)
	├── train.py                # Hydra-based training entry point
	├── configs/
	│   └── config.yaml         # Hydra config (mode, timesteps, wandb, etc.)
	└── models/                 # Saved checkpoints (W&B)
	```

	---

	## Limitations & Known Issues

	- Training requires a live ROS 2 + Gazebo session; there is currently no offline or headless mode.
	- `DummyVecEnv` runs a single environment — parallelisation would require `SubprocVecEnv` with careful ROS node naming.
	- Camera latency under heavy load may cause the `scan_received` / `cam_received` wait loop to time out, potentially delivering stale observations.
	- The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries.
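One way to make the stale-observation case visible is for the wait loop to report whether it timed out. The sketch below is a hypothetical pattern, not the repository's code: `both_received` stands in for a check of the `scan_received` and `cam_received` flags, and `spin_once` stands in for whatever processes pending callbacks (for example, a wrapper around `rclpy.spin_once`).

```python
import time
from typing import Callable


def wait_for_sensors(
    both_received: Callable[[], bool],
    spin_once: Callable[[], None],
    timeout_s: float = 1.0,
) -> bool:
    """Spin until both sensor flags are set; return False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while not both_received():
        if time.monotonic() >= deadline:
            return False  # caller should treat the cached observation as stale
        spin_once()
    return True
```

A caller could log the `False` case or add a flag to `info`, which would make it possible to measure how often stale observations occur under load.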

	---

	## Citation

	If you use this environment or training code in your research, please cite:

	```bibtex
	@misc{rccar_sac_nav,
	title = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)},
	year = {2025},
	url = {https://huggingface.co/Hajorda/SAC_Complex_Camera}
	}
	```

	---

	## License

	MIT License