	---
	language: en
	tags:
	- reinforcement-learning
	- navigation
	- habitat-sim
	- ppo
	- indoor-navigation
	- wenavigatecontroller
	license: mit
	---

	# WeNavigate PPO Low-Level Controller

	Low-level PPO controller for the [WeNavigate](https://github.com/vshwanilgv/ppo-controller)
	Vision-Language Navigation system. It is trained to execute navigation commands issued by a
	vision-language model (VLM): forward / turn-left / turn-right / stop. Training runs inside
	the Habitat-Sim simulator (Facebook AI Research) with HM3D scenes.

	## Architecture

	- Policy: Actor-Critic CNN + MLP (~144K parameters)
	- Encoder: 3-layer stride-2 CNN (64×64 depth → 128-dim embedding)
	- Observation: depth image (64×64) + VLM command one-hot (4-dim) + proprioception (3-dim)
	- Action space: Discrete 5 (forward / left / right / stop / no-op)
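
The bullet points above can be sketched as a small PyTorch module. This is an illustrative reconstruction, not the repository's `PPOPolicy`: the class name `ActorCriticSketch`, the channel widths (16/32/32), kernel sizes, hidden size (64), and activations are all assumptions, so its parameter count will not match the ~144K quoted above. Only the input shapes, the 128-dim embedding, and the 5-way action head come from this card.

```python
import torch
import torch.nn as nn


class ActorCriticSketch(nn.Module):
    """Hypothetical actor-critic: depth CNN encoder + MLP heads."""

    def __init__(self, n_actions=5, command_dim=4, prop_dim=3, embed_dim=128):
        super().__init__()
        # Three stride-2 convolutions halve the resolution each time.
        # Channel widths and kernel sizes are assumptions.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, embed_dim),
            nn.ReLU(),
        )
        # The depth embedding is concatenated with command and proprioception.
        fused_dim = embed_dim + command_dim + prop_dim
        self.actor = nn.Sequential(
            nn.Linear(fused_dim, 64), nn.Tanh(), nn.Linear(64, n_actions)
        )
        self.critic = nn.Sequential(
            nn.Linear(fused_dim, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, depth, command, prop):
        # depth: (B, 64, 64); command: (B, 4); prop: (B, 3)
        z = self.encoder(depth.unsqueeze(1))  # add the channel dimension
        h = torch.cat([z, command, prop], dim=-1)
        logits = self.actor(h)                # (B, n_actions)
        value = self.critic(h).squeeze(-1)    # (B,)
        return logits, value
```

Sampling an action from `logits` would typically go through `torch.distributions.Categorical(logits=logits)`.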

	## Training

	- Algorithm: PPO with GAE-λ
	- Steps trained: 1,998,848
	- Final intent-following rate: 98.3%
	- Reward: R_INTENT=+2.5 (follow VLM), R_INTENT_MISS=-0.5 (diverge), R_COLLISION=-10
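
As a rough illustration of how the three reward constants above might combine per step, here is a minimal sketch. The function name `step_reward`, the assumption that command and action indices line up (so "followed" means `action == command`), and the choice to add the collision penalty on top of the intent term are all hypothetical; the actual reward code may include other terms or combine them differently.

```python
R_INTENT = 2.5        # from this card: agent follows the VLM command
R_INTENT_MISS = -0.5  # from this card: agent diverges from the command
R_COLLISION = -10.0   # from this card: collision penalty


def step_reward(action: int, command: int, collided: bool) -> float:
    """Per-step reward under the assumptions stated in the lead-in."""
    # Assumption: action index i corresponds to command index i.
    reward = R_INTENT if action == command else R_INTENT_MISS
    # Assumption: the collision penalty is additive.
    if collided:
        reward += R_COLLISION
    return reward
```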

	### Hyperparameters

	| Parameter | Value |
	|-----------|-------|
	| n_rollout | 2048 |
	| n_epochs | 4 |
	| batch_size | 256 |
	| lr | 0.0003 |
	| gamma | 0.99 |
	| gae_lambda | 0.95 |
	| clip_eps | 0.2 |
	| entropy_coef | 0.01 |
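
To show where `gamma`, `gae_lambda`, and `clip_eps` from the table enter the computation, here is a dependency-free sketch of the textbook GAE-λ recursion and the PPO clipped surrogate. The function names and argument layout are illustrative assumptions and are not taken from the training code.

```python
import math

GAMMA = 0.99       # gamma from the table
GAE_LAMBDA = 0.95  # gae_lambda from the table
CLIP_EPS = 0.2     # clip_eps from the table


def compute_gae(rewards, values, dones, last_value,
                gamma=GAMMA, lam=GAE_LAMBDA):
    """Return (advantages, returns) for one rollout.

    rewards, values, dones are equal-length sequences; last_value is the
    critic's estimate for the state after the final step (bootstrap).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # cut bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(new_log_prob, old_log_prob, advantage,
                      clip_eps=CLIP_EPS):
    """PPO clipped objective for a single sample (to be maximised)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

If `n_rollout` counts environment steps collected per update (an assumption), then 2048 / 256 gives 8 minibatches per epoch, 4 epochs give 32 gradient steps per update, and the 1,998,848 training steps correspond to 976 updates.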



	## Usage

	```python
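	import torch
	from ppo_policy import PPOPolicy  # assumed to come from the linked ppo-controller repo

	policy = PPOPolicy()
	# Replace XXXXX with the update number of the checkpoint you downloaded.
	ckpt = torch.load("policy_update_XXXXX.pt", map_location="cpu")
	policy.load_state_dict(ckpt["policy_state"])
	policy.eval()

	# obs: dict with keys depth (64,64), command (4,), proprioception (3,)
	# Placeholder inputs with those shapes; substitute real observations.
	depth = torch.zeros(64, 64)
	command = torch.zeros(4)
	command[0] = 1.0  # one-hot entry of the active command (index order not documented here)
	prop = torch.zeros(3)

	# Add a batch dimension and run inference without tracking gradients.
	with torch.no_grad():
	    action, log_prob, entropy, value = policy.get_action_and_value(
	        depth.unsqueeze(0),
	        command.unsqueeze(0),
	        prop.unsqueeze(0),
	    )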
	```
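
The three inputs passed to `get_action_and_value` could be prepared with a small helper like the one below. This is a sketch under assumptions: `make_inputs` and `COMMANDS` are hypothetical names, and the command ordering is a guess based on the order in which the commands are listed in this card, so check it against the training code before relying on it.

```python
import torch

# Assumed ordering; this card lists the commands but not their indices.
COMMANDS = ("forward", "turn-left", "turn-right", "stop")


def make_inputs(depth_image, command_name, proprioception):
    """Convert raw observations into unbatched float tensors.

    depth_image: 64x64 array-like, command_name: one of COMMANDS,
    proprioception: length-3 array-like.
    """
    depth = torch.as_tensor(depth_image, dtype=torch.float32)
    prop = torch.as_tensor(proprioception, dtype=torch.float32)
    if tuple(depth.shape) != (64, 64):
        raise ValueError(f"expected depth of shape (64, 64), got {tuple(depth.shape)}")
    if tuple(prop.shape) != (3,):
        raise ValueError(f"expected proprioception of shape (3,), got {tuple(prop.shape)}")
    # One-hot encode the command; raises ValueError for unknown names.
    command = torch.zeros(len(COMMANDS))
    command[COMMANDS.index(command_name)] = 1.0
    return depth, command, prop
```

Each returned tensor still needs `.unsqueeze(0)` to add the batch dimension, as in the usage snippet above.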

	## Dataset

	Trained on [wenavigatecontroller-long-episodes](https://huggingface.co/datasets/vshwanilgv/wenavigatecontroller-long-episodes),
	which uses HM3D minival scenes 00800–00809 with 160 training episodes and 160 evaluation episodes.