---
language: en
tags:
- reinforcement-learning
- navigation
- habitat-sim
- ppo
- indoor-navigation
- wenavigatecontroller
license: mit
---

# WeNavigate PPO Low-Level Controller
Low-level PPO controller for the [WeNavigate](https://github.com/vshwanilgv/ppo-controller)
Vision-Language Navigation system. It is trained to execute VLM navigation commands
(forward / turn-left / turn-right / stop) inside Facebook Habitat-sim using HM3D scenes.
## Architecture

- **Policy**: Actor-Critic CNN + MLP (~144K parameters)
- **Encoder**: 3-layer stride-2 CNN (64×64 depth → 128-dim embedding)
- **Observation**: depth image (64×64) + VLM command one-hot (4-dim) + proprioception (3-dim)
- **Action space**: Discrete 5 (forward / left / right / stop / no-op)
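The shapes listed under Architecture can be checked with a little arithmetic. The sketch below is illustrative only: the kernel size (3) and padding (1) are assumptions, since only the layer count and stride are stated, and the command ordering, the helper names, and the plain concatenation of the three inputs are likewise hypothetical.

```python
# Shape bookkeeping for the observation pipeline (illustrative sketch).
# ASSUMPTIONS: kernel=3 and padding=1 are not stated in this card;
# the COMMANDS ordering and the feature concatenation are hypothetical.

COMMANDS = ["forward", "turn-left", "turn-right", "stop"]


def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of one convolution along a single axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_spatial_sizes(size=64, n_layers=3):
    """Side length of the feature map after each stride-2 layer."""
    sizes = []
    for _ in range(n_layers):
        size = conv_out_size(size)
        sizes.append(size)
    return sizes


def one_hot_command(name):
    """Encode a VLM command as a 4-dim one-hot vector."""
    vec = [0.0] * len(COMMANDS)
    vec[COMMANDS.index(name)] = 1.0
    return vec


EMBED_DIM, COMMAND_DIM, PROPRIO_DIM = 128, 4, 3
# Width of the vector the policy heads would see if the inputs are concatenated.
fused_dim = EMBED_DIM + COMMAND_DIM + PROPRIO_DIM

print(encoder_spatial_sizes())
print(one_hot_command("turn-left"))
print(fused_dim)
```

Under these assumptions the depth map shrinks from 64 to 32, 16, and then 8 pixels per side before being projected to the 128-dim embedding, and the fused feature vector has 135 entries.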
## Training

- **Algorithm**: PPO with GAE-λ
- **Steps trained**: 1,998,848
- **Final intent-following rate**: 98.3%
- **Reward**: `R_INTENT = +2.5` (action follows the VLM command), `R_INTENT_MISS = -0.5` (action diverges from it), `R_COLLISION = -10` (collision)
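To make the reward terms and GAE-λ concrete, here is a minimal sketch of how per-step rewards could be turned into advantages. It is not the training code: the function names are hypothetical, and summing the collision penalty with the intent term (rather than replacing it) is an assumption. The default `gamma` and `lam` values are the ones listed under Hyperparameters.

```python
# Reward shaping and GAE-lambda (illustrative sketch, not the training code).
R_INTENT, R_INTENT_MISS, R_COLLISION = 2.5, -0.5, -10.0


def step_reward(action, command, collided):
    """Intent term plus collision penalty (ASSUMPTION: the two are summed)."""
    reward = R_INTENT if action == command else R_INTENT_MISS
    if collided:
        reward += R_COLLISION
    return reward


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout, computed backwards.

    dones[t] is True when the episode ended after step t, which cuts both
    the value bootstrap and the advantage recursion at that step.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


# Two-step toy episode: follow the command, then diverge from it.
rewards = [step_reward(0, 0, False), step_reward(1, 0, False)]
advantages = gae(rewards, values=[0.0, 0.0], dones=[False, True], last_value=0.0)
print(rewards, advantages)
```

In a PPO update, these advantages would weight the clipped policy-gradient term, and `advantages[t] + values[t]` would serve as the value-function target.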
### Hyperparameters

| Parameter | Value |
|-----------|-------|
| n_rollout | 2048 |
| n_epochs | 4 |
| batch_size | 256 |
| lr | 0.0003 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_eps | 0.2 |
| entropy_coef | 0.01 |
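The hyperparameters also imply an update schedule. As a rough cross-check, the sketch below assumes that `n_rollout` is the number of environment steps collected per policy update and that `batch_size` is the minibatch size; neither reading is confirmed here, and the `PPOConfig` class is a hypothetical container, not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    # Values copied from the hyperparameter table; the class itself is hypothetical.
    n_rollout: int = 2048
    n_epochs: int = 4
    batch_size: int = 256
    lr: float = 3e-4
    gamma: float = 0.99
    gae_lambda: float = 0.95
    clip_eps: float = 0.2
    entropy_coef: float = 0.01

    @property
    def minibatches_per_epoch(self) -> int:
        # ASSUMPTION: each epoch sweeps the whole rollout in fixed-size minibatches.
        return self.n_rollout // self.batch_size

    @property
    def grad_steps_per_update(self) -> int:
        return self.minibatches_per_epoch * self.n_epochs


cfg = PPOConfig()
total_steps = 1_998_848  # "Steps trained" reported in the Training section
n_updates = total_steps // cfg.n_rollout

print(cfg.minibatches_per_epoch, cfg.grad_steps_per_update, n_updates)
```

Under these assumptions the reported step count divides evenly into 976 policy updates, each made of 4 epochs over 8 minibatches (32 gradient steps).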
## Usage

```python
import torch
from ppo_policy import PPOPolicy

policy = PPOPolicy()
ckpt = torch.load("policy_update_XXXXX.pt", map_location="cpu")
policy.load_state_dict(ckpt["policy_state"])
policy.eval()

# obs: dict from the environment with keys
# depth (64, 64), command (4,), proprioception (3,)
depth = torch.as_tensor(obs["depth"], dtype=torch.float32)
command = torch.as_tensor(obs["command"], dtype=torch.float32)
prop = torch.as_tensor(obs["proprioception"], dtype=torch.float32)

# Add a batch dimension and run inference without tracking gradients.
with torch.no_grad():
    action, log_prob, entropy, value = policy.get_action_and_value(
        depth.unsqueeze(0),
        command.unsqueeze(0),
        prop.unsqueeze(0),
    )
```
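The returned `action` is an index into the five-way action space. A small helper for turning it into a readable name could look like the sketch below. The index order simply follows the order in which the actions are listed under Architecture, which is an assumption, and `decode_action` is a hypothetical name.

```python
# ASSUMPTION: indices follow the listing order "forward / left / right / stop / no-op".
ACTION_NAMES = ("forward", "left", "right", "stop", "no-op")


def decode_action(action_index):
    """Map a discrete action index to its name, rejecting out-of-range values."""
    if not 0 <= action_index < len(ACTION_NAMES):
        raise ValueError(
            f"action index {action_index} outside 0..{len(ACTION_NAMES) - 1}"
        )
    return ACTION_NAMES[action_index]


print(decode_action(3))
```

If `action` is a single-element tensor, as the batched call in the usage snippet suggests, it could be passed as `decode_action(int(action.item()))`.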
## Dataset

Trained on [wenavigatecontroller-long-episodes](https://huggingface.co/datasets/vshwanilgv/wenavigatecontroller-long-episodes),
which covers HM3D minival scenes 00800–00809 with 160 training episodes and 160 evaluation episodes.