---
language: en
tags:
- reinforcement-learning
- navigation
- habitat-sim
- ppo
- indoor-navigation
- wenavigatecontroller
license: mit
---

# WeNavigate PPO Low-Level Controller

Low-level PPO controller for the [WeNavigate](https://github.com/vshwanilgv/ppo-controller) Vision-Language Navigation system. It is trained to execute VLM navigation commands (forward / turn-left / turn-right / stop) inside Facebook Habitat-sim on HM3D scenes.

## Architecture

- **Policy**: Actor-Critic CNN + MLP (~144K parameters)
- **Encoder**: 3-layer stride-2 CNN (64×64 depth image → 128-dim embedding)
- **Observation**: depth image (64×64) + VLM command one-hot (4-dim) + proprioception (3-dim)
- **Action space**: 5 discrete actions (forward / left / right / stop / no-op)

## Training

- **Algorithm**: PPO with GAE-λ
- **Steps trained**: 1,998,848
- **Final intent-following rate**: 98.3%
- **Reward**: `R_INTENT` = +2.5 (action follows the VLM command), `R_INTENT_MISS` = -0.5 (action diverges from the command), `R_COLLISION` = -10

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| n_rollout | 2048 |
| n_epochs | 4 |
| batch_size | 256 |
| lr | 0.0003 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_eps | 0.2 |
| entropy_coef | 0.01 |

## Usage

```python
import torch
from ppo_policy import PPOPolicy

policy = PPOPolicy()
ckpt = torch.load("policy_update_XXXXX.pt", map_location="cpu")
policy.load_state_dict(ckpt["policy_state"])
policy.eval()

# obs: dict with keys "depth" (64, 64), "command" (4,), "proprioception" (3,)
depth, command, prop = obs["depth"], obs["command"], obs["proprioception"]

# Add a batch dimension of 1 to each input; no gradients are needed at inference
with torch.no_grad():
    action, log_prob, entropy, value = policy.get_action_and_value(
        depth.unsqueeze(0),
        command.unsqueeze(0),
        prop.unsqueeze(0),
    )
```

## Dataset

Trained on [wenavigatecontroller-long-episodes](https://huggingface.co/datasets/vshwanilgv/wenavigatecontroller-long-episodes): HM3D minival scenes 00800–00809, with 160 training episodes and 160 evaluation episodes.
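The Architecture section specifies the encoder only as a 3-layer stride-2 CNN that maps a 64×64 depth image to a 128-dim embedding. The sketch below works through the spatial sizes. It assumes a kernel size of 3 and padding of 1, neither of which is stated here. It also computes the input width of the actor-critic MLP, assuming the embedding, command one-hot, and proprioception are simply concatenated.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output size of a 2-D convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_spatial_sizes(size: int = 64, n_layers: int = 3) -> list[int]:
    """Spatial size before and after each stride-2 layer (kernel/padding assumed)."""
    sizes = [size]
    for _ in range(n_layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes


# Assumes the MLP sees [embedding | command one-hot | proprioception] concatenated
EMBED_DIM, COMMAND_DIM, PROPRIO_DIM = 128, 4, 3
POLICY_INPUT_DIM = EMBED_DIM + COMMAND_DIM + PROPRIO_DIM

print(encoder_spatial_sizes())  # [64, 32, 16, 8]
print(POLICY_INPUT_DIM)         # 135
```

Under these assumptions, the last convolution produces an 8×8 feature map. With C output channels, that gives C·64 features, which a linear layer would then project to the 128-dim embedding.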
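In the Usage snippet, the `command` input is a 4-dim one-hot over the VLM commands. Below is a minimal helper that builds it from a command string. It assumes the index order forward, turn-left, turn-right, stop. The order actually used in training is not documented here and should be checked against `ppo_policy`.

```python
# Hypothetical index order of the four VLM commands
VLM_COMMANDS = ("forward", "turn-left", "turn-right", "stop")


def command_one_hot(command: str) -> list[float]:
    """Encode a VLM command string as a 4-dim one-hot vector."""
    if command not in VLM_COMMANDS:
        raise ValueError(f"unknown VLM command: {command!r}")
    return [1.0 if c == command else 0.0 for c in VLM_COMMANDS]


print(command_one_hot("turn-left"))  # [0.0, 1.0, 0.0, 0.0]
```

Wrapping the result as `torch.tensor(command_one_hot("forward"))` gives the `(4,)` tensor that the Usage snippet then unsqueezes to a batch of one.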
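The reward terms listed under Training might be combined per step roughly as follows. This is a hypothetical sketch that rests on three assumptions:

- The intent reward or the miss penalty is paid on every step.
- The collision penalty is added on top of it.
- Action indices 0–3 line up with command indices 0–3, and index 4 is the no-op, so a no-op never counts as following the command.

```python
R_INTENT = 2.5        # action follows the VLM command
R_INTENT_MISS = -0.5  # action diverges from the VLM command
R_COLLISION = -10.0   # collision penalty

NO_OP = 4  # hypothetical index of the no-op action


def step_reward(action: int, command: int, collided: bool) -> float:
    """Hypothetical per-step reward: intent term plus optional collision penalty.

    Assumes action indices 0-3 match command indices 0-3
    (forward, left, right, stop); the no-op (index 4) never matches.
    """
    reward = R_INTENT if action == command else R_INTENT_MISS
    if collided:
        reward += R_COLLISION
    return reward
```

With these weights, following the command but colliding still nets -7.5. Under this reading, the collision term dominates the intent term.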
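The hyperparameter table implies an update schedule, provided two assumptions hold: `n_rollout` counts environment steps per PPO update, and each epoch makes one pass over the rollout in minibatches of `batch_size`. The reported steps-trained figure divides evenly into 2048-step rollouts, which is consistent with that reading.

```python
N_ROLLOUT = 2048
N_EPOCHS = 4
BATCH_SIZE = 256
TOTAL_STEPS = 1_998_848

minibatches_per_epoch = N_ROLLOUT // BATCH_SIZE           # 8
grad_steps_per_update = N_EPOCHS * minibatches_per_epoch  # 32
n_updates = TOTAL_STEPS // N_ROLLOUT                      # 976
total_grad_steps = n_updates * grad_steps_per_update      # 31232

# 1,998,848 = 976 * 2048, i.e. no partial rollout at the end of training
assert TOTAL_STEPS % N_ROLLOUT == 0
```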
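PPO with GAE-λ converts per-step rewards and critic values into advantages, using `gamma` and `gae_lambda` from the table. The sketch below is a minimal pure-Python version of the standard backward recursion:

- δ_t = r_t + γ·V(s_{t+1})·(1 − d_t) − V(s_t)
- A_t = δ_t + γλ·(1 − d_t)·A_{t+1}

It illustrates the technique and is not taken from this repository's training code.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    rewards, values, dones: per-step lists of equal length T.
    dones[t] is True if the episode ended after step t.
    last_value: critic estimate V(s_T) for the state after the final step.
    Returns (advantages, returns) with returns = advantages + values.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut off at episode boundaries
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

The returns serve as regression targets for the critic. The advantages weight the policy-gradient term in the clipped objective.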
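The `clip_eps` and `entropy_coef` entries correspond to the standard PPO clipped surrogate objective and its entropy bonus, sketched below in plain Python. The table does not list `vf_coef` (the value-loss weight), so the default of 0.5 used here is an assumption.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss over a minibatch (negated, to be minimized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


def total_loss(policy_loss, value_loss, entropy, vf_coef=0.5, entropy_coef=0.01):
    """Combined PPO loss; vf_coef=0.5 is an assumed value, not from the table."""
    return policy_loss + vf_coef * value_loss - entropy_coef * entropy
```

Clipping removes the incentive to push the probability ratio beyond 1 ± `clip_eps` when the advantage is positive. Negative advantages are still fully penalized, which keeps each update close to the rollout policy.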