WeNavigate PPO Low-Level Controller

Low-level PPO controller for the WeNavigate Vision-Language Navigation system, trained to execute the VLM's discrete navigation commands (forward / turn-left / turn-right / stop) inside Facebook's Habitat-Sim with HM3D scenes.

Architecture

  • Policy: Actor-Critic CNN + MLP (~144K parameters)
  • Encoder: 3-layer stride-2 CNN (64×64 depth → 128-dim embedding)
  • Observation: depth image (64×64) + VLM command one-hot (4-dim) + proprioception (3-dim)
  • Action space: Discrete 5 (forward / left / right / stop / no-op)
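The bullets above can be sketched as a small PyTorch actor-critic module. The layer widths, activation choices, and method signature below are illustrative assumptions, not the shapes of the shipped `PPOPolicy` checkpoint:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PPOPolicySketch(nn.Module):
    """Illustrative actor-critic matching the description above.

    Channel and hidden widths are guesses; the real PPOPolicy may differ.
    """
    def __init__(self, n_actions: int = 5):
        super().__init__()
        # 3-layer stride-2 CNN: 64x64 depth image -> 128-dim embedding
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(16, 16, 3, stride=2, padding=1), nn.ReLU(), # -> 8x8
            nn.Flatten(),
            nn.Linear(16 * 8 * 8, 128), nn.ReLU(),
        )
        # Fuse embedding with command one-hot (4-dim) and proprioception (3-dim)
        self.trunk = nn.Sequential(nn.Linear(128 + 4 + 3, 64), nn.ReLU())
        self.actor = nn.Linear(64, n_actions)  # Discrete(5) logits
        self.critic = nn.Linear(64, 1)         # state-value head

    def get_action_and_value(self, depth, command, prop):
        z = self.encoder(depth.unsqueeze(1))   # (B, 64, 64) -> (B, 128)
        h = self.trunk(torch.cat([z, command, prop], dim=-1))
        dist = Categorical(logits=self.actor(h))
        action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(h).squeeze(-1)
```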

Training

  • Algorithm: PPO with GAE-λ
  • Steps trained: 1,998,848
  • Final intent-following rate: 98.3%
  • Reward: R_INTENT=+2.5 (follow VLM), R_INTENT_MISS=-0.5 (diverge), R_COLLISION=-10
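The reward terms above combine into a simple per-step function. The signature is an assumption for illustration, as is the choice that a collision penalty replaces (rather than adds to) the intent term:

```python
R_INTENT, R_INTENT_MISS, R_COLLISION = 2.5, -0.5, -10.0

def step_reward(action: int, vlm_command: int, collided: bool) -> float:
    """Per-step reward: bonus for following the VLM command,
    small penalty for diverging, large penalty on collision.

    Assumed semantics -- the training code may combine terms differently.
    """
    if collided:
        return R_COLLISION
    return R_INTENT if action == vlm_command else R_INTENT_MISS
```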

Hyperparameters

Parameter      Value
n_rollout      2048
n_epochs       4
batch_size     256
lr             0.0003
gamma          0.99
gae_lambda     0.95
clip_eps       0.2
entropy_coef   0.01
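With gamma=0.99 and gae_lambda=0.95 from the table, the advantage estimator PPO trains against (GAE-λ) can be sketched in plain Python for a single finished rollout:

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards: list of length T
    values:  list of length T+1 (bootstrap value of the final state last)
    Returns a list of T advantage estimates.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted backward sum
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

For example, `compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])` weights each step's TD error by powers of gamma·lam = 0.9405, so later steps contribute progressively less to earlier advantages.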

Usage

import torch
from ppo_policy import PPOPolicy

policy = PPOPolicy()
ckpt   = torch.load("policy_update_XXXXX.pt", map_location="cpu")
policy.load_state_dict(ckpt["policy_state"])
policy.eval()

# obs: dict with keys "depth" (64,64), "command" (4,), "proprioception" (3,)
depth   = torch.as_tensor(obs["depth"], dtype=torch.float32)
command = torch.as_tensor(obs["command"], dtype=torch.float32)
prop    = torch.as_tensor(obs["proprioception"], dtype=torch.float32)

with torch.no_grad():
    action, log_prob, entropy, value = policy.get_action_and_value(
        depth.unsqueeze(0),    # (1, 64, 64)
        command.unsqueeze(0),  # (1, 4)
        prop.unsqueeze(0),     # (1, 3)
    )

Dataset

Trained on wenavigatecontroller-long-episodes — HM3D minival scenes 00800–00809, 160 train episodes, 160 eval episodes.
