Baseline Scope
The public benchmark has two formal comparison groups.
A. Learned World Models
Purpose: compare image-input world-model architectures under the same data, optimizer budget, rollout target, and planning interface.
| Directory | Report Name | Why It Is Included |
|---|---|---|
| `flowmo` | FlowMo | Proposed flow-momentum WM. Separates short object motion state from long ambient drift context. |
| `leworldmodel` | LeWorldModel | Simple JEPA-style latent prediction baseline. Tests whether current-image latent dynamics are enough. |
| `planet` | PlaNet RSSM | Recurrent state-space baseline. Tests whether generic recurrent memory can absorb momentum and drift. |
| `tdmpc2` | TD-MPC2 Dynamics | Compact latent-dynamics baseline. Tests action-conditioned latent rollout with a task-oriented architecture. |
Comparison outputs:
- rollout prediction error
- heading prediction error
- context ablation for FlowMo
- planning metrics when the learned WM is used inside the shared planner
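The first two outputs above are not defined precisely in this section, so the following is only a minimal sketch of one plausible reading. It assumes that rollout prediction error is a per-step mean squared error averaged over the rollout horizon, and that heading prediction error is the mean absolute angular difference in radians after wrapping. The function names `rollout_error`, `heading_error`, and `wrap_angle` are hypothetical and are not taken from the benchmark code.

```python
import math
from typing import Sequence


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def heading_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute wrapped heading difference over a rollout (assumed definition)."""
    if len(pred) != len(true) or not true:
        raise ValueError("pred and true must be non-empty and equally long")
    return sum(abs(wrap_angle(p - t)) for p, t in zip(pred, true)) / len(true)


def rollout_error(
    pred: Sequence[Sequence[float]], true: Sequence[Sequence[float]]
) -> float:
    """Per-step mean squared error, averaged over the horizon (assumed definition)."""
    if len(pred) != len(true) or not true:
        raise ValueError("pred and true must be non-empty and equally long")
    per_step = []
    for p_vec, t_vec in zip(pred, true):
        if len(p_vec) != len(t_vec) or not t_vec:
            raise ValueError("each step must hold equally sized, non-empty vectors")
        # Squared error averaged over the dimensions of the rollout target.
        per_step.append(
            sum((p - t) ** 2 for p, t in zip(p_vec, t_vec)) / len(t_vec)
        )
    return sum(per_step) / len(per_step)
```

Wrapping matters because headings just below `pi` and just above `-pi` point in nearly the same direction; without it, a small angular miss would be scored as an error close to `2 * pi`.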
B. Traditional Non-WM Controllers
Purpose: compare downstream behavior against hand-designed controllers that do not train a neural world model.
| Directory | Report Name | Why It Is Included |
|---|---|---|
| `pid_los_controller` | PID/LOS controller | Simple classical waypoint tracking baseline. |
| `no_flow_los_controller` | No-Flow LOS Controller | Geometric line-of-sight controller that ignores ambient current. |
| `current_estimator_los_controller` | Current-Estimator LOS Controller | Strong classical baseline that estimates current from recent drift. |
| `oracle_flow_los_controller` | Oracle-Flow LOS Controller | Geometric line-of-sight controller with true local flow feed-forward. |
Comparison outputs:
- success rate
- final distance
- trajectory length over successful episodes
- control effort (`sum_t ||a_t||_2^2`) over successful episodes
- time to goal over successful episodes
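As a rough illustration of how the controller outputs above could be aggregated, here is a minimal sketch. The `Episode` record, its field names, and the fixed step duration `dt` are assumptions made for this example and do not come from the benchmark code. Following the list, success rate and final distance are averaged over all episodes, while trajectory length, control effort, and time to goal are averaged over successful episodes only.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Episode:
    """Hypothetical episode record; field names are illustrative only."""
    positions: List[Point]           # visited positions, including the start
    actions: List[Sequence[float]]   # action a_t applied at each step
    goal: Point
    success: bool
    dt: float = 1.0                  # assumed fixed step duration


def final_distance(ep: Episode) -> float:
    """Euclidean distance from the last position to the goal."""
    return math.dist(ep.positions[-1], ep.goal)


def trajectory_length(ep: Episode) -> float:
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(ep.positions, ep.positions[1:]))


def control_effort(ep: Episode) -> float:
    """sum_t ||a_t||_2^2, the squared L2 norm of each action summed over time."""
    return sum(sum(u * u for u in a) for a in ep.actions)


def time_to_goal(ep: Episode) -> float:
    """Elapsed time, assuming one action per step of duration dt."""
    return len(ep.actions) * ep.dt


def _mean(values: Iterable[float]) -> Optional[float]:
    items = list(values)
    return sum(items) / len(items) if items else None


def summarize(episodes: List[Episode]) -> Dict[str, Optional[float]]:
    """Aggregate the five comparison outputs over a batch of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    ok = [ep for ep in episodes if ep.success]
    return {
        "success_rate": len(ok) / len(episodes),
        "final_distance": _mean(final_distance(ep) for ep in episodes),
        # The remaining metrics use successful episodes only; None if there are none.
        "trajectory_length": _mean(trajectory_length(ep) for ep in ok),
        "control_effort": _mean(control_effort(ep) for ep in ok),
        "time_to_goal": _mean(time_to_goal(ep) for ep in ok),
    }
```

Restricting the last three metrics to successful episodes keeps a controller that fails early from looking cheap: a short, low-effort trajectory that never reaches the goal would otherwise pull those averages down.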