| # FlowMo Experiment Protocol |
|
|
| This document is the single paper-facing record of the public FlowMo experiments. Regenerated artifacts should overwrite the existing paths rather than introduce version suffixes. |
|
|
| ## Scope |
|
|
| The project has two formal comparison groups. |
|
|
| ### A. Learned World Models |
|
|
| Purpose: compare image-input world-model architectures under the same simulator data, optimizer budget, rollout target, parameter scale, and planning interface. |
|
|
| | Directory | Report Name | Architecture | Comparison Purpose | |
| |---|---|---|---| |
| | `flowmo` | FlowMo | Shared image encoder; short object-motion state encoder; long strided ambient-drift context encoder; base transition plus zero-context residual | Proposed flow-momentum factorization. Tests whether separating endogenous motion from exogenous drift improves prediction and planning. | |
| | `leworldmodel` | LeWorldModel | JEPA-style image-latent predictor with action-conditioned residual transition | Tests whether simple current-image latent prediction is sufficient. | |
| | `planet` | RSSM | Recurrent state-space latent model with deterministic memory and stochastic latent state | Tests whether generic recurrent memory can absorb momentum and drift without a separate context factor. | |
| | `tdmpc2` | TD-MPC2 Dynamics | Compact action-conditioned latent dynamics with shared image encoder and rollout heads | Tests task-oriented latent dynamics under equal supervision. | |
|
|
| All learned methods receive clean top-down RGB boat images and action history. They do not receive flow labels, flow arrows, velocity vectors, trajectory overlays, or goal markers in the image. |
|
|
| ### B. Traditional Non-WM Controllers |
|
|
| Purpose: compare downstream behavior against non-neural controllers that do not train a world model (WM). |
|
|
| | Directory | Report Name | Input | Comparison Purpose | |
| |---|---|---|---| |
| | `pid_los_controller` | PID/LOS | Clean image pose estimate | Simple hand-designed waypoint tracking baseline. | |
| | `physics_mpc_no_flow` | Physics MPC No-Flow | Clean image pose estimate | Measures the cost of ignoring ambient current. | |
| | `current_estimator_mpc` | Current-Estimator MPC | Clean image pose estimate and recent drift | Strong classical current-compensation baseline. | |
| | `oracle_flow_mpc` | Oracle-Flow MPC | Clean image pose estimate and simulator local flow | Reference bound for control when true local flow is available. | |
|
|
| ## Data |
|
|
| All methods use the same splits: |
|
|
| ```text |
| train: data/paper/train.npz |
| unseen_flow_test: data/paper/test_unseen_flow.npz |
| unseen_boat_dynamics_test: data/paper/test_unseen_boat_params.npz |
| seen_flow_diagnostic: data/paper/diagnostic_seen_flow.npz |
| dataset_card: data/paper/dataset_card.md |
| generation_config: data/paper/generation_config.json |
| ``` |
|
|
| Observation protocol: |
|
|
| ```text |
| image_size: 160 x 160 |
| visual_scale: 2.5 |
| rendering: online clean top-down RGB images |
| forbidden cues: flow arrows, velocity vectors, trajectory overlays, goal markers |
| ``` |
|
|
| Training budget: |
|
|
| ```text |
| train_episodes: 2400 |
| test_episodes: 480 |
| train_windows: 393216 |
| test_windows: 24576 |
| batch_size: 256 |
| steps: 20000 |
| checkpoint_interval: 2000 |
| num_workers: 4 |
| render_mode: device |
| ``` |
|
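The budget above implies a fixed amount of data exposure per method. The following is a small arithmetic sketch of that, assuming every optimizer step consumes exactly one batch of `batch_size` training windows and that a step checkpoint is written at every multiple of `checkpoint_interval`, including the final step. Both assumptions are illustrative and are not confirmed by the protocol.

```python
# Budget values copied from the training budget listed above.
TRAIN_WINDOWS = 393_216
BATCH_SIZE = 256
STEPS = 20_000
CHECKPOINT_INTERVAL = 2_000

# Assumption: one optimizer step consumes exactly one batch of windows.
windows_seen = STEPS * BATCH_SIZE
batches_per_pass = TRAIN_WINDOWS // BATCH_SIZE
passes_over_train = windows_seen / TRAIN_WINDOWS

# Assumption: a checkpoint is written at each multiple of the interval,
# including the final step.
checkpoint_steps = list(range(CHECKPOINT_INTERVAL, STEPS + 1, CHECKPOINT_INTERVAL))

print(f"windows seen:      {windows_seen}")
print(f"batches per pass:  {batches_per_pass}")
print(f"passes over train: {passes_over_train:.2f}")
print(f"step checkpoints:  {len(checkpoint_steps)}")
```

Under these assumptions each method sees 5,120,000 windows, which is roughly 13 passes over the training windows, and produces 10 step checkpoints.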
|
| Precision policy: |
|
|
| ```text |
| training: bf16 model autocast, fp32 losses and metrics |
| prediction_eval: bf16 model autocast, fp32 metrics |
| planning_eval: fp32 |
| ``` |
|
|
| The precision split is intentional: BF16 speeds up image encoding and latent rollout on the RTX 5090 without measurable short-run loss drift, while CEM planning is dominated by small control tensors and did not improve under BF16. |
|
|
| ## Prediction Evaluation |
|
|
| Datasets: |
|
|
| ```text |
| test_unseen_flow |
| test_unseen_boat_params |
| diagnostic_seen_flow |
| ``` |
|
|
| Metrics: |
|
|
| ```text |
| pos@1, pos@5, pos@10, pos@20, pos@40, pos@60 |
| heading@20, heading@60 |
| zero-action drift prediction error |
| no-flow momentum decay prediction error |
| same-action different-flow prediction error |
| ``` |
|
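As a reading aid for the horizon metrics, here is a minimal sketch of how `pos@k` and `heading@k` could be computed. It assumes `pos@k` is the mean Euclidean position error at rollout step `k` and `heading@k` is the mean absolute heading error at step `k` after wrapping the angle difference. The function names and input layout are hypothetical and are not taken from the FlowMo code.

```python
import math


def pos_error_at(pred_xy, true_xy, k):
    """Mean Euclidean position error at rollout step k (1-indexed).

    pred_xy / true_xy: one list of (x, y) tuples per rollout.
    """
    errors = [math.dist(p[k - 1], t[k - 1]) for p, t in zip(pred_xy, true_xy)]
    return sum(errors) / len(errors)


def heading_error_at(pred_heading, true_heading, k):
    """Mean absolute heading error (radians) at step k, wrapped to [0, pi]."""
    errors = []
    for p, t in zip(pred_heading, true_heading):
        diff = p[k - 1] - t[k - 1]
        # atan2(sin, cos) maps any angle difference into (-pi, pi].
        wrapped = math.atan2(math.sin(diff), math.cos(diff))
        errors.append(abs(wrapped))
    return sum(errors) / len(errors)


# Two toy rollouts with two steps each.
pred_xy = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 1.0), (1.0, 1.0)]]
true_xy = [[(0.0, 0.0), (0.0, 0.0)], [(1.0, 1.0), (1.0, 3.0)]]
print(pos_error_at(pred_xy, true_xy, k=2))  # (5.0 + 2.0) / 2
```

The wrapping step matters for headings near the ±π boundary, where a raw subtraction would report an error close to 2π for two nearly identical headings.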
|
| FlowMo-only context diagnostics: |
|
|
| | Diagnostic | Operation | Evidence Sought | |
| |---|---|---| |
| | Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. | |
| | Zero context | Set `c_t=0` | Degraded prediction under flow; little change under no flow. | |
| | Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. | |
| | Same-flow transfer | Use context from another episode with the same hidden flow | Better transfer than wrong-flow context. | |
| | Context norm | Compare the norm of `c_t` with and without flow | The context norm should be larger under flow than under no flow. | |
|
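The first three diagnostics differ only in which context vector is handed to the rollout. The sketch below illustrates that substitution on plain Python lists. The function names and the list-of-vectors layout are hypothetical. Same-flow transfer is omitted because it needs per-episode flow identifiers, whose format the protocol does not specify.

```python
import math
import random


def apply_context_mode(contexts, mode, rng):
    """Return the context vector each episode should use for a diagnostic rollout.

    contexts: list of per-episode context vectors (lists of floats).
    mode: "inferred", "zero", or "shuffled".
    """
    if mode == "inferred":
        return [list(c) for c in contexts]
    if mode == "zero":
        return [[0.0] * len(c) for c in contexts]
    if mode == "shuffled":
        # Rotate by a random non-zero offset so that no episode keeps its own
        # context. Requires at least two episodes.
        n = len(contexts)
        offset = rng.randrange(1, n)
        return [list(contexts[(i + offset) % n]) for i in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def context_norm(c):
    """Euclidean norm of one context vector."""
    return math.sqrt(sum(x * x for x in c))


contexts = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
shuffled = apply_context_mode(contexts, "shuffled", random.Random(0))
print([context_norm(c) for c in contexts])
```

A rotation is used instead of a free permutation so that the shuffled diagnostic never silently reuses an episode's own context.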
|
| FlowMo latent probes: |
|
|
| ```text |
| Train linear probes on frozen z_t, c_t, and [z_t, c_t]. |
| Targets: object momentum (vx, vy, omega), local flow vector, episode drift vector. |
| Purpose: verify which latent carries object-motion information and which latent carries ambient-drift information. |
| ``` |
|
|
| ## Planning Evaluation |
|
|
| Learned WM planners: |
|
|
| ```text |
| flowmo |
| leworldmodel |
| planet |
| tdmpc2 |
| ``` |
|
|
| All learned world models use the same route-aware CEM planner over their latent rollouts. |
|
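For readers unfamiliar with CEM, the following is a generic cross-entropy-method sketch, not the project's planner. The route-aware cost is not described in this document, so a toy 1-D rollout with constant drift (`rollout_cost`, hypothetical) stands in for a learned latent rollout, and all hyperparameters are illustrative.

```python
import random
import statistics


def rollout_cost(actions, x0=0.0, goal=3.0, drift=-0.2):
    """Toy stand-in for a latent rollout: 1-D position pushed by a constant drift."""
    x, cost = x0, 0.0
    for a in actions:
        a = max(-1.0, min(1.0, a))  # clip to the action bounds
        x = x + a + drift
        cost += (x - goal) ** 2
    return cost


def cem_plan(cost_fn, horizon=5, samples=64, elites=8, iters=5, seed=0):
    """Return the mean action sequence after a few CEM refinement rounds."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)
        ]
        # Keep the lowest-cost sequences and refit the Gaussian to them.
        candidates.sort(key=cost_fn)
        top = candidates[:elites]
        mean = [statistics.fmean(seq[t] for seq in top) for t in range(horizon)]
        std = [statistics.pstdev([seq[t] for seq in top]) + 1e-3 for t in range(horizon)]
    return mean


plan = cem_plan(rollout_cost)
print(rollout_cost(plan), rollout_cost([0.0] * 5))
```

In this sketch, swapping `rollout_cost` for a function that rolls a world model forward and scores the predicted states is the only change needed to compare models under the same planner, which is the point of sharing one planning interface.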
|
| Traditional non-WM controllers: |
|
|
| ```text |
| pid_los_controller |
| physics_mpc_no_flow |
| current_estimator_mpc |
| oracle_flow_mpc |
| ``` |
|
|
| Tasks: |
|
|
| ```text |
| reach_uniform |
| counterflow |
| station_keeping |
| passive_to_active |
| waypoint_square |
| waypoint_zigzag |
| ``` |
|
|
| Boats: |
|
|
| ```text |
| twin |
| triangle |
| ``` |
|
|
| Metrics: |
|
|
| ```text |
| success rate |
| final distance |
| trajectory length over successful episodes |
| energy / thrust work over successful episodes |
| time to goal over successful episodes |
| ``` |
|
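The metric list mixes two denominators: some quantities are averaged over all episodes and others only over successful ones. The sketch below makes that explicit. The dictionary keys are hypothetical, and averaging final distance over all episodes is an assumption based on the absence of an "over successful episodes" qualifier.

```python
from statistics import fmean


def summarize(episodes):
    """Aggregate planning metrics from per-episode records.

    episodes: list of dicts with hypothetical keys
    success, final_distance, path_length, energy, time_to_goal.
    """
    successes = [e for e in episodes if e["success"]]

    def mean_over_successes(key):
        # Undefined when nothing succeeded, so report None rather than 0.
        return fmean(e[key] for e in successes) if successes else None

    return {
        "success_rate": len(successes) / len(episodes),
        "final_distance": fmean(e["final_distance"] for e in episodes),
        "trajectory_length": mean_over_successes("path_length"),
        "energy": mean_over_successes("energy"),
        "time_to_goal": mean_over_successes("time_to_goal"),
    }


episodes = [
    {"success": True, "final_distance": 0.1, "path_length": 10.0, "energy": 5.0, "time_to_goal": 20.0},
    {"success": True, "final_distance": 0.3, "path_length": 14.0, "energy": 7.0, "time_to_goal": 30.0},
    {"success": False, "final_distance": 2.0, "path_length": 40.0, "energy": 30.0, "time_to_goal": 100.0},
    {"success": False, "final_distance": 1.6, "path_length": 35.0, "energy": 25.0, "time_to_goal": 100.0},
]
print(summarize(episodes))
```

Restricting length, energy, and time to successful episodes keeps timed-out runs from dominating those averages, but it also means they should be read alongside the success rate.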
|
| ## Required Outputs |
|
|
| Training outputs: |
|
|
| ```text |
| experiments/<method>/checkpoint/paper.pt |
| experiments/<method>/checkpoint/paper_step_*.pt |
| experiments/<method>/result/parameter_count.json |
| experiments/<method>/result/paper_training.json |
| experiments/<method>/result/paper_training_trace.jsonl |
| ``` |
|
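A quick way to confirm that a training run produced everything listed above is to expand the path templates and check for missing files. This sketch assumes `<method>` ranges over the four learned world-model directories, since the traditional controllers do not train a model. The helper name is hypothetical.

```python
from pathlib import Path

# Assumption: only the learned world models produce training outputs.
METHODS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
TRAINING_OUTPUTS = [
    "checkpoint/paper.pt",
    "result/parameter_count.json",
    "result/paper_training.json",
    "result/paper_training_trace.jsonl",
]


def missing_training_outputs(root="experiments"):
    """Return the expected training output paths that do not exist under root."""
    missing = []
    for method in METHODS:
        base = Path(root) / method
        for rel in TRAINING_OUTPUTS:
            if not (base / rel).is_file():
                missing.append(base / rel)
        # At least one intermediate step checkpoint should be present.
        ckpt_dir = base / "checkpoint"
        if not (ckpt_dir.is_dir() and any(ckpt_dir.glob("paper_step_*.pt"))):
            missing.append(ckpt_dir / "paper_step_*.pt")
    return missing


if __name__ == "__main__":
    for path in missing_training_outputs():
        print(f"missing: {path}")
```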
|
| Evaluation outputs: |
|
|
| ```text |
| experiments/reports/paper_prediction_unseen_flow.json |
| experiments/reports/paper_prediction_unseen_boat_params.json |
| experiments/reports/paper_prediction_seen_flow_diagnostic.json |
| experiments/reports/paper_flowmo_latent_probes.json |
| experiments/reports/paper_planning/*.json |
| experiments/reports/paper_planning/gifs/*.gif |
| experiments/reports/paper_report.md |
| ``` |
|
|
| ## Commands |
|
|
| Run the complete paper pipeline: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline |
| ``` |
|
|
| Run stages separately: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline --stages train |
| python -m experiments.run_paper_image_pipeline --stages prediction |
| python -m experiments.run_paper_image_pipeline --stages probe |
| python -m experiments.run_paper_image_pipeline --stages planning |
| python -m experiments.run_paper_image_pipeline --stages report |
| ``` |
|
|
| Run tests: |
|
|
| ```bash |
| python -m pytest -q |
| ``` |
|
|