| # FlowMo Experiment Protocol |
|
|
| This document is the single paper-facing record for the public FlowMo experiments. Regenerated artifacts should overwrite the existing paths rather than introduce version suffixes. |
|
|
| ## Scope |
|
|
| The project has two formal comparison groups. |
|
|
| ### A. Learned World Models |
|
|
| Purpose: compare image-input world-model architectures under the same simulator data, optimizer budget, rollout target, parameter scale, and planning interface. |
|
|
| | Directory | Report Name | Architecture | Comparison Purpose | |
| |---|---|---|---| |
| | `flowmo` | FlowMo | Shared image encoder; short object-motion state encoder; long strided ambient-drift context encoder; base transition plus zero-context residual | Proposed flow-momentum factorization. Tests whether separating endogenous motion from exogenous drift improves prediction and planning. | |
| | `leworldmodel` | LeWorldModel | JEPA-style image-latent predictor with action-conditioned residual transition | Tests whether simple current-image latent prediction is sufficient. | |
| | `planet` | RSSM | Recurrent state-space latent model with deterministic memory and stochastic latent state | Tests whether generic recurrent memory can absorb momentum and drift without a separate context factor. | |
| | `tdmpc2` | TD-MPC2 Dynamics | Compact action-conditioned latent dynamics with shared image encoder and rollout heads | Tests task-oriented latent dynamics under equal supervision. | |
|
|
| All learned methods receive clean top-down RGB boat images and action history. They do not receive flow labels, flow arrows, velocity vectors, trajectory overlays, or goal markers in the image. |
|
|
| ### B. Traditional Non-WM Controllers |
|
|
| Purpose: compare downstream behavior against non-neural controllers that do not train a world model. |
|
|
| | Directory | Report Name | Input | Comparison Purpose | |
| |---|---|---|---| |
| | `pid_los_controller` | PID/LOS | Clean image pose estimate | Simple hand-designed line-of-sight (LOS) waypoint-tracking baseline. | |
| | `no_flow_los_controller` | No-Flow LOS Controller | Clean image pose estimate | Measures the cost of ignoring ambient current. | |
| | `current_estimator_los_controller` | Current-Estimator LOS Controller | Clean image pose estimate and recent drift | Strong classical current-compensation baseline. | |
| | `oracle_flow_los_controller` | Oracle-Flow LOS Controller | Clean image pose estimate and simulator local flow | True-local-flow feed-forward reference for a simple geometric controller, not a full dynamics-MPC upper bound. | |
|
|
| ## Data |
|
|
| All methods use the same splits: |
|
|
| ```text |
| train: data/paper/train.npz |
| test: data/paper/test.npz |
| dataset_card: data/paper/dataset_card.md |
| generation_config: data/paper/generation_config.json |
| ``` |
|
|
| The train split, test split, and final planning evaluation use the same paper |
| flow-family set: `noflow`, `uniform`, `vortex_center`, `double_gyre`, |
| `source_sink`, `source_sink_pair`, `gradient`, `shear`, `turbulent_patch`, and |
| `random_fourier`. |
|
|
| All paper flow fields are static. Localized structures are sampled near common |
| task routes and waypoint corridors so that non-uniform flow is encountered by |
| the boat during both training trajectories and final planning tasks. |
|
|
| Observation protocol: |
|
|
| ```text |
| image_size: 160 x 160 |
| visual_scale: 2.5 |
| rendering: online clean top-down RGB images |
| forbidden cues: flow arrows, velocity vectors, trajectory overlays, goal markers |
| ``` |
|
|
| Training budget: |
|
|
| ```text |
| train_episodes: 2400 |
| test_episodes: 480 |
| train_windows: 393216 |
| test_windows: 24576 |
| batch_size: 256 |
| steps: 20000 |
| checkpoint_interval: 2000 |
| num_workers: 4 |
| render_mode: device |
| training_parallel_jobs: 2 |
| planning_parallel_jobs: 3 |
| ``` |
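
As a quick sanity check, the budget above can be converted into passes over the training windows. This is a rough arithmetic sketch; it assumes each optimizer step consumes exactly one batch of windows and that a checkpoint is written at every multiple of `checkpoint_interval`, neither of which the protocol states explicitly.

```python
# Values copied from the training budget block.
train_windows = 393216
batch_size = 256
steps = 20000
checkpoint_interval = 2000

# Assumption: one optimizer step consumes exactly one batch of windows.
batches_per_epoch = train_windows // batch_size
samples_seen = batch_size * steps
epochs = samples_seen / train_windows

# Assumption: a checkpoint is written at every multiple of the interval.
checkpoint_steps = list(range(checkpoint_interval, steps + 1, checkpoint_interval))

print(f"batches per epoch: {batches_per_epoch}")
print(f"samples seen: {samples_seen}")
print(f"approximate epochs: {epochs:.2f}")
print(f"intermediate checkpoints: {len(checkpoint_steps)}")
```

Under these assumptions each learned model sees the training windows roughly 13 times and leaves 10 `paper_step_*.pt` files.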
|
|
| Precision policy: |
|
|
| ```text |
| training: bf16 model autocast, fp32 losses and metrics |
| prediction_eval: bf16 model autocast, fp32 metrics |
| planning_eval: fp32 |
| ``` |
|
|
| The precision split is intentional: BF16 speeds up image encoding and latent rollout on the RTX 5090 without measurable short-run loss drift, whereas planning with the cross-entropy method (CEM) is dominated by small control tensors and showed no benefit from BF16, so it stays in FP32. |
|
|
| ## Prediction Evaluation |
|
|
| Dataset: |
|
|
| ```text |
| test: data/paper/test.npz |
| ``` |
|
|
| Metrics: |
|
|
| ```text |
| pos@1, pos@5, pos@10, pos@20, pos@40, pos@60 |
| heading@20, heading@60 |
| zero-action drift prediction error |
| no-flow momentum decay prediction error |
| same-action different-flow prediction error |
| ``` |
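
The horizon metrics can be illustrated with a small sketch. The protocol does not define `pos@k` or `heading@k` precisely, so this assumes `pos@k` is the Euclidean distance between predicted and true position at rollout step `k` (1-indexed) and `heading@k` is the absolute heading difference in radians, wrapped to `[0, pi]`. The function names and toy trajectories are hypothetical.

```python
import math


def position_error(pred_xy, true_xy, k):
    """Euclidean position error at rollout step k (1-indexed)."""
    px, py = pred_xy[k - 1]
    tx, ty = true_xy[k - 1]
    return math.hypot(px - tx, py - ty)


def heading_error(pred_heading, true_heading, k):
    """Absolute heading difference at step k, wrapped into [0, pi] radians."""
    diff = pred_heading[k - 1] - true_heading[k - 1]
    # atan2(sin, cos) maps any angle difference into (-pi, pi].
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


# Toy rollout: the prediction drifts sideways by 0.01 units per step.
true_xy = [(0.1 * t, 0.0) for t in range(1, 61)]
pred_xy = [(0.1 * t, 0.01 * t) for t in range(1, 61)]

for k in (1, 5, 10, 20, 40, 60):
    print(f"pos@{k}: {position_error(pred_xy, true_xy, k):.3f}")
```

In a real evaluation these per-window errors would presumably be averaged over all test windows; the wrapping step matters because headings of 3.1 rad and -3.1 rad are only about 0.08 rad apart.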
|
|
| FlowMo-only context diagnostics: |
|
|
| | Diagnostic | Operation | Evidence Sought | |
| |---|---|---| |
| | Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. | |
| | Zero context | Set `c_t=0` | Degraded flow prediction and limited change in no-flow. | |
| | Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. | |
| | Same-flow transfer | Use context from another episode with the same hidden flow | Better transfer than wrong-flow context. | |
| | Context norm | Compare no-flow and flow `‖c_t‖` | Flow context should be larger than no-flow context. | |
|
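
The shuffled-context and same-flow-transfer diagnostics both need a donor episode for each evaluated episode. A minimal sketch of donor selection follows. It assumes a hypothetical per-episode identifier `flow_ids` for the hidden flow field; how the actual pipeline pairs episodes is not specified here.

```python
import random


def pick_context_donors(flow_ids, same_flow, seed=0):
    """Pick, for each episode, a different episode to borrow context from.

    same_flow=True  -> donor shares the hidden flow id (same-flow transfer).
    same_flow=False -> donor has a different flow id (wrong-flow context).
    Entries are None where no valid donor exists.
    """
    rng = random.Random(seed)
    donors = []
    for i, flow in enumerate(flow_ids):
        candidates = [
            j for j, other in enumerate(flow_ids)
            if j != i and (other == flow) == same_flow
        ]
        donors.append(rng.choice(candidates) if candidates else None)
    return donors


flow_ids = ["uniform", "uniform", "shear", "shear", "noflow"]
print(pick_context_donors(flow_ids, same_flow=True))   # [1, 0, 3, 2, None]
print(pick_context_donors(flow_ids, same_flow=False))
```

Episodes without a valid donor (here the single `noflow` episode under same-flow transfer) would have to be skipped for that diagnostic.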
|
| FlowMo latent probes: |
|
|
| ```text |
| Train linear probes on frozen latents z_t, c_t, and [z_t, c_t]. |
| Targets: object momentum (vx, vy, omega), local flow vector, episode drift vector. |
| Purpose: verify which latent carries object-motion information and which latent carries ambient-drift information. |
| ``` |
|
|
| ## Planning Evaluation |
|
|
| Learned WM planners: |
|
|
| ```text |
| flowmo |
| leworldmodel |
| planet |
| tdmpc2 |
| ``` |
|
|
| All learned world models use the same route-aware CEM planner over their latent rollouts. |
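
For readers unfamiliar with CEM, the following is a generic sketch of the sampling loop, not the project's route-aware planner. The hyperparameters, the `toy_cost` function, and `GOAL` are hypothetical stand-ins: in the real pipeline the cost would come from a latent rollout of the world model scored against the route.

```python
import random


def cem_plan(cost_fn, horizon, action_dim, iterations=10, population=128,
             n_elite=16, init_std=1.0, seed=0):
    """Return the mean action sequence after a fixed number of CEM iterations."""
    rng = random.Random(seed)
    size = horizon * action_dim
    mean = [0.0] * size
    std = [init_std] * size
    for _ in range(iterations):
        # Sample flat action sequences from the current diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the lowest-cost samples and refit the Gaussian to them.
        elites = sorted(samples, key=cost_fn)[:n_elite]
        columns = list(zip(*elites))
        mean = [sum(col) / n_elite for col in columns]
        std = [max(1e-3, (sum((x - m) ** 2 for x in col) / n_elite) ** 0.5)
               for col, m in zip(columns, mean)]
    # Reshape the flat mean into one action vector per time step.
    return [mean[t * action_dim:(t + 1) * action_dim] for t in range(horizon)]


GOAL = (1.0, -0.5)  # hypothetical 2-D goal for the toy problem


def toy_cost(flat_actions):
    """Stand-in for a latent rollout: position is the sum of 2-D actions."""
    x = sum(flat_actions[0::2])
    y = sum(flat_actions[1::2])
    effort = sum(a * a for a in flat_actions)
    return (x - GOAL[0]) ** 2 + (y - GOAL[1]) ** 2 + 0.01 * effort


plan = cem_plan(toy_cost, horizon=3, action_dim=2)
final_x = sum(a[0] for a in plan)
final_y = sum(a[1] for a in plan)
print(f"final position: ({final_x:.2f}, {final_y:.2f})")
```

Because the planner only interacts with the model through `cost_fn`, the same loop can be reused across world models, which is what makes a shared planning interface a fair comparison.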
|
|
| Traditional non-WM controllers: |
|
|
| ```text |
| pid_los_controller |
| no_flow_los_controller |
| current_estimator_los_controller |
| oracle_flow_los_controller |
| ``` |
|
|
| Tasks: |
|
|
| ```text |
| reach_target |
| station_keeping |
| waypoint_square |
| waypoint_zigzag |
| ``` |
|
|
| Boats: |
|
|
| ```text |
| twin |
| triangle |
| ``` |
|
|
| Flow families: |
|
|
| ```text |
| noflow |
| uniform |
| vortex_center |
| double_gyre |
| source_sink |
| source_sink_pair |
| gradient |
| shear |
| turbulent_patch |
| random_fourier |
| ``` |
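
The lists above define a factorial evaluation grid. The sketch below enumerates it, assuming every method is evaluated on every task, boat, and flow-family combination; the protocol does not state the number of episodes per cell, so that is left out.

```python
from itertools import product

methods = [
    "flowmo", "leworldmodel", "planet", "tdmpc2",
    "pid_los_controller", "no_flow_los_controller",
    "current_estimator_los_controller", "oracle_flow_los_controller",
]
tasks = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
boats = ["twin", "triangle"]
flow_families = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# Assumption: the grid is fully crossed (no skipped combinations).
conditions = list(product(tasks, boats, flow_families))
runs = list(product(methods, conditions))

print(len(conditions))  # 80 conditions per method
print(len(runs))        # 640 method-condition pairs
```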
|
|
| Metrics: |
|
|
| ```text |
| success rate |
| final distance |
| trajectory length over successful episodes |
| control effort (`sum_t ||a_t||_2^2`) over successful episodes |
| time to goal over successful episodes |
| ``` |
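
The conditioning on successful episodes matters when aggregating. A minimal sketch of the aggregation follows, assuming hypothetical per-episode records with the keys shown; the actual JSON schema of the planning reports is not specified in this document.

```python
from statistics import mean


def control_effort(actions):
    """Sum over time of the squared L2 norm of each action vector."""
    return sum(sum(a_i ** 2 for a_i in a) for a in actions)


def summarize_planning(episodes):
    """Aggregate planning metrics; expects a non-empty list of episode dicts."""
    successes = [e for e in episodes if e["success"]]
    summary = {
        "success_rate": len(successes) / len(episodes),
        # Final distance is averaged over all episodes, including failures.
        "final_distance": mean(e["final_distance"] for e in episodes),
    }
    # The remaining metrics are averaged over successful episodes only.
    for key in ("trajectory_length", "control_effort", "time_to_goal"):
        summary[key] = mean(e[key] for e in successes) if successes else None
    return summary


episodes = [
    {"success": True, "final_distance": 0.1, "trajectory_length": 5.0,
     "control_effort": control_effort([(1.0, 2.0), (0.0, 3.0)]), "time_to_goal": 40},
    {"success": False, "final_distance": 2.0, "trajectory_length": 9.0,
     "control_effort": 30.0, "time_to_goal": 200},
]
print(summarize_planning(episodes))
```

Restricting the last three metrics to successes keeps a method from looking efficient merely because it gives up early, but it also means they should always be read alongside the success rate.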
|
|
| ## Required Outputs |
|
|
| Training outputs: |
|
|
| ```text |
| experiments/<method>/checkpoint/paper.pt |
| experiments/<method>/checkpoint/paper_step_*.pt |
| experiments/<method>/result/parameter_count.json |
| experiments/<method>/result/paper_training.json |
| experiments/<method>/result/paper_training_trace.jsonl |
| ``` |
|
|
| Evaluation outputs: |
|
|
| ```text |
| experiments/reports/paper_prediction.json |
| experiments/reports/paper_flowmo_latent_probes.json |
| experiments/reports/paper_planning/*.json |
| experiments/reports/paper_planning/gifs/*.gif |
| experiments/reports/paper_report.md |
| ``` |
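
A small helper can confirm that a run produced everything listed above. This is a sketch under one assumption: `<method>` in the training paths ranges over the four learned world models only, since the traditional controllers are not trained. The helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Assumption: only the learned world models produce training outputs.
METHODS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]

TRAINING_PATTERNS = [
    "experiments/{method}/checkpoint/paper.pt",
    "experiments/{method}/checkpoint/paper_step_*.pt",
    "experiments/{method}/result/parameter_count.json",
    "experiments/{method}/result/paper_training.json",
    "experiments/{method}/result/paper_training_trace.jsonl",
]

EVALUATION_PATTERNS = [
    "experiments/reports/paper_prediction.json",
    "experiments/reports/paper_flowmo_latent_probes.json",
    "experiments/reports/paper_planning/*.json",
    "experiments/reports/paper_planning/gifs/*.gif",
    "experiments/reports/paper_report.md",
]


def missing_outputs(root="."):
    """Return every expected path pattern that matches no file under root."""
    root = Path(root)
    patterns = [p.format(method=m) for m in METHODS for p in TRAINING_PATTERNS]
    patterns += EVALUATION_PATTERNS
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    for pattern in missing_outputs():
        print(f"missing: {pattern}")
```

Run from the repository root, an empty listing means every required pattern matched at least one file.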
|
|
| ## Commands |
|
|
| Run the complete paper pipeline: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline |
| ``` |
|
|
| Run stages separately: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline --stages train |
| python -m experiments.run_paper_image_pipeline --stages prediction |
| python -m experiments.run_paper_image_pipeline --stages probe |
| python -m experiments.run_paper_image_pipeline --stages planning |
| python -m experiments.run_paper_image_pipeline --stages report |
| ``` |
|
|
| Run tests: |
|
|
| ```bash |
| python -m pytest -q |
| ``` |
|
|