| # FlowMo Paper Experiment Matrix |
|
|
| This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the same public paths. |
|
|
| ## Shared Data And Observation Protocol |
|
|
| All learned world models use the same simulator data and clean-image observation pipeline. |
|
|
| ```text |
| Image input: clean top-down RGB boat image |
| Image size: 160 x 160 |
| Visual scale: 2.5 |
| Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker |
| Train split: data/paper/train.npz |
| Test split: data/paper/test.npz |
| Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier |
| Config: experiments/shared/config/paper_image.json |
| Checkpoint: paper.pt |
| Intermediate checkpoints: paper_step_XXXXXX.pt |
| ``` |
|
|
| All flow fields are static. Localized flow structures are sampled near the route corridors used by the training controllers and final planning tasks. |
|
|
| Formal training budget: |
|
|
| ```text |
| train_episodes: 2400 |
| test_episodes: 480 |
| train_windows: 393216 |
| test_windows: 24576 |
| batch_size: 256 |
| steps: 20000 |
| checkpoint_interval: 2000 |
| num_workers: 4 |
| render_mode: device |
| training_parallel_jobs: 2 |
| planning_parallel_jobs: 3 |
| ``` |
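
The budget above implies a fixed number of passes over the training windows and a fixed set of intermediate checkpoints. The arithmetic can be sketched as follows. Two points are assumptions, not statements from this document: that a checkpoint is written at every multiple of `checkpoint_interval`, and that `XXXXXX` in `paper_step_XXXXXX.pt` is the step number zero-padded to six digits.

```python
# Values copied from the formal training budget.
TRAIN_WINDOWS = 393216
BATCH_SIZE = 256
STEPS = 20000
CHECKPOINT_INTERVAL = 2000

batches_per_epoch = TRAIN_WINDOWS // BATCH_SIZE   # full batches in one pass
windows_seen = STEPS * BATCH_SIZE                 # windows consumed over training
epochs = windows_seen / TRAIN_WINDOWS             # passes over the train windows

# Assumption: one checkpoint per multiple of the interval, and XXXXXX is the
# step zero-padded to six digits.
checkpoint_steps = list(range(CHECKPOINT_INTERVAL, STEPS + 1, CHECKPOINT_INTERVAL))
checkpoint_names = [f"paper_step_{step:06d}.pt" for step in checkpoint_steps]

print(batches_per_epoch, windows_seen, round(epochs, 2), len(checkpoint_names))
```

Under these assumptions, training covers roughly 13 passes over the 393216 training windows and produces ten intermediate checkpoints.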
|
|
| Precision policy: |
|
|
| ```text |
| training: bf16 model autocast, fp32 losses and metrics |
| prediction_eval: bf16 model autocast, fp32 metrics |
| planning_eval: fp32 |
| ``` |
|
|
| ## A. Learned World-Model Comparison |
|
|
| Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum. |
|
|
| | Method | Comparison Role | What It Tests | |
| |---|---|---| |
| | `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. | |
| | `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. | |
| | `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. | |
| | `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. | |
|
|
| Prediction dataset: |
|
|
| ```text |
| test (data/paper/test.npz) |
| ``` |
|
|
| Prediction metrics: |
|
|
| ```text |
| pos@1, pos@5, pos@10, pos@20, pos@40, pos@60 |
| heading@20, heading@60 |
| zero-action drift error |
| no-flow momentum decay error |
| same-action different-flow error |
| ``` |
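
The horizon metrics above can be illustrated with a minimal sketch. It assumes, since this document does not define them, that `pos@k` is the Euclidean position error at rollout step `k` and that `heading@k` is the absolute wrapped heading difference in radians at step `k`. The function names and the `(x, y, heading)` tuple layout are hypothetical.

```python
import math

POS_HORIZONS = (1, 5, 10, 20, 40, 60)
HEADING_HORIZONS = (20, 60)


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def rollout_errors(pred, true):
    """Compute per-horizon errors for one rollout.

    pred, true: sequences of (x, y, heading) tuples; index 0 is rollout step 1.
    Assumed definitions: pos@k is Euclidean distance at step k, heading@k is
    the absolute wrapped angle difference at step k.
    """
    errors = {}
    for k in POS_HORIZONS:
        px, py, _ = pred[k - 1]
        tx, ty, _ = true[k - 1]
        errors[f"pos@{k}"] = math.hypot(px - tx, py - ty)
    for k in HEADING_HORIZONS:
        diff = pred[k - 1][2] - true[k - 1][2]
        errors[f"heading@{k}"] = abs(wrap_angle(diff))
    return errors
```

Wrapping the heading difference keeps a prediction of `2*pi - 0.1` against a true heading of `0` from being scored as a near-full-turn error.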
|
|
| FlowMo context diagnostics: |
|
|
| | Diagnostic | Operation | Evidence Sought | |
| |---|---|---| |
| | Inferred context | Normal rollout with inferred `c_t` | Lowest rollout error under flow among the context variants. | |
| | Zero context | Set `c_t=0` | Degraded prediction under flow; smaller change on no-flow data. | |
| | Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. | |
| | Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. | |
| | No-flow context norm | Measure `||c_t||` on no-flow data | Smaller than flow context norm. | |
| | Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. | |
|
|
| FlowMo latent probes: |
|
|
| | Probe Target | Feature Sets | Purpose | |
| |---|---|---| |
| | Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. | |
| | Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. | |
| | Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. | |
|
|
| ## B. Traditional Non-WM Control Comparison |
|
|
| Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control. |
|
|
| Learned WM planners: |
|
|
| ```text |
| flowmo |
| leworldmodel |
| planet |
| tdmpc2 |
| ``` |
|
|
| Traditional non-WM controllers: |
|
|
| | Method | Comparison Role | What It Tests | |
| |---|---|---| |
| | `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. | |
| | `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. | |
| | `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. | |
| | `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. | |
|
|
| Planning tasks: |
|
|
| ```text |
| reach_target |
| station_keeping |
| waypoint_square |
| waypoint_zigzag |
| ``` |
|
|
| Boats: |
|
|
| ```text |
| twin |
| triangle |
| ``` |
|
|
| Flow families: |
|
|
| ```text |
| noflow |
| uniform |
| vortex_center |
| double_gyre |
| source_sink |
| source_sink_pair |
| gradient |
| shear |
| turbulent_patch |
| random_fourier |
| ``` |
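
The lists above define the axes of the planning evaluation. The sketch below enumerates them as a full cross product; the assumption that every method is run on every task, boat, and flow family is ours and is not stated in this document.

```python
from itertools import product

LEARNED_PLANNERS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOW_FAMILIES = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# Assumption: the evaluation covers the full cross product of all four axes.
grid = list(product(LEARNED_PLANNERS + CONTROLLERS, TASKS, BOATS, FLOW_FAMILIES))
print(len(grid))  # 8 methods * 4 tasks * 2 boats * 10 flow families
```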
|
|
| Planning metrics: |
|
|
| ```text |
| success rate |
| final distance |
| trajectory length over successful episodes |
| control effort (sum_t ||a_t||_2^2) over successful episodes |
| time to goal over successful episodes |
| ``` |
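
The planning metrics can be aggregated as in the sketch below. The episode record layout and function names are hypothetical. Averaging with the arithmetic mean, and averaging final distance over all episodes rather than only successful ones, are assumptions that this document does not state.

```python
from statistics import mean


def control_effort(actions):
    """sum_t ||a_t||_2^2 for a sequence of action vectors."""
    return sum(sum(component ** 2 for component in action) for action in actions)


def summarize(episodes):
    """Aggregate planning metrics over a list of episode records.

    Each record is a dict with keys: success, final_distance,
    trajectory_length, actions, time_to_goal (hypothetical layout).
    """
    successes = [e for e in episodes if e["success"]]

    def mean_over_successes(value):
        # Metrics restricted to successful episodes are NaN when none succeed.
        return mean(value(e) for e in successes) if successes else float("nan")

    return {
        "success_rate": len(successes) / len(episodes),
        "final_distance": mean(e["final_distance"] for e in episodes),
        "trajectory_length": mean_over_successes(lambda e: e["trajectory_length"]),
        "control_effort": mean_over_successes(lambda e: control_effort(e["actions"])),
        "time_to_goal": mean_over_successes(lambda e: e["time_to_goal"]),
    }
```

Restricting trajectory length, control effort, and time to goal to successful episodes keeps failed runs, which may time out or drift away, from dominating those averages.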
|
|
| Formal commands: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline --stages train |
| python -m experiments.run_paper_image_pipeline --stages prediction |
| python -m experiments.run_paper_image_pipeline --stages probe |
| python -m experiments.run_paper_image_pipeline --stages planning |
| python -m experiments.run_paper_image_pipeline --stages report |
| ``` |
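
For unattended runs, the five commands can be driven from a small wrapper. This is a sketch only: it assumes the stages are meant to run in the listed order and that a failing stage should stop the sequence. Only the module name and the `--stages` flag come from the commands above; the helper names are hypothetical.

```python
import subprocess
import sys

STAGES = ["train", "prediction", "probe", "planning", "report"]


def stage_command(stage):
    """Build the command line for one pipeline stage."""
    return [sys.executable, "-m", "experiments.run_paper_image_pipeline", "--stages", stage]


def run_all(dry_run=True):
    """Run every stage in order; with dry_run=True, only print the commands."""
    for stage in STAGES:
        command = stage_command(stage)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(command, check=True)
```

Calling `run_all()` prints the commands without executing them; `run_all(dry_run=False)` executes them from the repository root.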
|
|