# FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names avoid version suffixes; regenerated artifacts replace the same public paths.

## Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```

All flow fields are static. Localized flow structures are sampled near the route corridors used by the training controllers and final planning tasks.

Formal training budget:

```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
```

Precision policy:

```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```

## A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.

| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |

Prediction dataset:

```text
test
```

Prediction metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```

FlowMo context diagnostics:

| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `\|\|c_t\|\|` on no-flow data | Smaller than flow context norm. |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |

FlowMo latent probes:

| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |

## B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control.
Learned WM planners:

```text
flowmo
leworldmodel
planet
tdmpc2
```

Traditional non-WM controllers:

| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |

Planning tasks:

```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```

Flow families:

```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```

Planning metrics:

```text
success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
```

Formal commands:

```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```
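As an illustration of the planning metrics listed above, the following is a minimal sketch of how they might be aggregated from per-episode records. It is not the pipeline's actual implementation. The record layout (`success`, `final_distance`, `positions`, `actions`) and the function names are hypothetical, final distance is assumed to be averaged over all episodes, and time to goal is assumed to be counted in control steps.

```python
import math


def path_length(positions):
    """Sum of Euclidean distances between consecutive (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))


def control_effort(actions):
    """sum_t ||a_t||_2^2 over one episode's action sequence."""
    return sum(sum(a_i * a_i for a_i in a) for a in actions)


def mean_or_nan(values):
    """Arithmetic mean, or NaN when there is nothing to average."""
    values = list(values)
    return sum(values) / len(values) if values else float("nan")


def planning_metrics(episodes):
    """Aggregate planning metrics over a non-empty list of episode records.

    Each record is a dict with hypothetical keys (assumptions, not the
    pipeline's schema): success, final_distance, positions, actions.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    wins = [ep for ep in episodes if ep["success"]]
    return {
        "success_rate": len(wins) / len(episodes),
        # Assumption: final distance is averaged over all episodes.
        "final_distance": mean_or_nan(ep["final_distance"] for ep in episodes),
        # The remaining metrics are restricted to successful episodes.
        "trajectory_length": mean_or_nan(path_length(ep["positions"]) for ep in wins),
        "control_effort": mean_or_nan(control_effort(ep["actions"]) for ep in wins),
        # Assumption: time to goal is the number of control steps taken.
        "time_to_goal": mean_or_nan(len(ep["actions"]) for ep in wins),
    }


if __name__ == "__main__":
    demo = [
        {"success": True, "final_distance": 0.0,
         "positions": [(0, 0), (3, 4), (3, 5)],
         "actions": [(1.0, 0.0), (0.0, 2.0)]},
        {"success": False, "final_distance": 2.0,
         "positions": [(0, 0), (1, 0)],
         "actions": [(1.0, 1.0)]},
    ]
    print(planning_metrics(demo))
```

Restricting trajectory length, control effort, and time to goal to successful episodes keeps failed runs that time out or stall from skewing those averages. When no episode succeeds, the sketch returns NaN for them instead of raising.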