FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the previous ones at the same public paths.

Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt

All flow fields are static. Localized flow structures are sampled near the route corridors used by the training controllers and final planning tasks.

Formal training budget:

train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
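
As a quick sanity check on the budget above, the following sketch derives the number of training samples consumed, the approximate number of passes over the training windows, and the intermediate checkpoint names. It assumes that every step consumes exactly one full batch and that `XXXXXX` stands for the zero-padded six-digit step count; neither is stated explicitly in this document.

```python
# Budget values copied from the list above.
train_windows = 393216
batch_size = 256
steps = 20000
checkpoint_interval = 2000

# Assumption: every optimizer step consumes exactly one full batch.
samples_seen = steps * batch_size
batches_per_pass = train_windows // batch_size
passes_over_train = samples_seen / train_windows

# Assumption: XXXXXX is the zero-padded six-digit step count.
checkpoint_steps = list(range(checkpoint_interval, steps + 1, checkpoint_interval))
checkpoint_names = [f"paper_step_{s:06d}.pt" for s in checkpoint_steps]

print(samples_seen)                 # 5120000
print(batches_per_pass)             # 1536
print(round(passes_over_train, 2))  # 13.02
print(len(checkpoint_names))        # 10
print(checkpoint_names[0], checkpoint_names[-1])
```

Under these assumptions the training windows are revisited roughly 13 times and ten intermediate checkpoints are written.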

Precision policy:

training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32

A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.

Method	Comparison Role	What It Tests
`flowmo`	Proposed WM	Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition.
`leworldmodel`	JEPA-style WM baseline	Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow.
`planet`	RSSM WM baseline	Whether generic recurrent latent memory can represent momentum and drift without a separate context factor.
`tdmpc2`	Compact latent-dynamics WM baseline	Whether a compact action-conditioned latent transition matches FlowMo under equal supervision.

Prediction dataset:

test split (data/paper/test.npz)

Prediction metrics:

pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
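
One possible reading of the `pos@k` and `heading@k` metrics above is sketched below. The array layout, the use of mean Euclidean distance, and headings in radians are assumptions made for illustration, not definitions taken from the pipeline.

```python
import numpy as np

HORIZONS = (1, 5, 10, 20, 40, 60)

def position_error_at_horizons(pred_xy, true_xy, horizons=HORIZONS):
    """Mean Euclidean position error at each rollout horizon.

    pred_xy, true_xy: arrays of shape (num_rollouts, num_steps, 2), where
    index k-1 along axis 1 holds the position k steps after rollout start.
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    true_xy = np.asarray(true_xy, dtype=np.float64)
    err = np.linalg.norm(pred_xy - true_xy, axis=-1)  # (num_rollouts, num_steps)
    return {f"pos@{k}": float(err[:, k - 1].mean()) for k in horizons}

def heading_error(pred_heading, true_heading):
    """Absolute heading error in radians, wrapped into [0, pi]."""
    diff = np.asarray(pred_heading) - np.asarray(true_heading)
    wrapped = np.arctan2(np.sin(diff), np.cos(diff))
    return np.abs(wrapped)

# Toy rollouts: the prediction drifts away from the truth by 0.01 per step in x.
step_index = np.arange(1, 61)
true_xy = np.zeros((4, 60, 2))
pred_xy = true_xy.copy()
pred_xy[:, :, 0] = 0.01 * step_index
metrics = position_error_at_horizons(pred_xy, true_xy)
print(metrics)
```

Wrapping the heading difference avoids reporting an error near `2*pi` when the predicted and true headings sit on opposite sides of the `-pi`/`pi` boundary.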

FlowMo context diagnostics:

Diagnostic	Operation	Evidence Sought
Inferred context	Normal rollout with inferred `c_t`	Best prediction under flow.
Zero context	Set `c_t=0`	Degraded prediction under flow; smaller change under no-flow.
Shuffled context	Use context from another episode	Worse rollout when hidden flow differs.
Same-flow transfer	Use context from another episode with the same hidden flow	Better than wrong-flow context transfer.
No-flow context norm	Measure `
Context PCA	Plot `c_t` by flow family / flow id	Flow-related organization.
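
The zero-context, shuffled-context, and same-flow-transfer operations in the table above can be sketched as array manipulations on a batch of inferred contexts. The function names, the per-episode `flow_ids` array, and the donor-selection rule are hypothetical; the actual diagnostics may select donor episodes differently.

```python
import numpy as np

def zero_context(context):
    """Zero-context diagnostic: replace every c_t with zeros."""
    return np.zeros_like(context)

def transferred_context(context, flow_ids, rng, same_flow):
    """Swap in the context inferred on a different episode.

    context:  (num_episodes, ...) array of inferred contexts.
    flow_ids: (num_episodes,) hidden-flow id per episode.
    same_flow=True  draws donors sharing the flow id (same-flow transfer).
    same_flow=False draws donors with a different flow id (wrong-flow shuffle).
    Episodes with no eligible donor keep their own context.
    """
    flow_ids = np.asarray(flow_ids)
    donors = np.arange(len(flow_ids))
    for i in range(len(flow_ids)):
        match = flow_ids == flow_ids[i]
        eligible = match.copy() if same_flow else ~match
        eligible[i] = False  # an episode never donates to itself
        candidates = np.flatnonzero(eligible)
        if candidates.size:
            donors[i] = rng.choice(candidates)
    return context[donors], donors

# Toy batch: four episodes, two hidden flows, context row i filled with value i.
rng = np.random.default_rng(0)
flow_ids = np.array([0, 0, 1, 1])
context = np.arange(4, dtype=float).reshape(4, 1) * np.ones((1, 3))

same_ctx, same_donors = transferred_context(context, flow_ids, rng, same_flow=True)
wrong_ctx, wrong_donors = transferred_context(context, flow_ids, rng, same_flow=False)
print(same_donors, wrong_donors)
```

Comparing rollout error under `same_ctx` against `wrong_ctx` separates the effect of using another episode's context from the effect of using context from a different hidden flow.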

FlowMo latent probes:

Probe Target	Feature Sets	Purpose
Object momentum `(vx, vy, omega)`	`z_t`, `c_t`, `[z_t,c_t]`	Tests whether short-history state contains object motion.
Local flow vector	`z_t`, `c_t`, `[z_t,c_t]`	Tests whether state plus context exposes local ambient drift.
Episode drift vector	`z_t`, `c_t`, `[z_t,c_t]`	Tests whether long context contains environment-level drift.
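
A minimal sketch of how one probe target could be compared across the three feature sets follows. It assumes a linear ridge-regression probe scored by held-out R^2; this document does not specify the probe family, and the data below is synthetic, generated only so that the example runs.

```python
import numpy as np

def fit_ridge_probe(features, targets, l2=1e-3):
    """Closed-form ridge regression with a bias column (bias is also regularized)."""
    x = np.hstack([features, np.ones((len(features), 1))])
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)

def probe_r2(weights, features, targets):
    """Coefficient of determination pooled over all target dimensions."""
    x = np.hstack([features, np.ones((len(features), 1))])
    residual = targets - x @ weights
    total = targets - targets.mean(axis=0)
    return 1.0 - (residual ** 2).sum() / (total ** 2).sum()

# Synthetic stand-ins: here the target is linear in z_t and independent of c_t.
rng = np.random.default_rng(0)
z = rng.normal(size=(400, 8))
c = rng.normal(size=(400, 4))
momentum = z @ rng.normal(size=(8, 3)) + 0.01 * rng.normal(size=(400, 3))

feature_sets = {"z_t": z, "c_t": c, "[z_t,c_t]": np.hstack([z, c])}
train, test = slice(0, 300), slice(300, 400)

scores = {}
for name, feats in feature_sets.items():
    weights = fit_ridge_probe(feats[train], momentum[train])
    scores[name] = probe_r2(weights, feats[test], momentum[test])
print(scores)
```

In this synthetic setup the probe on `z_t` recovers the target while the probe on `c_t` does not, which is the kind of contrast the probe table is designed to expose on real latents.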

B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central world-model claim rests on A; B tests whether the prediction differences measured there carry over to planning and control.

Learned WM planners:

flowmo
leworldmodel
planet
tdmpc2

Traditional non-WM controllers:

Method	Comparison Role	What It Tests
`pid_los_controller`	Simple classical controller	Baseline waypoint tracking without learned dynamics.
`no_flow_los_controller`	No-flow line-of-sight (LOS) controller	Effect of ignoring hidden current in a geometric controller.
`current_estimator_los_controller`	Current-estimator LOS controller	Strength of a hand-designed drift estimator in a geometric controller.
`oracle_flow_los_controller`	Oracle-flow LOS controller	Effect of true local flow feed-forward in a geometric controller.

Planning tasks:

reach_target
station_keeping
waypoint_square
waypoint_zigzag

Boats:

twin
triangle

Flow families:

noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
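
If every method is evaluated on the full cross product of the tasks, boats, and flow families listed above (an assumption; this document does not say whether any combinations are skipped), the size of the planning grid can be enumerated as follows.

```python
from itertools import product

WM_PLANNERS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOW_FAMILIES = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# Assumption: the evaluation covers the full cross product of all four axes.
cells = list(product(WM_PLANNERS + CONTROLLERS, TASKS, BOATS, FLOW_FAMILIES))
print(len(cells))  # 640
print(cells[0])    # ('flowmo', 'reach_target', 'twin', 'noflow')
```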

Planning metrics:

success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
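
The aggregation implied by the metric list above can be sketched as below. The `Episode` record and its field names are hypothetical, time to goal is assumed to be counted in steps, and final distance is assumed to be averaged over all episodes because the list restricts only the last three metrics to successful ones.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

@dataclass
class Episode:  # hypothetical record, not the pipeline's own type
    success: bool
    final_distance: float
    positions: List[Tuple[float, float]]
    actions: List[Tuple[float, ...]]
    steps_to_goal: Optional[int] = None

def trajectory_length(positions):
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))

def control_effort(actions):
    """sum_t ||a_t||_2^2 over the episode."""
    return sum(sum(a_i * a_i for a_i in a) for a in actions)

def summarize(episodes):
    wins = [e for e in episodes if e.success]

    def over_wins(fn):
        # Metrics restricted to successes are undefined when nothing succeeded.
        return mean(fn(e) for e in wins) if wins else None

    return {
        "success_rate": len(wins) / len(episodes),
        "final_distance": mean(e.final_distance for e in episodes),
        "trajectory_length": over_wins(lambda e: trajectory_length(e.positions)),
        "control_effort": over_wins(lambda e: control_effort(e.actions)),
        "time_to_goal": over_wins(lambda e: e.steps_to_goal),
    }

episodes = [
    Episode(True, 0.0, [(0.0, 0.0), (3.0, 4.0)], [(1.0, 0.0), (0.0, 2.0)], 2),
    Episode(False, 2.0, [(0.0, 0.0), (1.0, 0.0)], [(1.0, 1.0)]),
]
summary = summarize(episodes)
print(summary)
```

Restricting length, effort, and time to successful episodes keeps failed runs, which may be truncated or wander arbitrarily, from distorting those averages; the success rate should therefore always be reported alongside them.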

Formal commands:

python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
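
The commands above could be chained from a small wrapper such as the sketch below. It assumes the stages are meant to run sequentially in the listed order and that a failing stage should stop the chain; the wrapper and its function names are hypothetical, and only the module name and the `--stages` flag come from this document.

```python
import subprocess
import sys

# Stage names taken from the formal commands; the ordering is an assumption.
STAGES = ["train", "prediction", "probe", "planning", "report"]

def stage_command(stage):
    """Build the argument list for one pipeline stage."""
    return [sys.executable, "-m", "experiments.run_paper_image_pipeline",
            "--stages", stage]

def run_all(dry_run=True):
    """Print the commands, or execute them and stop at the first failure."""
    for stage in STAGES:
        cmd = stage_command(stage)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure

run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would execute the stages for real; the dry run only prints them.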