FlowMo Paper Experiment Matrix
This document defines the paper-facing experiments. File and run names avoid version suffixes; regenerated artifacts replace the same public paths.
Shared Data And Observation Protocol
All learned world models use the same simulator data and clean-image observation pipeline.
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
All flow fields are static. Localized flow structures are sampled near the route corridors used by the training controllers and final planning tasks.
Formal training budget:
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
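
The budget above implies a few derived quantities that are useful to sanity-check before launching a run. The sketch below computes them under two assumptions that the list does not state: that `XXXXXX` in `paper_step_XXXXXX.pt` is a six-digit zero-padded step count, and that an intermediate checkpoint is written at every multiple of `checkpoint_interval` up to and including the final step.

```python
# Values copied from the formal training budget.
batch_size = 256
steps = 20000
checkpoint_interval = 2000
train_windows = 393216

# Total training windows consumed and the equivalent number of passes
# over the training windows (assumes one window per batch element).
samples_seen = batch_size * steps
epochs = samples_seen / train_windows
batches_per_epoch = train_windows // batch_size

# Assumption: checkpoints land on every multiple of the interval and
# the placeholder XXXXXX is a six-digit zero-padded step number.
checkpoint_steps = list(range(checkpoint_interval, steps + 1, checkpoint_interval))
checkpoint_names = [f"paper_step_{s:06d}.pt" for s in checkpoint_steps]

print(f"samples seen: {samples_seen}")
print(f"approx. epochs: {epochs:.2f}")
print(f"batches per epoch: {batches_per_epoch}")
print(f"intermediate checkpoints: {len(checkpoint_names)}")
```

With these numbers the run covers roughly 13 passes over the training windows.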
Precision policy:
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
A. Learned World-Model Comparison
Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.
| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |
Prediction dataset:
test
Prediction metrics:
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
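
As a rough illustration of the `pos@k` family, the sketch below treats `pos@k` as the mean Euclidean distance between predicted and true boat position at rollout step `k`, averaged over test windows. That definition, the 1-indexed step convention, and the nested-list rollout layout are assumptions made for the example, not the pipeline's confirmed implementation.

```python
import math

HORIZONS = (1, 5, 10, 20, 40, 60)

def pos_at_k(pred, true, k):
    """Mean Euclidean position error at rollout step k (1-indexed).

    pred and true are lists of windows; each window is a list of (x, y)
    positions, one per rollout step. This layout is hypothetical.
    """
    errors = [math.dist(p[k - 1], t[k - 1]) for p, t in zip(pred, true)]
    return sum(errors) / len(errors)

# Toy data: the true boat stays at the origin while the prediction
# drifts by 0.01 units per step along x.
true = [[(0.0, 0.0)] * 60 for _ in range(2)]
pred = [[(0.01 * (s + 1), 0.0) for s in range(60)] for _ in range(2)]

report = {f"pos@{k}": pos_at_k(pred, true, k) for k in HORIZONS}
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Reporting several horizons together makes it visible whether error grows roughly linearly with rollout length or accelerates at long horizons.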
FlowMo context diagnostics:
| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure the norm of `c_t` in no-flow episodes | |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |
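
The zero, shuffled, and same-flow interventions can be sketched as simple operations on a per-episode list of context vectors. The function names, the list-of-lists layout, and the flow id strings below are hypothetical. The shuffle is written so that no episode keeps its own context, which is one reasonable reading of "context from another episode".

```python
import random

def zero_context(contexts):
    """Replace every context vector with zeros of the same length."""
    return [[0.0] * len(c) for c in contexts]

def shuffled_context(contexts, rng):
    """Give each episode the context of a different episode."""
    n = len(contexts)
    if n < 2:
        raise ValueError("need at least two episodes to swap contexts")
    order = list(range(n))
    rng.shuffle(order)
    out = [None] * n
    # Treat the shuffled order as one cycle: each episode receives the
    # context of its successor, so no episode is its own donor.
    for i, ep in enumerate(order):
        donor = order[(i + 1) % n]
        out[ep] = list(contexts[donor])
    return out

def same_flow_context(contexts, flow_ids, rng):
    """Give each episode the context of another episode with the same flow id."""
    out = []
    for ep, fid in enumerate(flow_ids):
        donors = [j for j, f in enumerate(flow_ids) if f == fid and j != ep]
        if not donors:
            raise ValueError(f"episode {ep} has no same-flow donor")
        out.append(list(contexts[rng.choice(donors)]))
    return out

# Toy episodes: two share one hidden flow, two share another.
contexts = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
flow_ids = ["uniform_0", "uniform_0", "shear_1", "shear_1"]
rng = random.Random(0)

zeroed = zero_context(contexts)
shuffled = shuffled_context(contexts, rng)
transferred = same_flow_context(contexts, flow_ids, rng)
```

Each variant would then be fed to the same rollout routine in place of the inferred context, so that the prediction metrics are directly comparable across rows of the table.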
FlowMo latent probes:
| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |
B. Traditional Non-WM Control Comparison
Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control.
Learned WM planners:
flowmo
leworldmodel
planet
tdmpc2
Traditional non-WM controllers:
| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |
Planning tasks:
reach_target
station_keeping
waypoint_square
waypoint_zigzag
Boats:
twin
triangle
Flow families:
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
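
To get a sense of the size of the planning evaluation, the sketch below enumerates the cells obtained by fully crossing methods, tasks, boats, and flow families. Full crossing is an assumption; the lists above name the factors but do not state that every combination is run.

```python
from itertools import product

WM_PLANNERS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOWS = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# Assumption: every method is evaluated on every task/boat/flow combination.
cells = list(product(WM_PLANNERS + CONTROLLERS, TASKS, BOATS, FLOWS))

print(f"conditions per method: {len(TASKS) * len(BOATS) * len(FLOWS)}")
print(f"total method-condition cells: {len(cells)}")
```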
Planning metrics:
success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
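
A minimal sketch of how these metrics could be aggregated from per-episode records follows. The record fields (`success`, `final_distance`, `steps`, `positions`, `actions`) are hypothetical, and averaging final distance over all episodes rather than only successful ones is an assumption based on the list above qualifying only the last three metrics.

```python
import math

def control_effort(actions):
    """sum_t ||a_t||_2^2 for one episode."""
    return sum(sum(a_i * a_i for a_i in a) for a in actions)

def trajectory_length(positions):
    """Sum of Euclidean distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs) if xs else float("nan")

def summarize(episodes):
    wins = [e for e in episodes if e["success"]]
    return {
        "success_rate": len(wins) / len(episodes),
        # Assumption: final distance is averaged over all episodes.
        "final_distance": mean(e["final_distance"] for e in episodes),
        # The remaining metrics use successful episodes only.
        "trajectory_length": mean(trajectory_length(e["positions"]) for e in wins),
        "control_effort": mean(control_effort(e["actions"]) for e in wins),
        "time_to_goal": mean(e["steps"] for e in wins),
    }

# Toy records: one success and one failure.
episodes = [
    {"success": True, "final_distance": 0.0, "steps": 2,
     "positions": [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)],
     "actions": [(1.0, 0.0), (0.0, 2.0)]},
    {"success": False, "final_distance": 2.0, "steps": 3,
     "positions": [(0.0, 0.0), (1.0, 0.0)],
     "actions": [(5.0, 5.0)]},
]
summary = summarize(episodes)
print(summary)
```

Restricting effort, length, and time to successful episodes keeps failed runs, which may terminate early or wander, from distorting those averages; the success rate should therefore always be read alongside them.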
Formal commands:
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
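
If the stages are meant to run in the order listed (an assumption; later stages presumably consume the checkpoint and results of earlier ones), a small driver can build the same commands programmatically. The helper name `build_stage_commands` is hypothetical, and the sketch only prints the commands rather than executing them.

```python
STAGES = ["train", "prediction", "probe", "planning", "report"]

def build_stage_commands(python="python"):
    """Return one argv list per pipeline stage, in the listed order."""
    return [
        [python, "-m", "experiments.run_paper_image_pipeline", "--stages", stage]
        for stage in STAGES
    ]

commands = build_stage_commands()
for cmd in commands:
    # To execute instead of print, pass cmd to subprocess.run(cmd, check=True)
    # so that a failing stage stops the sequence.
    print(" ".join(cmd))
```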