
FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the previous files at the same public paths.

Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt

All flow fields are static. Localized flow structures are sampled near the route corridors used by the training controllers and final planning tasks.

Formal training budget:

train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
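
As a sanity check on the budget above, the sketch below derives a few implied quantities. It assumes that each step consumes exactly one full batch and that an intermediate checkpoint is written at every multiple of checkpoint_interval; neither assumption is confirmed by this document.

```python
# Values copied from the formal training budget above.
budget = {
    "train_episodes": 2400,
    "test_episodes": 480,
    "train_windows": 393216,
    "test_windows": 24576,
    "batch_size": 256,
    "steps": 20000,
    "checkpoint_interval": 2000,
}

# Assumption: every optimizer step consumes exactly one full batch.
samples_seen = budget["steps"] * budget["batch_size"]
steps_per_epoch = budget["train_windows"] // budget["batch_size"]
epochs = samples_seen / budget["train_windows"]

# Assumption: a checkpoint is written at every multiple of the interval.
checkpoint_steps = list(
    range(budget["checkpoint_interval"], budget["steps"] + 1, budget["checkpoint_interval"])
)

# Average number of windows contributed by each episode.
windows_per_train_episode = budget["train_windows"] / budget["train_episodes"]
windows_per_test_episode = budget["test_windows"] / budget["test_episodes"]

print(f"samples seen during training: {samples_seen}")
print(f"steps per pass over train windows: {steps_per_epoch}")
print(f"passes over train windows: {epochs:.2f}")
print(f"checkpoint steps: {checkpoint_steps}")
print(f"windows per train episode: {windows_per_train_episode:.2f}")
print(f"windows per test episode: {windows_per_test_episode:.2f}")
```

Under these assumptions, 20000 steps correspond to roughly 13 passes over the training windows.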

Precision policy:

training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32

A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.

| Method | Comparison Role | What It Tests |
| --- | --- | --- |
| flowmo | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| leworldmodel | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| planet | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| tdmpc2 | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |

Prediction dataset:

test split (data/paper/test.npz)

Prediction metrics:

pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
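
The metric names above suggest horizon-indexed rollout errors. The sketch below is one possible reading, not the repository's definition: it assumes pos@k is the mean Euclidean distance between predicted and true position k steps into a rollout, and heading@k is the mean absolute heading difference in radians after wrapping. The function names and dictionary keys are hypothetical.

```python
import math

POS_HORIZONS = (1, 5, 10, 20, 40, 60)
HEADING_HORIZONS = (20, 60)


def wrap_angle(delta):
    """Wrap an angle difference in radians into [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def rollout_metrics(rollouts):
    """Average horizon-indexed errors over rollouts.

    Each rollout is a dict with hypothetical keys 'pred_xy', 'true_xy'
    (lists of (x, y)) and 'pred_psi', 'true_psi' (headings in radians),
    where index k - 1 holds the state k steps after the rollout starts.
    """
    n = len(rollouts)
    out = {}
    for k in POS_HORIZONS:
        out[f"pos@{k}"] = sum(
            math.dist(r["pred_xy"][k - 1], r["true_xy"][k - 1]) for r in rollouts
        ) / n
    for k in HEADING_HORIZONS:
        out[f"heading@{k}"] = sum(
            abs(wrap_angle(r["pred_psi"][k - 1] - r["true_psi"][k - 1])) for r in rollouts
        ) / n
    return out


# Toy rollout: the prediction drifts 0.01 units sideways per step and its
# heading is off by 0.1 rad plus a full turn, which wrapping should ignore.
steps = range(1, 61)
toy = {
    "true_xy": [(0.1 * t, 0.0) for t in steps],
    "pred_xy": [(0.1 * t, 0.01 * t) for t in steps],
    "true_psi": [0.0 for _ in steps],
    "pred_psi": [0.1 + 2.0 * math.pi for _ in steps],
}
metrics = rollout_metrics([toy])
print(metrics)
```

Wrapping the heading difference matters near the ±π boundary, where a raw subtraction would report an error of almost a full turn for two nearly identical headings.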

FlowMo context diagnostics:

| Diagnostic | Operation | Evidence Sought |
| --- | --- | --- |
| Inferred context | Normal rollout with inferred c_t | Best prediction under flow. |
| Zero context | Set c_t=0 | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure ` | |
| Context PCA | Plot c_t by flow family / flow id | Flow-related organization. |
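
The context interventions above can be sketched as a donor-selection step that runs before the rollout. This is a minimal illustration, assuming each episode record carries a flow identifier and an inferred context vector; the keys flow_id and context, the mode names, and both function names are hypothetical.

```python
import random


def zero_context(context):
    """Zero-context ablation: replace c_t with zeros of the same length."""
    return [0.0] * len(context)


def pick_donor(episodes, index, mode, rng):
    """Return the index of the episode whose context replaces episode `index`.

    mode == "shuffled"   -> any other episode
    mode == "same_flow"  -> another episode with the same hidden flow id
    mode == "wrong_flow" -> another episode with a different hidden flow id
    Returns None when no suitable donor exists.
    """
    if mode not in ("shuffled", "same_flow", "wrong_flow"):
        raise ValueError(f"unknown mode: {mode}")
    target = episodes[index]["flow_id"]
    candidates = []
    for j, ep in enumerate(episodes):
        if j == index:
            continue
        same = ep["flow_id"] == target
        if mode == "shuffled" or (mode == "same_flow") == same:
            candidates.append(j)
    return rng.choice(candidates) if candidates else None


# Toy episodes: two share flow id 0, the others have distinct flows.
episodes = [
    {"flow_id": 0, "context": [0.3, -0.1]},
    {"flow_id": 0, "context": [0.28, -0.12]},
    {"flow_id": 1, "context": [-0.5, 0.4]},
    {"flow_id": 2, "context": [0.0, 0.0]},
]
rng = random.Random(0)
same_donor = pick_donor(episodes, 0, "same_flow", rng)
wrong_donor = pick_donor(episodes, 0, "wrong_flow", rng)
print(same_donor, wrong_donor, zero_context(episodes[0]["context"]))
```

Separating the same-flow and wrong-flow donors makes the comparison in the table direct: both rollouts use a foreign context, so any gap between them is attributable to the hidden flow rather than to episode identity.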

FlowMo latent probes:

| Probe Target | Feature Sets | Purpose |
| --- | --- | --- |
| Object momentum (vx, vy, omega) | z_t, c_t, [z_t,c_t] | Tests whether short-history state contains object motion. |
| Local flow vector | z_t, c_t, [z_t,c_t] | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | z_t, c_t, [z_t,c_t] | Tests whether long context contains environment-level drift. |
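
One way to read the probe table is as a regression from each feature set to a scalar target component, compared by R². The sketch below assumes a linear probe with a small ridge term, which this document does not specify; it uses synthetic one-dimensional features and scores in-sample for brevity, whereas a real probe would be scored on held-out data. All names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_linear_probe(features, targets, ridge=1e-6):
    """Ridge least squares with a bias term; returns weights, bias last."""
    X = [list(f) + [1.0] for f in features]
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(d)]
    return solve(A, b)


def r2(features, targets, w):
    """Coefficient of determination of the fitted probe."""
    preds = [sum(wi * xi for wi, xi in zip(w, list(f) + [1.0])) for f in features]
    mean = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins: the target depends on z only, so a probe on z should
# succeed and a probe on c alone should not.
rng = random.Random(0)
z = [[rng.uniform(-1, 1)] for _ in range(200)]
c = [[rng.uniform(-1, 1)] for _ in range(200)]
zc = [zi + ci for zi, ci in zip(z, c)]  # concatenated feature set [z_t, c_t]
vx = [2.0 * zi[0] + 0.5 for zi in z]

scores = {name: r2(f, vx, fit_linear_probe(f, vx)) for name, f in
          (("z", z), ("c", c), ("zc", zc))}
print(scores)
```

Comparing the three feature sets against the same target is what lets a probe distinguish where information lives: a high score on z_t alone and a low score on c_t alone would indicate that the target is carried by the short-history state rather than the context.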

B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control.

Learned WM planners:

flowmo
leworldmodel
planet
tdmpc2

Traditional non-WM controllers:

| Method | Comparison Role | What It Tests |
| --- | --- | --- |
| pid_los_controller | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| no_flow_los_controller | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| current_estimator_los_controller | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| oracle_flow_los_controller | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |

Planning tasks:

reach_target
station_keeping
waypoint_square
waypoint_zigzag

Boats:

twin
triangle

Flow families:

noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
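
The method, task, boat, and flow-family lists above define an evaluation grid. The sketch below enumerates it, assuming every method is evaluated on every task, boat, and flow family; this document does not state that the grid is complete, and the variable names are hypothetical.

```python
from itertools import product

WM_PLANNERS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOW_FAMILIES = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# Assumption: the full Cartesian product is evaluated.
grid = list(product(WM_PLANNERS + CONTROLLERS, TASKS, BOATS, FLOW_FAMILIES))

# 8 methods * 4 tasks * 2 boats * 10 flow families
print(len(grid))
print(grid[0])
```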

Planning metrics:

success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
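
The planning metrics above can be sketched as a per-cell summary. The example assumes that final distance is averaged over all episodes, since the list restricts only the last three metrics to successful episodes, and it uses hypothetical record keys; the success criterion itself is not defined here and is taken as given.

```python
import math


def mean(values):
    """Mean of an iterable, or NaN when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else float("nan")


def path_length(points):
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def control_effort(actions):
    """sum_t ||a_t||_2^2 over one episode."""
    return sum(sum(u * u for u in a) for a in actions)


def summarize(episodes):
    """Summarize episodes given as dicts with hypothetical keys 'success',
    'final_distance', 'positions', 'actions', and 'time_to_goal'."""
    wins = [ep for ep in episodes if ep["success"]]
    return {
        "success_rate": len(wins) / len(episodes),
        "final_distance": mean(ep["final_distance"] for ep in episodes),
        # The remaining metrics use successful episodes only.
        "trajectory_length": mean(path_length(ep["positions"]) for ep in wins),
        "control_effort": mean(control_effort(ep["actions"]) for ep in wins),
        "time_to_goal": mean(ep["time_to_goal"] for ep in wins),
    }


# Toy data: one success and one failure.
episodes = [
    {"success": True, "final_distance": 0.1, "time_to_goal": 2.0,
     "positions": [(0, 0), (3, 4), (3, 5)], "actions": [[1, 0], [0, 2]]},
    {"success": False, "final_distance": 2.0, "time_to_goal": None,
     "positions": [(0, 0), (10, 0)], "actions": [[3, 3]]},
]
summary = summarize(episodes)
print(summary)
```

Restricting trajectory length, control effort, and time to goal to successful episodes means those three numbers should be read alongside the success rate, since a method that succeeds only on easy episodes can look efficient on them.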

Formal commands:

python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
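
The commands above can be wrapped in a small driver. This sketch assumes the stages are meant to run in the listed order and that a failing stage should stop the sequence; it issues one invocation per stage and does not assume that --stages accepts several values at once. The helper names are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess
import sys

# Stage names and module path copied from the formal commands above.
STAGES = ["train", "prediction", "probe", "planning", "report"]
MODULE = "experiments.run_paper_image_pipeline"


def stage_command(stage):
    """Build the argument list for one pipeline stage."""
    return [sys.executable, "-m", MODULE, "--stages", stage]


def run_all(dry_run=True):
    """Print each stage command and, unless dry_run, execute it in order.

    check=True raises CalledProcessError on a non-zero exit code, which
    stops the sequence at the first failing stage.
    """
    commands = []
    for stage in STAGES:
        cmd = stage_command(stage)
        print(" ".join(cmd))
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


commands = run_all(dry_run=True)
```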