# FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the files at the same public paths.

## Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Primary unseen-flow split: data/paper/test_unseen_flow.npz
Primary unseen-boat-dynamics split: data/paper/test_unseen_boat_params.npz
Diagnostic seen-flow-family split: data/paper/diagnostic_seen_flow.npz
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```
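
The split names used later in this matrix (`test_unseen_flow`, `test_unseen_boat_params`, `diagnostic_seen_flow`) match the file stems above, so a loader can resolve them mechanically. The following is a minimal sketch. `split_path`, `check_frame_shape`, and the channel-last `(H, W, 3)` frame layout are assumptions, not the repository's API.

```python
from pathlib import Path

DATA_DIR = Path("data/paper")
SPLITS = ("train", "test_unseen_flow", "test_unseen_boat_params", "diagnostic_seen_flow")
IMAGE_SHAPE = (160, 160, 3)  # height, width, RGB; channel-last layout is an assumption


def split_path(name: str) -> Path:
    """Map a split name to its .npz file under data/paper/ (hypothetical helper)."""
    if name not in SPLITS:
        raise KeyError(f"unknown split {name!r}; expected one of {SPLITS}")
    return DATA_DIR / f"{name}.npz"


def check_frame_shape(shape) -> None:
    """Reject frames that do not match the shared 160 x 160 RGB protocol."""
    if tuple(shape) != IMAGE_SHAPE:
        raise ValueError(f"expected {IMAGE_SHAPE}, got {tuple(shape)}")
```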

Formal training budget:

```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
```
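
The budget implies a few derived quantities that are worth rechecking whenever one setting changes. The quick calculation below assumes two things. First, each step draws `batch_size` training windows. Second, intermediate checkpoint names zero-pad the step to six digits, as read off the `XXXXXX` placeholder. Whether the final step also writes `paper.pt` is not stated here.

```python
train_windows = 393_216
test_windows = 24_576
batch_size = 256
steps = 20_000
checkpoint_interval = 2_000

samples_seen = steps * batch_size                 # windows drawn over the whole run
epochs = samples_seen / train_windows             # passes over the training windows
batches_per_epoch = train_windows // batch_size   # full batches per pass
eval_batches = test_windows // batch_size         # batches per test split pass

# Assumed naming: step zero-padded to six digits, matching paper_step_XXXXXX.pt
ckpt_steps = range(checkpoint_interval, steps + 1, checkpoint_interval)
ckpt_names = [f"paper_step_{s:06d}.pt" for s in ckpt_steps]

print(f"{epochs:.2f} epochs, {batches_per_epoch} batches/epoch, {eval_batches} eval batches")
print(ckpt_names[0], "...", ckpt_names[-1])
```

Under these assumptions, the formal run makes about 13 passes over the 393,216 training windows, at 1,536 batches per pass. It also writes ten intermediate checkpoints, `paper_step_002000.pt` through `paper_step_020000.pt`.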

Precision policy:

```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```
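
The policy runs the model in bf16 but computes losses and metrics in fp32. The reason is that bf16 keeps only 8 significant bits. The stdlib-only sketch below emulates both formats with `struct` and accumulates 10,000 small per-window errors. The error values are illustrative, not results.

In bf16, the running sum stops growing once it reaches 0.5. From there, the spacing between representable values (2^-8) exceeds twice the increment. The fp32 sum, by contrast, stays close to the true 10.0.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bf16: round-to-nearest-even on the low 16 bits of the fp32 pattern.

    Finite, in-range inputs only; NaN/inf handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round half to even at bit 16
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(values, rnd):
    """Sum values, storing every partial sum at the precision given by rnd."""
    acc = 0.0
    for v in values:
        acc = rnd(acc + rnd(v))
    return acc


per_window_errors = [0.001] * 10_000  # illustrative small per-window errors
fp32_sum = accumulate(per_window_errors, to_fp32)  # close to 10.0
bf16_sum = accumulate(per_window_errors, to_bf16)  # → 0.5 (stalls)
print(fp32_sum, bf16_sum)
```

The same effect applies to reductions such as mean position error over a test split. Those reductions belong in fp32 even when the forward pass uses bf16 autocast.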

## A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's factorization improves rollout prediction when the boat carries momentum and is pushed by hidden currents. The factorization pairs a short-history object-motion state with a long-history ambient-drift context.

| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |
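
The factorization in the `flowmo` row can be illustrated with a hand-written toy transition. This is not FlowMo's learned model. The toy makes the following assumptions:

- a 2-D boat state `(x, y, vx, vy)`;
- a thrust action;
- a constant drift context `c`.

It reads "zero-context residual transition" as a still-water step plus a context-driven residual that vanishes when `c = 0`. The constants `DT` and `DECAY` are hypothetical.

```python
from dataclasses import dataclass

DT = 0.1      # hypothetical control period
DECAY = 0.9   # hypothetical per-step momentum retention


@dataclass(frozen=True)
class State:
    x: float
    y: float
    vx: float
    vy: float


def base_step(z: State, a: tuple) -> State:
    """Still-water transition: momentum decays, thrust adds velocity."""
    vx = DECAY * z.vx + a[0] * DT
    vy = DECAY * z.vy + a[1] * DT
    return State(z.x + vx * DT, z.y + vy * DT, vx, vy)


def drift_residual(c: tuple) -> tuple:
    """Context-driven position offset; exactly zero when c == (0, 0)."""
    return (c[0] * DT, c[1] * DT)


def step(z: State, a: tuple, c: tuple) -> State:
    """Full transition = still-water step + context residual."""
    nxt = base_step(z, a)
    dx, dy = drift_residual(c)
    return State(nxt.x + dx, nxt.y + dy, nxt.vx, nxt.vy)


def rollout(z0: State, actions, c: tuple):
    """Open-loop rollout with a fixed context, returning every visited state."""
    traj = [z0]
    for a in actions:
        traj.append(step(traj[-1], a, c))
    return traj
```

Comparing `rollout(z0, actions, c)` with `rollout(z0, actions, (0.0, 0.0))` mirrors the zero-context diagnostic below. In this toy, the position gap grows by `c * DT` per step while the velocities are unchanged.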

Prediction datasets:

```text
test_unseen_flow
test_unseen_boat_params
diagnostic_seen_flow
```

Prediction metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```
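
A minimal sketch of the horizon metrics. It assumes `pos@k` is the mean Euclidean position error at rollout step `k` and `heading@k` is the mean absolute wrapped heading error in radians. The matrix names these metrics but does not define them, so the index convention and averaging here are assumptions.

```python
import math


def pos_at_k(pred, true, ks=(1, 5, 10, 20, 40, 60)):
    """Mean Euclidean position error at each horizon k.

    pred/true: lists of episodes; each episode lists (x, y) for rollout steps 1..H,
    so index k - 1 holds step k. Episodes shorter than k are skipped for that k.
    """
    out = {}
    for k in ks:
        errs = [math.dist(p[k - 1], t[k - 1])
                for p, t in zip(pred, true) if min(len(p), len(t)) >= k]
        out[f"pos@{k}"] = sum(errs) / len(errs) if errs else float("nan")
    return out


def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def heading_at_k(pred, true, k):
    """Mean absolute wrapped heading error (radians) at horizon k."""
    errs = [abs(wrap(p[k - 1] - t[k - 1]))
            for p, t in zip(pred, true) if min(len(p), len(t)) >= k]
    return sum(errs) / len(errs) if errs else float("nan")
```

Wrapping matters for heading: a prediction of 3.1 rad against a true heading of -3.1 rad is about 0.08 rad off, not 6.2 rad.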

FlowMo context diagnostics:

| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded prediction on flow data; smaller change on no-flow data. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `\|\|c_t\|\|` on no-flow data | Smaller than the context norm on flow data. |
| Context PCA | Plot `c_t` by flow family / flow id | Clusters or trends aligned with flow family / flow id. |
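
The transfer diagnostics need a donor episode whose hidden flow either matches or differs from the target episode's flow. The no-flow check needs a mean context norm. Below is a sketch of both, assuming each episode carries a hidden `flow_id` label. `pick_donor` and `mean_context_norm` are hypothetical helpers.

```python
import math
import random


def pick_donor(episode, flow_ids, same_flow, rng):
    """Pick another episode whose hidden flow id matches (same_flow=True) or differs.

    Returns None when no such episode exists.
    """
    target = flow_ids[episode]
    candidates = [j for j, f in enumerate(flow_ids)
                  if j != episode and (f == target) == same_flow]
    return rng.choice(candidates) if candidates else None


def mean_context_norm(contexts):
    """Mean L2 norm of context vectors, e.g. all c_t from no-flow episodes."""
    return sum(math.sqrt(sum(x * x for x in c)) for c in contexts) / len(contexts)
```

For the "Shuffled context" row, a donor drawn with `same_flow=False` isolates the wrong-flow case. Seeding `rng` keeps donor assignments reproducible across methods.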

FlowMo latent probes:

| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |
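
A linear probe regresses a scalar target component on a feature set and reports R². The sketch below fits ordinary least squares with an intercept, plus a tiny ridge term for numerical stability, in plain Python. The synthetic `z`, `c`, and targets are made up so that momentum depends only on `z` and drift only on `c`.

The matrix does not state whether the repository uses linear probes, held-out splits, or a different score. A real evaluation should score on held-out windows rather than in-sample as done here.

```python
import random


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def probe_r2(features, target, ridge=1e-6):
    """In-sample R^2 of a linear probe (intercept + ridge-stabilized least squares)."""
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    # Normal equations; the intercept (index 0) is not penalized.
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(d)]
    w = solve(xtx, xty)
    pred = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - p) ** 2 for y, p in zip(target, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
N = 200
z = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(N)]   # stand-in for z_t
c = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(N)]   # stand-in for c_t
zc = [a + b for a, b in zip(z, c)]                            # [z_t, c_t]
momentum_vx = [2 * zi[0] - zi[1] + rng.gauss(0, 0.05) for zi in z]
drift_x = [ci[0] + 0.5 * ci[1] + rng.gauss(0, 0.05) for ci in c]

for name, target in [("momentum vx", momentum_vx), ("drift x", drift_x)]:
    for feats, label in [(z, "z_t"), (c, "c_t"), (zc, "[z_t,c_t]")]:
        print(f"{name:12s} {label:10s} R^2 = {probe_r2(feats, target):.3f}")
```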

## B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central WM claim still rests on Section A; Section B shows whether prediction differences carry over to planning and control.

Learned WM planners:

```text
flowmo
leworldmodel
planet
tdmpc2
```

Traditional non-WM controllers:

| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `physics_mpc_no_flow` | Nominal physics MPC | Effect of ignoring hidden current. |
| `current_estimator_mpc` | Current-compensated classical MPC | Strength of a hand-designed drift estimator. |
| `oracle_flow_mpc` | Oracle reference | Reference performance when true local flow is available. |

Planning tasks:

```text
reach_uniform
counterflow
station_keeping
passive_to_active
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```
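
Assume every planner and controller runs on every task with both boats; the matrix does not say so explicitly. The planning grid is then a plain cross product of 8 methods x 6 tasks x 2 boats = 96 cells. `planning_grid` is a hypothetical helper.

```python
from itertools import product

WM_PLANNERS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
CONTROLLERS = ["pid_los_controller", "physics_mpc_no_flow",
               "current_estimator_mpc", "oracle_flow_mpc"]
TASKS = ["reach_uniform", "counterflow", "station_keeping",
         "passive_to_active", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]


def planning_grid():
    """Every (method, task, boat) cell, assuming a full cross product."""
    return list(product(WM_PLANNERS + CONTROLLERS, TASKS, BOATS))


print(len(planning_grid()), "planning cells")
```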

Planning metrics:

```text
success rate
final distance
trajectory length over successful episodes
energy / thrust work over successful episodes
time to goal over successful episodes
```
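
The metric list mixes two populations. Success rate and final distance are read here as computed over all episodes. Trajectory length, energy / thrust work, and time to goal are averaged over successful episodes only.

A minimal sketch, assuming a hypothetical per-episode record `EpisodeResult` with precomputed fields:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    success: bool
    final_distance: float
    trajectory_length: float
    thrust_work: float
    time_to_goal: float  # meaningful only for successful episodes


def summarize(episodes):
    """Aggregate planning metrics for one (method, task, boat) cell."""
    ok = [e for e in episodes if e.success]
    out = {
        "success_rate": len(ok) / len(episodes),
        "final_distance": mean(e.final_distance for e in episodes),  # all episodes
    }
    for name in ("trajectory_length", "thrust_work", "time_to_goal"):
        # Success-conditioned averages; NaN when no episode succeeded.
        out[name] = mean(getattr(e, name) for e in ok) if ok else float("nan")
    return out
```

Success-conditioned averages should be reported next to the success rate. A method that succeeds rarely but only on easy episodes can otherwise look efficient.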

Formal commands:

```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```
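
The five stages can also be run in order from one script. This sketch reuses only the module path and `--stages` values shown above, with these design choices:

- `sys.executable` runs each stage under the current interpreter.
- `check=True` makes a failing stage stop the later ones.

`stage_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys

STAGES = ["train", "prediction", "probe", "planning", "report"]


def stage_commands(stages=STAGES):
    """Build one command per stage, mirroring the formal commands above."""
    return [[sys.executable, "-m", "experiments.run_paper_image_pipeline", "--stages", s]
            for s in stages]


def run_all(dry_run: bool = False) -> None:
    """Run stages sequentially; raises CalledProcessError at the first failure."""
    for cmd in stage_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all(dry_run=True)` prints the commands without executing them.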