# FlowMo Paper Experiment Matrix
This document defines the paper-facing experiments. File and run names avoid version suffixes; regenerated artifacts replace the same public paths.
## Shared Data And Observation Protocol
All learned world models use the same simulator data and clean-image observation pipeline.
```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```
All flow fields are static. Localized flow structures are sampled near the
route corridors used by the training controllers and final planning tasks.
Formal training budget:
```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
```
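
As a rough check on this budget, the quantities it implies can be derived directly. The sketch assumes each step consumes one batch of `batch_size` windows, that an intermediate checkpoint is written at every multiple of `checkpoint_interval` up to and including the final step, and that `XXXXXX` is the step number zero-padded to six digits; `training_budget` is a hypothetical helper, not part of the pipeline. Under these assumptions the run makes about 13 passes over the training windows and writes 10 intermediate checkpoints.

```python
# Hypothetical helper: derive budget quantities from the formal training settings.
# Assumptions: one batch of batch_size windows per step; a checkpoint at every
# multiple of checkpoint_interval (final step included); six-digit zero-padded names.

TRAIN_WINDOWS = 393_216
BATCH_SIZE = 256
STEPS = 20_000
CHECKPOINT_INTERVAL = 2_000


def training_budget(train_windows: int, batch_size: int, steps: int, interval: int) -> dict:
    steps_per_epoch = train_windows // batch_size  # full batches per pass over the windows
    epochs = steps * batch_size / train_windows    # fractional passes over the training windows
    checkpoint_steps = range(interval, steps + 1, interval)
    names = [f"paper_step_{s:06d}.pt" for s in checkpoint_steps]
    return {"steps_per_epoch": steps_per_epoch, "epochs": epochs, "checkpoints": names}


budget = training_budget(TRAIN_WINDOWS, BATCH_SIZE, STEPS, CHECKPOINT_INTERVAL)
print(budget["steps_per_epoch"])                          # 1536
print(round(budget["epochs"], 2))                         # 13.02
print(budget["checkpoints"][0], budget["checkpoints"][-1])  # paper_step_002000.pt paper_step_020000.pt
```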
Precision policy:
```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```
## A. Learned World-Model Comparison
Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.
| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |
Prediction dataset:
```text
test
```
Prediction metrics:
```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```
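
The horizon metrics can be read as errors at a fixed rollout step. A minimal sketch, assuming `pos@k` is the mean Euclidean position error k steps into an open-loop rollout and `heading@k` is the mean absolute heading error at step k with angle wrap-around handled; the matrix does not pin down these definitions, and the array layout and function names are hypothetical.

```python
import numpy as np


def pos_at_k(pred_xy, true_xy, horizons=(1, 5, 10, 20, 40, 60)):
    """Mean Euclidean position error at each rollout horizon k.

    pred_xy, true_xy: arrays of shape (episodes, T, 2); index k-1 along T holds step k.
    """
    err = np.linalg.norm(pred_xy - true_xy, axis=-1)  # (episodes, T)
    return {f"pos@{k}": float(err[:, k - 1].mean()) for k in horizons if k <= err.shape[1]}


def heading_at_k(pred_psi, true_psi, horizons=(20, 60)):
    """Mean absolute heading error (radians) at each horizon, arrays of shape (episodes, T)."""
    diff = np.angle(np.exp(1j * (pred_psi - true_psi)))  # wrap difference to (-pi, pi]
    return {f"heading@{k}": float(np.abs(diff[:, k - 1]).mean()) for k in horizons if k <= diff.shape[1]}


# Toy usage: a constant (0.1, 0.1) offset gives pos@k of about 0.1414 at every horizon.
rng = np.random.default_rng(0)
true_xy = np.cumsum(rng.normal(size=(4, 60, 2)), axis=1)
pred_xy = true_xy + 0.1
print(pos_at_k(pred_xy, true_xy))
```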
FlowMo context diagnostics:
| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `‖c_t‖` on no-flow data | Smaller than flow context norm. |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |
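
The shuffled and same-flow conditions differ only in which episode donates the context. One way to pick donors, compare context norms, and project contexts for the PCA plot is sketched below, assuming each episode carries a hidden-flow id and contexts are stacked as an `(N, D)` array; `pick_donors`, `context_norms`, and `pca_2d` are hypothetical names. Here `mode="any"` corresponds to the shuffled condition, `"same"` to same-flow transfer, and `"different"` to the wrong-flow reference.

```python
import numpy as np


def pick_donors(flow_ids, mode="any", seed=0):
    """Return, for each episode i, the index of a donor episode j != i.

    mode="any": any other episode (shuffled context)
    mode="same": another episode with the same hidden flow id (same-flow transfer)
    mode="different": an episode with a different flow id (wrong-flow reference)
    Episodes with no valid donor get -1.
    """
    rng = np.random.default_rng(seed)
    ids = np.asarray(flow_ids)
    donors = np.full(len(ids), -1, dtype=int)
    for i, f in enumerate(ids):
        if mode == "same":
            mask = ids == f
        elif mode == "different":
            mask = ids != f
        elif mode == "any":
            mask = np.ones(len(ids), dtype=bool)
        else:
            raise ValueError(f"unknown mode: {mode}")
        mask[i] = False  # an episode never donates its own context
        candidates = np.flatnonzero(mask)
        if candidates.size:
            donors[i] = rng.choice(candidates)
    return donors


def context_norms(contexts, is_noflow):
    """Mean L2 norm of c_t on no-flow samples and on flow samples (is_noflow: bool array)."""
    norms = np.linalg.norm(contexts, axis=-1)
    return float(norms[is_noflow].mean()), float(norms[~is_noflow].mean())


def pca_2d(contexts):
    """Project context vectors (N, D) onto their first two principal components."""
    centered = contexts - contexts.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```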
FlowMo latent probes:
| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |
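
The probe table does not fix the probe model. A common choice is a linear probe; the sketch below fits a ridge-regularized linear map on standardized features and reports held-out R² for each feature set. The ridge probe, the `alpha` value, and the function names are assumptions for illustration, and targets are expected as `(N, k)` arrays (for example `k=3` for `(vx, vy, omega)`).

```python
import numpy as np


def ridge_probe_r2(train_x, train_y, test_x, test_y, alpha=1e-3):
    """Fit a linear ridge probe on train features; return held-out R^2 averaged over target dims."""
    mu, sd = train_x.mean(0), train_x.std(0) + 1e-8
    xtr = np.hstack([(train_x - mu) / sd, np.ones((len(train_x), 1))])  # standardized + bias column
    xte = np.hstack([(test_x - mu) / sd, np.ones((len(test_x), 1))])
    reg = alpha * np.eye(xtr.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias column
    w = np.linalg.solve(xtr.T @ xtr + reg, xtr.T @ train_y)
    pred = xte @ w
    ss_res = ((test_y - pred) ** 2).sum(0)
    ss_tot = ((test_y - test_y.mean(0)) ** 2).sum(0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def probe_feature_sets(z_tr, c_tr, y_tr, z_te, c_te, y_te):
    """Held-out R^2 for the three feature sets in the table: z_t, c_t, and [z_t, c_t]."""
    sets = {
        "z_t": (z_tr, z_te),
        "c_t": (c_tr, c_te),
        "[z_t,c_t]": (np.hstack([z_tr, c_tr]), np.hstack([z_te, c_te])),
    }
    return {name: ridge_probe_r2(tr, y_tr, te, y_te) for name, (tr, te) in sets.items()}
```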
## B. Traditional Non-WM Control Comparison
Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from Section A; Section B shows whether prediction differences matter for planning and control.
Learned WM planners:
```text
flowmo
leworldmodel
planet
tdmpc2
```
Traditional non-WM controllers:
| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |
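
For reference, lookahead-based LOS guidance computes a desired course from the signed cross-track error to the active path segment. The sketch below is that textbook form plus an optional current feed-forward term; it is not the repository's controller code. Under this reading, passing a zero current mirrors the no-flow variant, an estimated current the estimator variant, and the true local flow the oracle variant, with a PID heading loop (omitted) tracking the returned heading. Function names and the argument layout are hypothetical.

```python
import math


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def los_heading(pos, wp_from, wp_to, lookahead, speed, current=(0.0, 0.0)):
    """Lookahead LOS guidance with optional current feed-forward.

    Returns (heading_command, cross_track_error). current=(0, 0) ignores the flow.
    """
    x, y = pos
    xk, yk = wp_from
    path_angle = math.atan2(wp_to[1] - yk, wp_to[0] - xk)
    # Signed cross-track error: positive when the boat is left of the path direction.
    e = -(x - xk) * math.sin(path_angle) + (y - yk) * math.cos(path_angle)
    course = path_angle - math.atan2(e, lookahead)  # desired course over ground
    # Feed-forward: water-relative velocity that, added to the current,
    # gives `speed` along the desired course.
    vx = speed * math.cos(course) - current[0]
    vy = speed * math.sin(course) - current[1]
    return wrap_angle(math.atan2(vy, vx)), e


# Boat 1 m left of an x-axis path with lookahead 1 m: steer 45 degrees back toward it.
heading, xte = los_heading((0.0, 1.0), (0.0, 0.0), (10.0, 0.0), lookahead=1.0, speed=1.0)
```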
Planning tasks:
```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```
Boats:
```text
twin
triangle
```
Flow families:
```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```
Planning metrics:
```text
success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
```
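
A minimal aggregation of these metrics over a batch of episodes, assuming each episode records a success flag, its final distance, the visited positions, per-step action vectors, and a control period `dt`, and that an episode stops at success so time to goal is `len(actions) * dt`. The `Episode` record and `summarize` are hypothetical; final distance is averaged over all episodes, while the other metrics use successful episodes only, matching the list above.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    success: bool
    final_distance: float
    positions: list  # visited (x, y) positions
    actions: list    # per-step action vectors a_t
    dt: float        # control period (hypothetical field)


def summarize(episodes):
    succ = [ep for ep in episodes if ep.success]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    def path_length(ep):
        return sum(math.dist(p, q) for p, q in zip(ep.positions, ep.positions[1:]))

    def effort(ep):
        return sum(sum(a_i * a_i for a_i in a) for a in ep.actions)  # sum_t ||a_t||_2^2

    return {
        "success_rate": len(succ) / len(episodes),
        "final_distance": mean([ep.final_distance for ep in episodes]),
        "trajectory_length": mean([path_length(ep) for ep in succ]),
        "control_effort": mean([effort(ep) for ep in succ]),
        "time_to_goal": mean([len(ep.actions) * ep.dt for ep in succ]),
    }
```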
Formal commands:
```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```