# FlowMo Paper Experiment Matrix
This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the files at the same public paths.
## Shared Data And Observation Protocol
All learned world models use the same simulator data and clean-image observation pipeline.
```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```
All flow fields are static. Localized flow structures are sampled near the
route corridors used by the training controllers and final planning tasks.
Formal training budget:
```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
```
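As a sanity check on the budget above, the following sketch derives the implied number of passes over the training windows and the intermediate checkpoint schedule. The six-digit zero padding and the assumption that a checkpoint is written at every multiple of `checkpoint_interval` (including the final step) are inferred from the `paper_step_XXXXXX.pt` pattern, not confirmed by the pipeline.

```python
# Values copied from the formal training budget.
train_windows = 393216
test_windows = 24576
batch_size = 256
steps = 20000
checkpoint_interval = 2000

samples_seen = batch_size * steps                 # windows drawn during training
batches_per_epoch = train_windows // batch_size   # 1536, divides evenly
epochs = samples_seen / train_windows             # roughly 13 passes over the data
test_batches = test_windows // batch_size         # 96 full evaluation batches

# Assumption: one intermediate checkpoint per interval, six-digit zero-padded step.
checkpoint_steps = list(range(checkpoint_interval, steps + 1, checkpoint_interval))
checkpoint_names = [f"paper_step_{s:06d}.pt" for s in checkpoint_steps]

print(f"{samples_seen} samples, {epochs:.2f} epochs, {len(checkpoint_names)} checkpoints")
print(checkpoint_names[0], "...", checkpoint_names[-1])
```

Both window counts are exact multiples of the batch size, so no partial batch arises if each split is iterated once in full.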
Precision policy:
```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```
## A. Learned World-Model Comparison
Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.
| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |
Prediction dataset:
```text
test
```
Prediction metrics:
```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```
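The horizon metrics can be read as per-step errors averaged over test rollouts. The sketch below assumes `pos@k` is the mean Euclidean distance between predicted and true boat position at rollout step `k`, and `heading@k` is the mean absolute heading difference wrapped to `[0, pi]`. Both definitions and the function names are illustrative assumptions; the pipeline may use different units or aggregation.

```python
import math

def pos_error_at(pred_xy, true_xy, k):
    """Mean Euclidean position error at rollout step k (1-indexed) over episodes.

    pred_xy and true_xy are lists of rollouts; each rollout is a list of (x, y).
    """
    errs = [math.dist(p[k - 1], t[k - 1]) for p, t in zip(pred_xy, true_xy)]
    return sum(errs) / len(errs)

def heading_error_at(pred_theta, true_theta, k):
    """Mean absolute heading error at step k in radians, wrapped to [0, pi]."""
    errs = []
    for p, t in zip(pred_theta, true_theta):
        d = p[k - 1] - t[k - 1]
        # atan2(sin, cos) maps any angle difference into (-pi, pi].
        errs.append(abs(math.atan2(math.sin(d), math.cos(d))))
    return sum(errs) / len(errs)

# Two toy rollouts of length 2.
pred = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 1.0), (1.0, 1.0)]]
true = [[(0.0, 0.0), (0.0, 0.0)], [(1.0, 1.0), (1.0, 2.0)]]
print(pos_error_at(pred, true, 2))  # (5 + 1) / 2 = 3.0
```

Wrapping the heading difference matters near the plus or minus pi boundary, where a naive subtraction would report an error close to 2 pi for two nearly identical headings.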
FlowMo context diagnostics:
| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `||c_t||` on no-flow data | Smaller than flow context norm. |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |
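The context interventions in the table can be sketched as small helpers applied before rollout. This is an illustrative sketch only: `flow_ids` is hypothetical per-episode metadata mapping an episode id to its hidden flow id, and `pick_donor` is not a function of the FlowMo codebase.

```python
import random

def zero_context(context):
    """Zero-context ablation: replace every component of c_t with 0."""
    return [0.0] * len(context)

def pick_donor(episode_id, flow_ids, same_flow, rng):
    """Choose another episode whose context is swapped into the rollout.

    same_flow=True selects a donor with the same hidden flow (same-flow transfer).
    same_flow=False selects a donor with a different hidden flow, the case the
    shuffled-context diagnostic is meant to expose.
    Returns None when no eligible donor exists.
    """
    target = flow_ids[episode_id]
    donors = [e for e, f in flow_ids.items()
              if e != episode_id and (f == target) == same_flow]
    return rng.choice(sorted(donors)) if donors else None

flow_ids = {"ep0": "vortex_a", "ep1": "vortex_a", "ep2": "shear_b"}
rng = random.Random(0)
print(pick_donor("ep0", flow_ids, same_flow=True, rng=rng))   # ep1
print(pick_donor("ep0", flow_ids, same_flow=False, rng=rng))  # ep2
```

Separating same-flow and wrong-flow donors keeps the two transfer rows comparable: the only difference between them is whether the donor's hidden flow matches.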
FlowMo latent probes:
| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |
## B. Traditional Non-WM Control Comparison
Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control.
Learned WM planners:
```text
flowmo
leworldmodel
planet
tdmpc2
```
Traditional non-WM controllers:
| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |
Planning tasks:
```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```
Boats:
```text
twin
triangle
```
Flow families:
```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```
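If planning is evaluated on the full Cartesian product of methods, tasks, boats, and flow families (an assumption; the lists above do not say whether any combination is skipped), the grid can be enumerated as follows.

```python
from itertools import product

wm_planners = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
controllers = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
tasks = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
boats = ["twin", "triangle"]
flow_families = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]

# One cell per (method, task, boat, flow family) combination.
cells = list(product(wm_planners + controllers, tasks, boats, flow_families))
cells_per_method = len(tasks) * len(boats) * len(flow_families)

print(len(cells), cells_per_method)  # 640 80
```

Under that assumption each method is evaluated on 80 task/boat/flow cells, for 640 cells in total before counting episodes per cell.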
Planning metrics:
```text
success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
```
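A minimal sketch of how these metrics might be aggregated, assuming each episode record is a dict with `success`, `final_distance`, and `actions` keys (hypothetical field names). Final distance is averaged over all episodes here, while control effort is restricted to successful ones as the list above states.

```python
def control_effort(actions):
    """sum_t ||a_t||_2^2 for one episode; actions is a list of action vectors."""
    return sum(sum(x * x for x in a) for a in actions)

def mean(values):
    """Arithmetic mean, or NaN when no episode qualifies."""
    return sum(values) / len(values) if values else float("nan")

def summarize(episodes):
    """Aggregate per-episode records into the planning metrics."""
    wins = [e for e in episodes if e["success"]]
    return {
        "success_rate": len(wins) / len(episodes),
        "final_distance": mean([e["final_distance"] for e in episodes]),
        "control_effort": mean([control_effort(e["actions"]) for e in wins]),
    }

episodes = [
    {"success": True, "final_distance": 0.1, "actions": [(1.0, 0.0), (0.0, 2.0)]},
    {"success": False, "final_distance": 2.0, "actions": [(3.0, 3.0)]},
]
print(summarize(episodes))
```

Because the effort, length, and time metrics are conditioned on success, they are best read alongside the success rate: a method that succeeds only on easy episodes can look efficient on those metrics alone.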
Formal commands:
```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```
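The five commands can also be driven from a small wrapper. The sketch below assumes the stages are meant to run in the order listed and that a failing stage should stop the sequence; `run_stages` is a hypothetical helper, not part of the repository. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

STAGES = ("train", "prediction", "probe", "planning", "report")

def stage_command(stage):
    """Build the argument list for one pipeline stage."""
    return [sys.executable, "-m", "experiments.run_paper_image_pipeline",
            "--stages", stage]

def run_stages(stages=STAGES, dry_run=True):
    """Run stages in order; with dry_run=True only print the commands."""
    commands = [stage_command(s) for s in stages]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so later stages are skipped.
            subprocess.run(cmd, check=True)
    return commands

commands = run_stages()  # dry run: prints five commands, executes nothing
```

Calling `run_stages(dry_run=False)` from the repository root would execute the stages for real.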