# FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names carry no version suffixes; regenerated artifacts overwrite the same public paths.

## Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Test split: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```

All flow fields are static. Localized flow structures are sampled near the
route corridors used by the training controllers and final planning tasks.

Formal training budget:

```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
```

Precision policy:

```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```

## A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.

| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |

Prediction dataset:

```text
test
```

Prediction metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```

FlowMo context diagnostics:

| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction, smaller change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `||c_t||` on no-flow data | Smaller than flow context norm. |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |
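The context-swap diagnostics in the table need a donor episode for each evaluated episode. This document does not say how donors are chosen, so the sketch below is only one possible scheme: it picks a random donor with a matching flow id for same-flow transfer, or a differing flow id for a wrong-flow shuffle. The names `zero_context` and `donor_indices`, and the per-episode `flow_ids` array, are assumptions.

```python
import numpy as np


def zero_context(ctx):
    """Zero-context diagnostic: replace c_t with zeros of the same shape."""
    return np.zeros_like(ctx)


def donor_indices(flow_ids, same_flow, rng):
    """Pick one donor episode per episode (hypothetical scheme).

    same_flow=True  -> donor is a different episode with the same flow id.
    same_flow=False -> donor is an episode with a different flow id.
    Returns -1 where no valid donor exists.
    """
    flow_ids = np.asarray(flow_ids)
    donors = np.full(len(flow_ids), -1, dtype=int)
    for i, fid in enumerate(flow_ids):
        mask = (flow_ids == fid) if same_flow else (flow_ids != fid)
        mask[i] = False  # never use an episode as its own donor
        candidates = np.flatnonzero(mask)
        if candidates.size:
            donors[i] = int(rng.choice(candidates))
    return donors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flow_ids = [0, 0, 1, 1, 2]
    ctx = rng.normal(size=(5, 8))  # one context vector per episode
    same = donor_indices(flow_ids, same_flow=True, rng=rng)
    valid = same >= 0
    swapped = ctx.copy()
    swapped[valid] = ctx[same[valid]]  # same-flow transfer where a donor exists
    print(same, zero_context(ctx).shape)
```

Episodes without a valid donor are left unchanged here; an evaluation script could equally exclude them from the diagnostic.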

FlowMo latent probes:

| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |
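A probe of this kind can be sketched as a regression from frozen features to the target, scored on held-out data. The example below assumes a linear ridge probe and reports per-dimension R²; this document does not specify the probe architecture or score, so both are illustrative choices, and `fit_linear_probe` is a hypothetical name.

```python
import numpy as np


def fit_linear_probe(feat_train, y_train, feat_test, y_test, l2=1e-3):
    """Fit a ridge-regularized linear probe and return held-out R^2 per target dim."""

    def add_bias(x):
        x = np.asarray(x, dtype=np.float64)
        return np.hstack([x, np.ones((x.shape[0], 1))])

    x_tr, x_te = add_bias(feat_train), add_bias(feat_test)
    y_tr = np.asarray(y_train, dtype=np.float64)
    y_te = np.asarray(y_test, dtype=np.float64)

    # Closed-form ridge solution: (X^T X + l2 * I) W = X^T Y.
    reg = l2 * np.eye(x_tr.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    weights = np.linalg.solve(x_tr.T @ x_tr + reg, x_tr.T @ y_tr)

    pred = x_te @ weights
    ss_res = ((y_te - pred) ** 2).sum(axis=0)
    ss_tot = ((y_te - y_te.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 4))  # stand-in for z_t features
    c = rng.normal(size=(200, 3))  # stand-in for c_t features
    momentum = z @ rng.normal(size=(4, 3))  # synthetic target that depends on z only
    for name, feat in {"z": z, "c": c, "[z,c]": np.hstack([z, c])}.items():
        print(name, fit_linear_probe(feat[:100], momentum[:100], feat[100:], momentum[100:]))
```

Running the same probe on each of the three feature sets separates what the state, the context, and their concatenation each expose.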

## B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central world-model claim rests on Section A; Section B shows whether prediction differences carry over to planning and control.

Learned WM planners:

```text
flowmo
leworldmodel
planet
tdmpc2
```

Traditional non-WM controllers:

| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `no_flow_los_controller` | No-flow LOS controller | Effect of ignoring hidden current in a geometric controller. |
| `current_estimator_los_controller` | Current-estimator LOS controller | Strength of a hand-designed drift estimator in a geometric controller. |
| `oracle_flow_los_controller` | Oracle-flow LOS controller | Effect of true local flow feed-forward in a geometric controller. |

Planning tasks:

```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```

Flow families:

```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```

Planning metrics:

```text
success rate
final distance
trajectory length over successful episodes
control effort (sum_t ||a_t||_2^2) over successful episodes
time to goal over successful episodes
```

Formal commands:

```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```