# FlowMo Paper Experiment Matrix

This document defines the paper-facing experiments. File and run names avoid version suffixes; regenerated artifacts replace the same public paths.

## Shared Data And Observation Protocol

All learned world models use the same simulator data and clean-image observation pipeline.

```text
Image input: clean top-down RGB boat image
Image size: 160 x 160
Visual scale: 2.5
Forbidden image cues: flow arrows, velocity vectors, trajectory overlays, goal marker
Train split: data/paper/train.npz
Primary unseen-flow split: data/paper/test_unseen_flow.npz
Primary unseen-boat-dynamics split: data/paper/test_unseen_boat_params.npz
Diagnostic seen-flow-family split: data/paper/diagnostic_seen_flow.npz
Config: experiments/shared/config/paper_image.json
Checkpoint: paper.pt
Intermediate checkpoints: paper_step_XXXXXX.pt
```

Formal training budget:

```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
```
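The budget above implies how many passes the run makes over the training windows and how many step-numbered checkpoints it writes. The following is a quick sanity check, not pipeline code. It assumes that each step consumes exactly one batch of windows, that a checkpoint is written at every multiple of `checkpoint_interval` (including the last step), and that `XXXXXX` stands for a six-digit zero-padded step number. None of these assumptions is stated in this document.

```python
# Values copied from the formal training budget.
train_windows = 393216
batch_size = 256
steps = 20000
checkpoint_interval = 2000

# Assumption: one optimizer step consumes exactly one batch of windows.
samples_seen = steps * batch_size
epochs = samples_seen / train_windows

# Assumption: a checkpoint lands on every multiple of checkpoint_interval,
# and XXXXXX is a six-digit zero-padded step number.
checkpoint_steps = list(range(checkpoint_interval, steps + 1, checkpoint_interval))
checkpoint_names = [f"paper_step_{s:06d}.pt" for s in checkpoint_steps]

print(f"samples seen: {samples_seen}")
print(f"approx. passes over train windows: {epochs:.2f}")
print(f"step-numbered checkpoints: {len(checkpoint_names)}")
```

Under these assumptions the run sees 5,120,000 windows, which is about 13 passes over the 393,216 training windows.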

Precision policy:

```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```

## A. Learned World-Model Comparison

Purpose: measure world-model quality directly. The key question is whether FlowMo's short object-motion state plus long ambient-drift context improves rollout prediction under hidden currents and momentum.

| Method | Comparison Role | What It Tests |
|---|---|---|
| `flowmo` | Proposed WM | Explicit flow-momentum factorization: short state/momentum latent, long drift context, zero-context residual transition. |
| `leworldmodel` | JEPA-style WM baseline | Whether simple image-latent prediction without explicit history/context can handle boat momentum and flow. |
| `planet` | RSSM WM baseline | Whether generic recurrent latent memory can represent momentum and drift without a separate context factor. |
| `tdmpc2` | Compact latent-dynamics WM baseline | Whether a compact action-conditioned latent transition matches FlowMo under equal supervision. |

Prediction datasets:

```text
test_unseen_flow
test_unseen_boat_params
diagnostic_seen_flow
```

Prediction metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift error
no-flow momentum decay error
same-action different-flow error
```
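As an illustration of the `pos@k` family, the sketch below reads `pos@k` as the Euclidean distance between predicted and true boat position at rollout step `k`. That reading, the function name `pos_error_at`, and the `(x, y)` list layout are assumptions for illustration; the evaluation code may define the metric differently (for example, averaged over windows or normalized).

```python
import math

def pos_error_at(pred_xy, true_xy, horizons=(1, 5, 10, 20, 40, 60)):
    """Return {k: Euclidean position error at rollout step k}.

    pred_xy and true_xy are lists of (x, y); index 0 holds rollout step 1.
    Horizons longer than either rollout are skipped.
    """
    errors = {}
    for k in horizons:
        if k <= len(pred_xy) and k <= len(true_xy):
            px, py = pred_xy[k - 1]
            tx, ty = true_xy[k - 1]
            errors[k] = math.hypot(px - tx, py - ty)
    return errors

# Toy rollout: the prediction drifts 0.01 in x per step relative to the truth.
true_xy = [(0.1 * t, 0.0) for t in range(1, 61)]
pred_xy = [(0.1 * t + 0.01 * t, 0.0) for t in range(1, 61)]
errors = pos_error_at(pred_xy, true_xy)
for k, err in errors.items():
    print(f"pos@{k}: {err:.3f}")
```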

FlowMo context diagnostics:

| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded prediction under flow; smaller change on no-flow data. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better than wrong-flow context transfer. |
| No-flow context norm | Measure `||c_t||` on no-flow data | Smaller than flow context norm. |
| Context PCA | Plot `c_t` by flow family / flow id | Flow-related organization. |
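The context interventions in the table can be sketched on plain lists. This is one possible reading, not the diagnostic code: the helper names `zero_context` and `swap_context`, the per-episode `flow_ids` labels, and the choice to return `None` when no donor exists are all hypothetical.

```python
import random

def zero_context(contexts):
    """Replace every context vector with zeros of the same length."""
    return [[0.0] * len(c) for c in contexts]

def swap_context(contexts, flow_ids, rng, same_flow):
    """Give each episode the context of a different episode.

    same_flow=True  picks a donor with the same hidden-flow id.
    same_flow=False picks a donor with a different hidden-flow id.
    Episodes without a valid donor get None.
    """
    swapped = []
    for i, fid in enumerate(flow_ids):
        donors = [j for j, other in enumerate(flow_ids)
                  if j != i and (other == fid) == same_flow]
        swapped.append(contexts[rng.choice(donors)] if donors else None)
    return swapped

# Toy data: four episodes, two hidden flows.
contexts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
flow_ids = ["a", "a", "b", "b"]
rng = random.Random(0)

same_flow = swap_context(contexts, flow_ids, rng, same_flow=True)
wrong_flow = swap_context(contexts, flow_ids, rng, same_flow=False)
zeroed = zero_context(contexts)
```

Rolling the model out with `same_flow`, `wrong_flow`, and `zeroed` in place of the inferred context would then give the comparisons listed under "Evidence Sought".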

FlowMo latent probes:

| Probe Target | Feature Sets | Purpose |
|---|---|---|
| Object momentum `(vx, vy, omega)` | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether short-history state contains object motion. |
| Local flow vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether state plus context exposes local ambient drift. |
| Episode drift vector | `z_t`, `c_t`, `[z_t,c_t]` | Tests whether long context contains environment-level drift. |

## B. Traditional Non-WM Control Comparison

Purpose: provide downstream control references and report practical task behavior. The central WM claim still comes from A; B shows whether prediction differences matter for planning and control.

Learned WM planners:

```text
flowmo
leworldmodel
planet
tdmpc2
```

Traditional non-WM controllers:

| Method | Comparison Role | What It Tests |
|---|---|---|
| `pid_los_controller` | Simple classical controller | Baseline waypoint tracking without learned dynamics. |
| `physics_mpc_no_flow` | Nominal physics MPC | Effect of ignoring hidden current. |
| `current_estimator_mpc` | Current-compensated classical MPC | Strength of a hand-designed drift estimator. |
| `oracle_flow_mpc` | Oracle reference | Reference performance when true local flow is available. |

Planning tasks:

```text
reach_uniform
counterflow
station_keeping
passive_to_active
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```
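To get a sense of the size of the planning evaluation, the lists above can be crossed into a grid. This assumes every method is evaluated on every task with every boat, which this document does not state explicitly.

```python
from itertools import product

# Names copied from the lists and table in this section.
wm_planners = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
classical = [
    "pid_los_controller",
    "physics_mpc_no_flow",
    "current_estimator_mpc",
    "oracle_flow_mpc",
]
tasks = [
    "reach_uniform",
    "counterflow",
    "station_keeping",
    "passive_to_active",
    "waypoint_square",
    "waypoint_zigzag",
]
boats = ["twin", "triangle"]

# Assumption: the full cross product is evaluated.
grid = list(product(wm_planners + classical, tasks, boats))
print(f"{len(grid)} (method, task, boat) cells")
```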

Planning metrics:

```text
success rate
final distance
trajectory length over successful episodes
energy / thrust work over successful episodes
time to goal over successful episodes
```
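Three of these metrics are conditioned on success, so they are averaged over a different set of episodes than the success rate and final distance. A minimal aggregation sketch follows; the per-episode record layout and key names are hypothetical, and returning `None` when no episode succeeds is an illustrative choice.

```python
from statistics import mean

def summarize(episodes):
    """Aggregate planning metrics over a non-empty list of episode records."""
    if not episodes:
        raise ValueError("need at least one episode")
    wins = [e for e in episodes if e["success"]]

    def mean_over_wins(key):
        # Conditional metrics are undefined when nothing succeeded.
        return mean(e[key] for e in wins) if wins else None

    return {
        "success_rate": len(wins) / len(episodes),
        "final_distance": mean(e["final_distance"] for e in episodes),
        "trajectory_length": mean_over_wins("trajectory_length"),
        "energy": mean_over_wins("energy"),
        "time_to_goal": mean_over_wins("time_to_goal"),
    }

# Toy records: two successes and one failure.
episodes = [
    {"success": True, "final_distance": 0.2, "trajectory_length": 10.0,
     "energy": 4.0, "time_to_goal": 50},
    {"success": False, "final_distance": 3.0, "trajectory_length": 25.0,
     "energy": 9.0, "time_to_goal": 200},
    {"success": True, "final_distance": 0.4, "trajectory_length": 14.0,
     "energy": 6.0, "time_to_goal": 70},
]
summary = summarize(episodes)
print(summary)
```

Because the conditional means ignore failed episodes, a method with a low success rate can still show short trajectories or low energy; the success rate should be read alongside them.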

Formal commands:

```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```
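For unattended runs, the five commands can be driven from a small wrapper. The sketch below only uses the module name and `--stages` values shown above. The helper names `build_commands` and `run_all` are hypothetical, and it assumes the stages should run in the listed order, stopping at the first failure.

```python
import subprocess

# Stage names copied from the formal commands, in the order listed.
STAGES = ["train", "prediction", "probe", "planning", "report"]

def build_commands(stages=STAGES):
    """Return one argv list per stage, mirroring the formal commands."""
    return [
        ["python", "-m", "experiments.run_paper_image_pipeline", "--stages", stage]
        for stage in stages
    ]

def run_all(stages=STAGES, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(stages):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence if a stage exits non-zero.
            subprocess.run(cmd, check=True)

# Dry run: prints the five commands without executing them.
run_all()
```

Calling `run_all(dry_run=False)` from the repository root would execute the stages one after another.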