# FlowMo Experiment Protocol
This document is the single paper-facing record for the public FlowMo experiments. Regenerated artifacts should overwrite the files at the same paths rather than introduce version suffixes.
## Scope
The project has two formal comparison groups.
### A. Learned World Models
Purpose: compare image-input world-model architectures under the same simulator data, optimizer budget, rollout target, parameter scale, and planning interface.
| Directory | Report Name | Architecture | Comparison Purpose |
|---|---|---|---|
| `flowmo` | FlowMo | Shared image encoder; short object-motion state encoder; long strided ambient-drift context encoder; base transition plus zero-context residual | Proposed flow-momentum factorization. Tests whether separating endogenous motion from exogenous drift improves prediction and planning. |
| `leworldmodel` | LeWorldModel | JEPA-style image-latent predictor with action-conditioned residual transition | Tests whether simple current-image latent prediction is sufficient. |
| `planet` | RSSM | Recurrent state-space latent model with deterministic memory and stochastic latent state | Tests whether generic recurrent memory can absorb momentum and drift without a separate context factor. |
| `tdmpc2` | TD-MPC2 Dynamics | Compact action-conditioned latent dynamics with shared image encoder and rollout heads | Tests task-oriented latent dynamics under equal supervision. |
All learned methods receive clean top-down RGB boat images and action history. They do not receive flow labels, flow arrows, velocity vectors, trajectory overlays, or goal markers in the image.
### B. Traditional Non-WM Controllers
Purpose: compare downstream behavior against non-neural controllers that do not train a world model.
| Directory | Report Name | Input | Comparison Purpose |
|---|---|---|---|
| `pid_los_controller` | PID/LOS | Clean image pose estimate | Simple hand-designed waypoint tracking baseline. |
| `physics_mpc_no_flow` | Physics MPC No-Flow | Clean image pose estimate | Measures the cost of ignoring ambient current. |
| `current_estimator_mpc` | Current-Estimator MPC | Clean image pose estimate and recent drift | Strong classical current-compensation baseline. |
| `oracle_flow_mpc` | Oracle-Flow MPC | Clean image pose estimate and simulator local flow | Reference bound for control when true local flow is available. |
## Data
All methods use the same splits:
```text
train: data/paper/train.npz
unseen_flow_test: data/paper/test_unseen_flow.npz
unseen_boat_dynamics_test: data/paper/test_unseen_boat_params.npz
seen_flow_diagnostic: data/paper/diagnostic_seen_flow.npz
dataset_card: data/paper/dataset_card.md
generation_config: data/paper/generation_config.json
```
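Before launching a run, it can help to confirm that every split file is in place. The following is a minimal sketch, not part of the project code: `SPLITS` copies the four `.npz` paths listed above, and `missing_splits` is a hypothetical helper name. It only checks that the files exist and assumes nothing about the arrays stored inside them.

```python
from pathlib import Path

# Split names and relative paths copied from the listing above.
SPLITS = {
    "train": "data/paper/train.npz",
    "unseen_flow_test": "data/paper/test_unseen_flow.npz",
    "unseen_boat_dynamics_test": "data/paper/test_unseen_boat_params.npz",
    "seen_flow_diagnostic": "data/paper/diagnostic_seen_flow.npz",
}


def missing_splits(root="."):
    """Return the names of splits whose file is absent under `root` (hypothetical helper)."""
    base = Path(root)
    return [name for name, rel in SPLITS.items() if not (base / rel).is_file()]


if __name__ == "__main__":
    absent = missing_splits()
    print("all splits present" if not absent else f"missing: {absent}")
```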
Observation protocol:
```text
image_size: 160 x 160
visual_scale: 2.5
rendering: online clean top-down RGB images
forbidden cues: flow arrows, velocity vectors, trajectory overlays, goal markers
```
Training budget:
```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
```
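The budget numbers above imply a fixed amount of data exposure per method. The arithmetic below is a sketch under two assumptions that the protocol does not state: each optimizer step consumes exactly one batch of training windows, and checkpoints are written at every multiple of `checkpoint_interval` up to and including the final step.

```python
# Values copied from the training budget above.
TRAIN_WINDOWS = 393216
BATCH_SIZE = 256
STEPS = 20000
CHECKPOINT_INTERVAL = 2000

# Assumption: one optimizer step consumes one batch of windows.
batches_per_pass = TRAIN_WINDOWS // BATCH_SIZE
windows_seen = STEPS * BATCH_SIZE
passes_over_train = windows_seen / TRAIN_WINDOWS

# Assumption: a checkpoint at every multiple of the interval, final step included.
checkpoint_steps = list(range(CHECKPOINT_INTERVAL, STEPS + 1, CHECKPOINT_INTERVAL))

print(batches_per_pass)             # 1536
print(windows_seen)                 # 5120000
print(round(passes_over_train, 2))  # 13.02
print(len(checkpoint_steps))        # 10
```

Under these assumptions each learned model sees roughly 13 passes over the training windows, and each run leaves ten intermediate checkpoints.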
Precision policy:
```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```
The precision split is intentional: BF16 speeds up image encoding and latent rollout on the RTX 5090 without measurable short-run loss drift, while CEM planning is dominated by small control tensors and did not improve under BF16.
## Prediction Evaluation
Datasets:
```text
test_unseen_flow
test_unseen_boat_params
diagnostic_seen_flow
```
Metrics:
```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift prediction error
no-flow momentum decay prediction error
same-action different-flow prediction error
```
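The protocol names the metrics but does not define them in this document. One plausible reading, assumed for the sketch below, is that `pos@k` is the mean Euclidean distance between predicted and ground-truth position `k` steps into an open-loop rollout, and that `heading@k` is the absolute heading difference wrapped to `[0, pi]`. The function names and the list-of-tuples rollout layout are illustrative, not the project's API.

```python
import math

# Horizons copied from the metric list above.
POS_HORIZONS = (1, 5, 10, 20, 40, 60)


def heading_error(pred, true):
    """Absolute angular difference in radians, wrapped to [0, pi]."""
    d = pred - true
    return abs(math.atan2(math.sin(d), math.cos(d)))


def pos_at_k(pred_rollouts, true_rollouts, horizons=POS_HORIZONS):
    """Mean Euclidean position error at each horizon (assumed definition).

    Each rollout is a list of (x, y) tuples; index 0 is the state one step ahead,
    so horizon k reads index k - 1. Rollouts shorter than k are skipped.
    """
    out = {}
    for k in horizons:
        errs = [
            math.dist(p[k - 1], t[k - 1])
            for p, t in zip(pred_rollouts, true_rollouts)
            if len(p) >= k and len(t) >= k
        ]
        if errs:
            out[f"pos@{k}"] = sum(errs) / len(errs)
    return out
```

Wrapping the heading difference matters near the ±pi boundary, where a naive subtraction would report an error close to 2pi for two nearly identical headings.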
FlowMo-only context diagnostics:
| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded prediction under flow; little change under no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better transfer than wrong-flow context. |
| Context norm | Compare no-flow and flow `||c_t||` | Flow context should be larger than no-flow context. |
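The context manipulations in the table can be sketched on plain lists of context vectors. This is an illustration under assumptions, not the project's implementation: contexts are represented as one list of floats per episode, the function names are hypothetical, and the shuffle is a cyclic shift chosen so that no episode keeps its own context. The shift guarantees a different episode, not a different hidden flow, so a real evaluation would still need the flow labels to separate the shuffled and same-flow cases.

```python
import math


def zero_context(contexts):
    """Zero-context ablation: replace every context vector with zeros."""
    return [[0.0] * len(c) for c in contexts]


def shuffled_context(contexts):
    """Shuffled-context ablation: episode i receives the context of episode i + 1.

    A cyclic shift ensures no episode keeps its own context when there are
    at least two episodes.
    """
    return contexts[1:] + contexts[:1]


def mean_context_norm(contexts):
    """Average Euclidean norm ||c_t|| over a non-empty list of context vectors."""
    return sum(math.sqrt(sum(x * x for x in c)) for c in contexts) / len(contexts)
```

Comparing `mean_context_norm` on no-flow episodes against flow episodes corresponds to the last row of the table.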
FlowMo latent probes:
```text
Train frozen linear probes from z_t, c_t, and [z_t,c_t].
Targets: object momentum (vx, vy, omega), local flow vector, episode drift vector.
Purpose: verify which latent carries object-motion information and which latent carries ambient-drift information.
```
## Planning Evaluation
Learned WM planners:
```text
flowmo
leworldmodel
planet
tdmpc2
```
All learned world models use the same route-aware CEM planner over their latent rollouts.
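For readers unfamiliar with CEM, the sketch below shows the generic cross-entropy method over action sequences. It is not the project's route-aware planner: `rollout_cost` is a placeholder for rolling a learned model forward in latent space and scoring the predicted trajectory, `toy_cost` is a one-dimensional stand-in used only to make the example runnable, and the population size, elite count, and iteration count are arbitrary.

```python
import random


def cem_plan(rollout_cost, horizon, iters=5, pop=64, n_elite=8, seed=0):
    """Cross-entropy method over scalar action sequences clipped to [-1, 1].

    rollout_cost(actions) -> float is a placeholder for a world-model rollout
    followed by a trajectory cost; lower is better.
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(pop)
        ]
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        elites = sorted(samples, key=rollout_cost)[:n_elite]
        mean = [sum(e[t] for e in elites) / n_elite for t in range(horizon)]
        std = [
            max(1e-3, (sum((e[t] - mean[t]) ** 2 for e in elites) / n_elite) ** 0.5)
            for t in range(horizon)
        ]
    return mean


def toy_cost(actions, start=0.0, goal=2.0):
    """Toy stand-in for a model rollout: position integrates the action; cost is final distance."""
    pos = start
    for a in actions:
        pos += a
    return abs(pos - goal)


if __name__ == "__main__":
    plan = cem_plan(toy_cost, horizon=4)
    print(plan, toy_cost(plan))
```

Sharing one planner across all learned models, as the protocol requires, means differences in planning metrics can be attributed to the rollouts rather than to the search procedure.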
Traditional non-WM controllers:
```text
pid_los_controller
physics_mpc_no_flow
current_estimator_mpc
oracle_flow_mpc
```
Tasks:
```text
reach_uniform
counterflow
station_keeping
passive_to_active
waypoint_square
waypoint_zigzag
```
Boats:
```text
twin
triangle
```
Metrics:
```text
success rate
final distance
trajectory length over successful episodes
energy / thrust work over successful episodes
time to goal over successful episodes
```
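The metric list conditions three quantities on success, which affects how they are aggregated. A minimal sketch follows, assuming each episode is recorded as a dict with the hypothetical keys `success`, `final_distance`, `path_length`, `energy`, and `time_to_goal`; the actual report schema is not specified in this document.

```python
def summarize_episodes(episodes):
    """Aggregate per-episode planning results (hypothetical record layout).

    Final distance is averaged over all episodes; path length, energy, and
    time to goal are averaged over successful episodes only.
    """
    wins = [e for e in episodes if e["success"]]

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else None

    return {
        "success_rate": len(wins) / len(episodes) if episodes else None,
        "final_distance": mean(e["final_distance"] for e in episodes),
        "path_length_success": mean(e["path_length"] for e in wins),
        "energy_success": mean(e["energy"] for e in wins),
        "time_to_goal_success": mean(e["time_to_goal"] for e in wins),
    }
```

Because the last three values ignore failed episodes, they are best read alongside the success rate: a controller that succeeds only on easy episodes can otherwise appear more efficient than one that succeeds more often.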
## Required Outputs
Training outputs:
```text
experiments/<method>/checkpoint/paper.pt
experiments/<method>/checkpoint/paper_step_*.pt
experiments/<method>/result/parameter_count.json
experiments/<method>/result/paper_training.json
experiments/<method>/result/paper_training_trace.jsonl
```
Evaluation outputs:
```text
experiments/reports/paper_prediction_unseen_flow.json
experiments/reports/paper_prediction_unseen_boat_params.json
experiments/reports/paper_prediction_seen_flow_diagnostic.json
experiments/reports/paper_flowmo_latent_probes.json
experiments/reports/paper_planning/*.json
experiments/reports/paper_planning/gifs/*.gif
experiments/reports/paper_report.md
```
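A completeness check over the required outputs might look like the sketch below. It rests on assumptions: `<method>` is taken to range over the four learned-model directories, since the traditional controllers do not train a model, and the wildcard entries (`paper_step_*.pt`, `paper_planning/*.json`, and the GIFs) are left out because their exact file names are not listed here. `missing_outputs` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: training outputs exist for the four learned world models only.
METHODS = ("flowmo", "leworldmodel", "planet", "tdmpc2")
PER_METHOD = (
    "checkpoint/paper.pt",
    "result/parameter_count.json",
    "result/paper_training.json",
    "result/paper_training_trace.jsonl",
)
REPORTS = (
    "paper_prediction_unseen_flow.json",
    "paper_prediction_unseen_boat_params.json",
    "paper_prediction_seen_flow_diagnostic.json",
    "paper_flowmo_latent_probes.json",
    "paper_report.md",
)


def missing_outputs(root="."):
    """Return the fixed-name required outputs that are absent under `root`."""
    expected = [f"experiments/{m}/{rel}" for m in METHODS for rel in PER_METHOD]
    expected += [f"experiments/reports/{name}" for name in REPORTS]
    return [rel for rel in expected if not (Path(root) / rel).is_file()]
```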
## Commands
Run the complete paper pipeline:
```bash
python -m experiments.run_paper_image_pipeline
```
Run stages separately:
```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```
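When driving the stages from a script, one option is to build one command per stage. The sketch below only constructs argument lists from the module name and `--stages` values shown above. The ordering (train, prediction, probe, planning, report) is taken from the listing and is assumed to reflect the dependency order, and `stage_commands` is a hypothetical helper.

```python
import sys

# Stage names copied from the commands above, in the order listed.
STAGES = ("train", "prediction", "probe", "planning", "report")


def stage_commands(stages=STAGES, python=sys.executable):
    """Build one argv list per stage (hypothetical helper; does not execute anything)."""
    return [
        [python, "-m", "experiments.run_paper_image_pipeline", "--stages", stage]
        for stage in stages
    ]


if __name__ == "__main__":
    for cmd in stage_commands():
        print(" ".join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would stop at the first stage that exits with a non-zero status.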
Run tests:
```bash
python -m pytest -q
```