# Paper Task Plan

This is the execution plan for the public FlowMo experiments. The plan has two parts: Part A evaluates learned world models (WMs) directly on prediction, and Part B evaluates downstream control behavior against traditional non-WM reference controllers.

## A. Learned World Models

Purpose: test whether the FlowMo world-model architecture improves image-based prediction under hidden flow, boat momentum, actuator delay, and drag.

Shared setup:

```text
Input: clean top-down boat images plus action history
No image cues: no flow arrows, no velocity vector, no goal marker
Training data: data/paper/train.npz
Evaluation data: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
All flow fields are static. Localized flow structures are sampled near common
task routes so the boat encounters non-uniform current during rollout.
Training budget: shared optimizer, batch size, rollout horizon, step count, and checkpoint schedule
Training precision: BF16 model autocast, FP32 losses and metrics
Prediction precision: BF16 model autocast, FP32 metrics
```

Compared methods:

| Method | Purpose |
|---|---|
| `flowmo` | Proposed flow-momentum WM. Tests explicit separation of short object-motion state and long ambient-drift context. |
| `leworldmodel` | JEPA-style latent predictor. Tests whether a simple current-image latent transition is sufficient. |
| `planet` | RSSM recurrent state-space WM. Tests whether generic recurrent latent memory can absorb momentum and flow effects without FlowMo's explicit context. |
| `tdmpc2` | Compact latent-dynamics WM. Tests whether a task-oriented latent transition architecture matches FlowMo under the same rollout supervision. |

Primary A metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift prediction error
no-flow momentum decay prediction error
same-action different-flow prediction error
FlowMo inferred-context vs c=0 vs shuffled-context error
```
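
The plan does not define the horizon metrics formally. As a minimal sketch, assuming `pos@k` means the mean Euclidean position error at rollout step `k` and `heading@k` means the mean absolute heading error after angle wrapping, they could be computed as follows. The function names and the `(x, y, heading)` rollout layout are hypothetical and are not taken from the FlowMo code. Wrapping matters because headings of 3.1 rad and -3.1 rad are only about 0.08 rad apart.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def pos_error_at(pred, true, k):
    """Assumed pos@k: mean Euclidean error at rollout step k (1-indexed).

    pred and true are lists of rollouts; each rollout is a list of
    (x, y, heading) tuples. This layout is an assumption for illustration.
    """
    errs = [math.dist(p[k - 1][:2], t[k - 1][:2]) for p, t in zip(pred, true)]
    return sum(errs) / len(errs)


def heading_error_at(pred, true, k):
    """Assumed heading@k: mean absolute wrapped heading error at step k."""
    errs = [abs(wrap_angle(p[k - 1][2] - t[k - 1][2])) for p, t in zip(pred, true)]
    return sum(errs) / len(errs)
```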

Required A outputs:

```text
experiments/<method>/checkpoint/paper.pt
experiments/<method>/checkpoint/paper_step_*.pt
experiments/<method>/result/parameter_count.json
experiments/<method>/result/paper_training.json
experiments/reports/paper_prediction.json
experiments/reports/paper_flowmo_latent_probes.json
```
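
A completeness check over these paths can be sketched with the standard library alone. The helper below is hypothetical (it is not part of the FlowMo repository) and assumes `<method>` expands to the four method names listed under "Compared methods". It returns every required pattern that matches no file, so an empty list means all Part A outputs are present.

```python
from pathlib import Path

# Method names taken from the "Compared methods" table of Part A.
METHODS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]

# Per-method outputs, relative to experiments/<method>/.
PER_METHOD = [
    "checkpoint/paper.pt",
    "checkpoint/paper_step_*.pt",
    "result/parameter_count.json",
    "result/paper_training.json",
]

# Shared reports, relative to experiments/.
REPORTS = [
    "reports/paper_prediction.json",
    "reports/paper_flowmo_latent_probes.json",
]


def missing_outputs(root="experiments"):
    """Return the required patterns under root that match no existing file."""
    root = Path(root)
    patterns = [f"{m}/{p}" for m in METHODS for p in PER_METHOD] + REPORTS
    return [p for p in patterns if not any(root.glob(p))]
```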

Core A conclusions:

```text
1. Whether FlowMo has lower long-horizon rollout error.
2. Whether the gain holds across the full paper flow-family set.
3. Whether explicit drift context helps beyond ordinary recurrent history.
4. Whether the same architecture works for both twin and triangle boats.
5. Whether frozen linear probes recover object momentum from `z_t` and ambient drift from `c_t`.
```

## B. Traditional Non-WM Controllers

Purpose: evaluate downstream control behavior and provide classical (non-neural) control reference points. The methods in this part do not train a world model.

Shared setup:

```text
Input: clean top-down images converted to pose for classical control
Tasks: same simulator, same boats, same goals, same flow settings
Metrics: success, final distance, successful-episode trajectory length, successful-episode thrust energy, successful-episode time-to-goal
Planning precision: FP32
```
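
The metric list mixes quantities averaged over all episodes with quantities averaged over successful episodes only. A minimal aggregation sketch is shown below. The `Episode` record and its field names are hypothetical, and the choice to report `None` when no episode succeeded is an assumption rather than something the plan specifies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One evaluation episode (hypothetical record layout)."""
    success: bool
    final_distance: float
    path_length: float
    thrust_energy: float
    time_to_goal: float


def summarize(episodes):
    """Aggregate Part B metrics over a non-empty list of episodes."""
    wins = [e for e in episodes if e.success]

    def win_mean(field):
        # Success-conditioned metrics are undefined without any success.
        return mean(getattr(e, field) for e in wins) if wins else None

    return {
        "success": len(wins) / len(episodes),
        "final_distance": mean(e.final_distance for e in episodes),
        "path_length": win_mean("path_length"),
        "thrust_energy": win_mean("thrust_energy"),
        "time_to_goal": win_mean("time_to_goal"),
    }
```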

Compared methods:

| Method | Purpose |
|---|---|
| `pid_los_controller` | Classical line-of-sight (LOS) waypoint tracking. Tests a simple hand-designed controller. |
| `no_flow_los_controller` | LOS controller that assumes no flow and applies no external-current compensation. Tests how much hidden flow hurts a nominal-dynamics controller. |
| `current_estimator_los_controller` | LOS controller with recent-drift current estimation. Tests a strong classical current-compensation baseline. |
| `oracle_flow_los_controller` | LOS controller with feed-forward of the simulator's true local flow. Tests whether local flow feed-forward helps a simple geometric controller; it is not a full dynamics-MPC upper bound. |

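
The LOS variants above differ mainly in what they know about the current. The sketch below illustrates the textbook geometry only; it is not the repository's controllers. It assumes a boat with constant water-relative speed and ignores momentum, actuator delay, and drag. `los_heading` picks the heading whose ground velocity points along the line of sight: passing a zero current gives the no-flow behavior, passing the simulator's flow gives the oracle behavior, and passing the output of `estimate_current` gives a simple recent-drift estimator. All names and signatures are hypothetical.

```python
import math


def los_heading(pos, waypoint, speed, current=(0.0, 0.0)):
    """Heading (rad) that cancels the cross-track part of the current.

    speed is the boat's speed through the water (> 0); current is the
    ambient flow (cx, cy). With current == (0, 0) this is plain LOS.
    """
    chi = math.atan2(waypoint[1] - pos[1], waypoint[0] - pos[0])
    # Current component perpendicular (to the left) of the line of sight.
    c_perp = -current[0] * math.sin(chi) + current[1] * math.cos(chi)
    # Clamp: if the cross current exceeds boat speed, point fully across it.
    ratio = max(-1.0, min(1.0, c_perp / speed))
    return chi - math.asin(ratio)


def estimate_current(p0, p1, heading, speed, dt):
    """Recent-drift estimate: observed ground velocity minus commanded
    water-relative velocity over one step of length dt."""
    vx = (p1[0] - p0[0]) / dt
    vy = (p1[1] - p0[1]) / dt
    return (vx - speed * math.cos(heading), vy - speed * math.sin(heading))
```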

Planning tasks:

```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```

Flow families:

```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```
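
Taken together, the controllers, tasks, boats, and flow families define an evaluation grid. As a sketch, assuming every controller is run on every combination (the plan does not state this explicitly, nor how many episodes or seeds each cell gets), the grid can be enumerated with `itertools.product`. The function name and the dictionary keys are hypothetical. Under this assumption the grid has 4 x 4 x 2 x 10 = 320 cells.

```python
from itertools import product

CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOWS = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]


def evaluation_grid():
    """Full cross product of controller, task, boat, and flow family."""
    return [
        {"method": m, "task": t, "boat": b, "flow": f}
        for m, t, b, f in product(CONTROLLERS, TASKS, BOATS, FLOWS)
    ]
```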

Required B outputs:

```text
experiments/reports/paper_planning/*.json
experiments/reports/paper_planning/gifs/*.gif
```

Core B conclusions:

```text
1. Whether WM-based planning is competitive with classical non-WM control.
2. Whether FlowMo improves success, final distance, energy, and path length versus other learned WMs.
3. How far FlowMo remains from the oracle-flow classical reference.
```