# Paper Task Plan

This is the execution plan for the public FlowMo experiments. The plan has two parts: A evaluates world models directly, and B evaluates downstream control behavior with traditional non-WM references.

## A. Learned World Models

Purpose: test whether the FlowMo world-model architecture improves image-based prediction under hidden flow, boat momentum, actuator delay, and drag.

Shared setup:

```text
Input: clean top-down boat images plus action history
No image cues: no flow arrows, no velocity vector, no goal marker
Training data: data/paper/train.npz
Evaluation data: data/paper/test.npz
Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
All flow fields are static.
Localized flow structures are sampled near common task routes so the boat encounters non-uniform current during rollout.
Training budget: shared optimizer, batch size, rollout horizon, step count, and checkpoint schedule
Training precision: BF16 model autocast, FP32 losses and metrics
Prediction precision: BF16 model autocast, FP32 metrics
```

Compared methods:

| Method | Purpose |
|---|---|
| `flowmo` | Proposed flow-momentum WM. Tests explicit separation of short object-motion state and long ambient-drift context. |
| `leworldmodel` | JEPA-style latent predictor. Tests whether a simple current-image latent transition is sufficient. |
| `planet` | RSSM recurrent state-space WM. Tests whether generic recurrent latent memory can absorb momentum and flow effects without FlowMo's explicit context. |
| `tdmpc2` | Compact latent-dynamics WM. Tests whether a task-oriented latent transition architecture matches FlowMo under the same rollout supervision. |

Primary A metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift prediction error
no-flow momentum decay prediction error
same-action different-flow prediction error
FlowMo inferred-context vs c=0 vs shuffled-context error
```

Required A outputs:

```text
experiments//checkpoint/paper.pt
experiments//checkpoint/paper_step_*.pt
experiments//result/parameter_count.json
experiments//result/paper_training.json
experiments/reports/paper_prediction.json
experiments/reports/paper_flowmo_latent_probes.json
```

Core A conclusions:

```text
1. Whether FlowMo has lower long-horizon rollout error.
2. Whether the gain holds across the full paper flow-family set.
3. Whether explicit drift context helps beyond ordinary recurrent history.
4. Whether the same architecture works for both twin and triangle boats.
5. Whether frozen linear probes recover object momentum from `z_t` and ambient drift from `c_t`.
```

## B. Traditional Non-WM Controllers

Purpose: evaluate downstream control behavior and provide non-neural-control reference points. These methods do not train a world model.

Shared setup:

```text
Input: clean top-down images converted to pose for classical control
Tasks: same simulator, same boats, same goals, same flow settings
Metrics: success, final distance, successful-episode trajectory length, successful-episode thrust energy, successful-episode time-to-goal
Planning precision: FP32
```

Compared methods:

| Method | Purpose |
|---|---|
| `pid_los_controller` | Classical line-of-sight waypoint tracking. Tests a simple hand-designed controller. |
| `no_flow_los_controller` | No-flow LOS controller without external-current compensation. Tests how much hidden flow hurts a nominal dynamics controller. |
| `current_estimator_los_controller` | LOS controller with recent-drift current estimation. Tests a strong classical current-compensation baseline. |
| `oracle_flow_los_controller` | LOS controller with simulator true local flow feed-forward. Tests whether local flow feed-forward helps a simple geometric controller; it is not a full dynamics-MPC upper bound. |

Planning tasks:

```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```

Flow families:

```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```

Required B outputs:

```text
experiments/reports/paper_planning/*.json
experiments/reports/paper_planning/gifs/*.gif
```

Core B conclusions:

```text
1. Whether WM-based planning is competitive with classical non-WM control.
2. Whether FlowMo improves success, final distance, energy, and path length versus other learned WMs.
3. How far FlowMo remains from the oracle-flow classical reference.
```
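
The B comparison grid and its metrics can be illustrated with a minimal Python sketch. It is not the repository's evaluation code. The function names `evaluation_grid` and `summarize` and the per-episode record keys are hypothetical, chosen only for illustration. The sketch also assumes that success rate and final distance are averaged over all episodes, while trajectory length, thrust energy, and time-to-goal are averaged over successful episodes only, as the "successful-episode" prefix in the shared setup suggests.

```python
from itertools import product
from statistics import mean

# Names copied from the plan's B section.
CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOWS = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]


def evaluation_grid():
    """Return every (controller, task, boat, flow) cell of the B comparison."""
    return list(product(CONTROLLERS, TASKS, BOATS, FLOWS))


def summarize(episodes):
    """Aggregate per-episode records into the B metrics.

    The record schema (keys below) is an assumption made for this sketch.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    succeeded = [e for e in episodes if e["success"]]

    def mean_over_successes(key):
        # Report None when nothing succeeded, so a failed cell is not
        # mistaken for a zero-cost one.
        return mean(e[key] for e in succeeded) if succeeded else None

    return {
        "success": len(succeeded) / len(episodes),
        "final_distance": mean(e["final_distance"] for e in episodes),
        "trajectory_length": mean_over_successes("trajectory_length"),
        "thrust_energy": mean_over_successes("thrust_energy"),
        "time_to_goal": mean_over_successes("time_to_goal"),
    }
```

Conditioning the cost metrics on success keeps a controller that fails quickly from looking cheap; the trade-off is that those numbers must always be read next to the success rate for the same cell.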