# FlowMo Experiment Protocol

This document is the single paper-facing record for the public FlowMo experiments. Regenerated artifacts should replace the same paths instead of introducing version suffixes.

## Scope

The project has two formal comparison groups.

### A. Learned World Models

Purpose: compare image-input world-model architectures under the same simulator data, optimizer budget, rollout target, parameter scale, and planning interface.

| Directory | Report Name | Architecture | Comparison Purpose |
|---|---|---|---|
| `flowmo` | FlowMo | Shared image encoder; short object-motion state encoder; long strided ambient-drift context encoder; base transition plus zero-context residual | Proposed flow-momentum factorization. Tests whether separating endogenous motion from exogenous drift improves prediction and planning. |
| `leworldmodel` | LeWorldModel | JEPA-style image-latent predictor with action-conditioned residual transition | Tests whether simple current-image latent prediction is sufficient. |
| `planet` | RSSM | Recurrent state-space latent model with deterministic memory and stochastic latent state | Tests whether generic recurrent memory can absorb momentum and drift without a separate context factor. |
| `tdmpc2` | TD-MPC2 Dynamics | Compact action-conditioned latent dynamics with shared image encoder and rollout heads | Tests task-oriented latent dynamics under equal supervision. |

All learned methods receive clean top-down RGB boat images and action history. They do not receive flow labels, flow arrows, velocity vectors, trajectory overlays, or goal markers in the image.

### B. Traditional Non-WM Controllers

Purpose: compare downstream behavior against non-neural controllers that do not train a world model.

| Directory | Report Name | Input | Comparison Purpose |
|---|---|---|---|
| `pid_los_controller` | PID/LOS | Clean image pose estimate | Simple hand-designed waypoint tracking baseline. |
| `no_flow_los_controller` | No-Flow LOS Controller | Clean image pose estimate | Measures the cost of ignoring ambient current. |
| `current_estimator_los_controller` | Current-Estimator LOS Controller | Clean image pose estimate and recent drift | Strong classical current-compensation baseline. |
| `oracle_flow_los_controller` | Oracle-Flow LOS Controller | Clean image pose estimate and simulator local flow | True-local-flow feed-forward reference for a simple geometric controller, not a full dynamics-MPC upper bound. |

## Data

All methods use the same splits:

```text
train: data/paper/train.npz
test: data/paper/test.npz
dataset_card: data/paper/dataset_card.md
generation_config: data/paper/generation_config.json
```

The train split, test split, and final planning evaluation use the same paper flow-family set: `noflow`, `uniform`, `vortex_center`, `double_gyre`, `source_sink`, `source_sink_pair`, `gradient`, `shear`, `turbulent_patch`, and `random_fourier`.

All paper flow fields are static. Localized structures are sampled near common task routes and waypoint corridors so that non-uniform flow is encountered by the boat during both training trajectories and final planning tasks.

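
Because the train split, test split, and planning evaluation are meant to share one flow-family set, a quick coverage check on per-episode family labels can catch a regenerated split that drifted from the protocol. The sketch below is only illustrative: the family names come from the list above, but `check_family_coverage` and the idea of a per-episode label sequence are assumptions, since this document does not specify how the `.npz` files store family labels.

```python
from collections import Counter

# Flow-family names copied from the Data section of this protocol.
PAPER_FLOW_FAMILIES = (
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
)


def check_family_coverage(episode_families):
    """Compare one split's per-episode family labels against the paper set.

    `episode_families` is a hypothetical sequence with one family name per
    episode; how the real splits store this label is not stated here.
    Returns (missing, unexpected, counts).
    """
    counts = Counter(str(name) for name in episode_families)
    expected = set(PAPER_FLOW_FAMILIES)
    missing = sorted(expected - set(counts))      # families with no episodes
    unexpected = sorted(set(counts) - expected)   # labels outside the paper set
    return missing, unexpected, counts


if __name__ == "__main__":
    # Toy labels standing in for a split; not real dataset contents.
    toy = ["uniform", "shear", "uniform", "noflow"]
    missing, unexpected, counts = check_family_coverage(toy)
    print("missing:", missing)
    print("unexpected:", unexpected)
    print("counts:", dict(counts))
```

With NumPy, `np.load("data/paper/train.npz").files` lists the array names stored in a split, which is one way to find whichever array holds the family label before running a check like this.
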
Observation protocol:

```text
image_size: 160 x 160
visual_scale: 2.5
rendering: online clean top-down RGB images
forbidden cues: flow arrows, velocity vectors, trajectory overlays, goal markers
```

Training budget:

```text
train_episodes: 2400
test_episodes: 480
train_windows: 393216
test_windows: 24576
batch_size: 256
steps: 20000
checkpoint_interval: 2000
num_workers: 4
render_mode: device
training_parallel_jobs: 2
planning_parallel_jobs: 3
```

Precision policy:

```text
training: bf16 model autocast, fp32 losses and metrics
prediction_eval: bf16 model autocast, fp32 metrics
planning_eval: fp32
```

The precision split is intentional: BF16 speeds up image encoding and latent rollout on the RTX 5090 without measurable short-run loss drift, while CEM planning is dominated by small control tensors and did not improve under BF16.

## Prediction Evaluation

Dataset:

```text
test
```

Metrics:

```text
pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
heading@20, heading@60
zero-action drift prediction error
no-flow momentum decay prediction error
same-action different-flow prediction error
```

FlowMo-only context diagnostics:

| Diagnostic | Operation | Evidence Sought |
|---|---|---|
| Inferred context | Normal rollout with inferred `c_t` | Best prediction under flow. |
| Zero context | Set `c_t=0` | Degraded flow prediction and limited change in no-flow. |
| Shuffled context | Use context from another episode | Worse rollout when hidden flow differs. |
| Same-flow transfer | Use context from another episode with the same hidden flow | Better transfer than wrong-flow context. |
| Context norm | Compare no-flow and flow `||c_t||` | Flow context should be larger than no-flow context. |

FlowMo latent probes:

```text
Train frozen linear probes from z_t, c_t, and [z_t,c_t].
Targets: object momentum (vx, vy, omega), local flow vector, episode drift vector.
Purpose: verify which latent carries object-motion information and which latent carries ambient-drift information.
```

## Planning Evaluation

Learned WM planners:

```text
flowmo
leworldmodel
planet
tdmpc2
```

All learned world models use the same route-aware CEM planner over their latent rollouts.

Traditional non-WM controllers:

```text
pid_los_controller
no_flow_los_controller
current_estimator_los_controller
oracle_flow_los_controller
```

Tasks:

```text
reach_target
station_keeping
waypoint_square
waypoint_zigzag
```

Boats:

```text
twin
triangle
```

Flow families:

```text
noflow
uniform
vortex_center
double_gyre
source_sink
source_sink_pair
gradient
shear
turbulent_patch
random_fourier
```

Metrics:

```text
success rate
final distance
trajectory length over successful episodes
control effort (`sum_t ||a_t||_2^2`) over successful episodes
time to goal over successful episodes
```

## Required Outputs

Training outputs:

```text
experiments//checkpoint/paper.pt
experiments//checkpoint/paper_step_*.pt
experiments//result/parameter_count.json
experiments//result/paper_training.json
experiments//result/paper_training_trace.jsonl
```

Evaluation outputs:

```text
experiments/reports/paper_prediction.json
experiments/reports/paper_flowmo_latent_probes.json
experiments/reports/paper_planning/*.json
experiments/reports/paper_planning/gifs/*.gif
experiments/reports/paper_report.md
```

## Commands

Run the complete paper pipeline:

```bash
python -m experiments.run_paper_image_pipeline
```

Run stages separately:

```bash
python -m experiments.run_paper_image_pipeline --stages train
python -m experiments.run_paper_image_pipeline --stages prediction
python -m experiments.run_paper_image_pipeline --stages probe
python -m experiments.run_paper_image_pipeline --stages planning
python -m experiments.run_paper_image_pipeline --stages report
```

Run tests:

```bash
python -m pytest -q
```
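
When checking a regenerated planning report by hand, it can help to recompute the planning metrics from raw episode records. The sketch below is a minimal illustration, not the pipeline's implementation: the episode dictionary keys (`success`, `final_distance`, `positions`, `actions`, `steps`) and the function names are hypothetical, and it assumes that success rate and final distance are averaged over all episodes while the other three metrics are averaged over successful episodes only, as the Metrics list under Planning Evaluation suggests.

```python
import numpy as np


def control_effort(actions):
    """Return sum_t ||a_t||_2^2 for actions of shape (T, action_dim)."""
    a = np.asarray(actions, dtype=np.float64)
    return float(np.sum(a * a))


def trajectory_length(positions):
    """Return the summed Euclidean step lengths for positions of shape (T, 2)."""
    p = np.asarray(positions, dtype=np.float64)
    if len(p) < 2:
        return 0.0
    return float(np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1)))


def summarize(episodes):
    """Aggregate planning metrics from a list of episode dicts.

    All keys are hypothetical; adapt them to the actual report format.
    Conditional metrics are None when no episode succeeded.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    wins = [e for e in episodes if e["success"]]

    def mean_over_wins(fn):
        # Average a per-episode quantity over successful episodes only.
        return float(np.mean([fn(e) for e in wins])) if wins else None

    return {
        "success_rate": len(wins) / len(episodes),
        "final_distance": float(np.mean([e["final_distance"] for e in episodes])),
        "trajectory_length": mean_over_wins(lambda e: trajectory_length(e["positions"])),
        "control_effort": mean_over_wins(lambda e: control_effort(e["actions"])),
        "time_to_goal": mean_over_wins(lambda e: float(e["steps"])),
    }
```

Conditioning the last three metrics on success keeps failed episodes, which may run to the step limit, from dominating the averages. It also means those numbers should be read alongside the success rate rather than on their own.
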