# Experiments

This directory contains the paper-facing experiment code, checkpoints, results, figures, GIFs, tables, and reports.

This directory contains two formal experiment categories:

- **A. Learned world models**: trainable image-input WMs evaluated on rollout prediction and WM-based planning.
- **B. Traditional non-WM controllers**: hand-designed control baselines evaluated on the same downstream tasks.

Main method:

- **FlowMo**: Flow-Momentum World Model, the proposed drift-aware world model for surface vehicles.

Category A learned WM comparisons:

- **LeWorldModel**: JEPA-style latent predictor under the shared clean-image protocol.
- **PlaNet RSSM**: recurrent state-space world-model baseline under the shared clean-image protocol.
- **TD-MPC2 Dynamics**: task-oriented latent dynamics baseline under the shared clean-image protocol.

Purpose of Category A: compare world-model architectures under identical image data, optimizer budget, parameter budget, rollout target, and evaluation protocol.

Category B traditional controllers:

- **PID/LOS controller**
- **Physics MPC No-Flow**
- **Current-Estimator MPC**
- **Oracle-Flow MPC**

Purpose of Category B: compare downstream task behavior against non-neural controllers that do not train a world model.

Baseline details are documented in `BASELINES.md`; the full experiment protocol is documented in `docs/EXPERIMENT_PROTOCOL.md`.

Design principles:

- Shared simulator, datasets, planning utilities, metrics, and visualization live in `shared/`.
- Each method has its own directory with `src/`, `checkpoint/`, and `result/`.
- Paper artifacts are collected in top-level `figures/`, `gifs/`, `tables/`, and `reports/`.
- Method names should be explicit and readable. Avoid cryptic suffixes in paper-facing file names.

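
The per-method layout rule in the design principles above (`src/`, `checkpoint/`, `result/`) can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper names `missing_subdirs` and `check_layout` are hypothetical, and the method directory names in the usage note are placeholders rather than the actual directory names.

```python
from __future__ import annotations

from pathlib import Path

# Per-method subdirectories named in the design principles above.
REQUIRED_SUBDIRS = ("src", "checkpoint", "result")


def missing_subdirs(method_dir: Path) -> list[str]:
    """Return the required subdirectories that are absent from method_dir."""
    return [name for name in REQUIRED_SUBDIRS if not (method_dir / name).is_dir()]


def check_layout(experiments_root: Path, method_names: list[str]) -> dict[str, list[str]]:
    """Map each method name to its missing subdirectories (empty list means OK).

    method_names is supplied by the caller; this sketch does not assume
    how the repository actually names its method directories.
    """
    return {name: missing_subdirs(experiments_root / name) for name in method_names}
```

For example, `check_layout(Path("experiments"), ["flowmo"])` would return `{"flowmo": []}` if a directory with that (assumed) name exists and contains all three subdirectories.
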
Standard method interface:

```text
src/model.py      # build_model(), load_model()
src/train.py      # train(config)
src/predict.py    # rollout(model, batch)
src/config.py     # default_config()
```

Closed-loop planning for learned world models is implemented once in `evaluate_image_planning.py` so every learned method is evaluated through the same CEM interface.

Traditional controllers use:

```text
src/controller.py or src/mpc.py
src/evaluate.py
src/config.py
```

Formal clean-image configuration:

```text
image_size=160
visual_scale=2.5
train=data/paper/train.npz
test=data/paper/test_unseen_flow.npz and data/paper/test_unseen_boat_params.npz
```

Full paper-facing image pipeline:

```bash
python -m experiments.run_paper_image_pipeline
```

The default command runs the paper configuration end to end: train all learned world models, evaluate long rollout prediction, run FlowMo latent probes, evaluate closed-loop planning against traditional controllers, generate GIFs, and write the final report. Images are rendered online from simulator states, so no separate image-cache preparation step is required.

Manual image training:

```bash
python -m experiments.train_image_world_models
python -m experiments.evaluate_image_world_models
python -m experiments.evaluate_flowmo_latent_probes
python -m experiments.evaluate_image_planning --task reach_uniform --boat twin
python -m experiments.summarize_paper_image_results
```
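
For readers unfamiliar with the CEM planning mentioned above, the following is a generic sketch of the cross-entropy method over action sequences. It is not the code in `evaluate_image_planning.py`: the function names, hyperparameters, and the 1-D toy cost that stands in for a learned world-model rollout are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def toy_rollout_cost(actions: Sequence[float], start: float = 0.0, goal: float = 1.0) -> float:
    """Hypothetical stand-in for a world-model rollout: a 1-D integrator.

    The state moves by the action at each step; cost penalizes distance
    to the goal plus a small action-magnitude term.
    """
    x, cost = start, 0.0
    for a in actions:
        x = x + a
        cost += (x - goal) ** 2 + 0.01 * a * a
    return cost


def cem_plan(
    cost_fn: Callable[[Sequence[float]], float],
    horizon: int,
    iterations: int = 10,
    population: int = 64,
    elite_count: int = 8,
    seed: int = 0,
) -> List[float]:
    """Return an action sequence that approximately minimizes cost_fn."""
    rng = random.Random(seed)
    mu = [0.0] * horizon      # per-step mean of the sampling distribution
    sigma = [1.0] * horizon   # per-step standard deviation
    for _ in range(iterations):
        # Sample candidate action sequences from the current Gaussian.
        candidates = [
            [rng.gauss(mu[t], sigma[t]) for t in range(horizon)]
            for _ in range(population)
        ]
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        candidates.sort(key=cost_fn)
        elites = candidates[:elite_count]
        mu = [mean(e[t] for e in elites) for t in range(horizon)]
        # Floor the spread so sampling never collapses to a single point.
        sigma = [max(pstdev([e[t] for e in elites]), 1e-3) for t in range(horizon)]
    return mu
```

Because the planner only sees `cost_fn`, any learned model that can score an action sequence could in principle be plugged in the same way, which is one plausible reading of why a single shared planner keeps the comparison between methods fair.
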