| # Experiments |
|
|
| This directory contains the paper-facing experiment code, checkpoints, results, GIFs, tables, and reports. |
|
|
| The experiments fall into two formal categories: |
|
|
| - **A. Learned world models**: trainable image-input WMs evaluated on rollout prediction and WM-based planning. |
| - **B. Traditional non-WM controllers**: hand-designed control baselines evaluated on the same downstream tasks. |
|
|
| Main method: |
|
|
| - **FlowMo**: Flow-Momentum World Model, the proposed drift-aware world model for surface vehicles. |
|
|
| Category A learned WM comparisons: |
|
|
| - **LeWorldModel**: JEPA-style latent predictor under the shared clean-image protocol. |
| - **PlaNet RSSM**: recurrent state-space world-model baseline under the shared clean-image protocol. |
| - **TD-MPC2 Dynamics**: task-oriented latent dynamics baseline under the shared clean-image protocol. |
|
|
| Purpose of Category A: compare world-model architectures under identical image data, optimizer budget, rollout target, and evaluation protocol. |
|
|
| Category B traditional controllers: |
|
|
| - **PID/LOS Controller** |
| - **No-Flow LOS Controller** |
| - **Current-Estimator LOS Controller** |
| - **Oracle-Flow LOS Controller** |
|
|
| Purpose of Category B: compare downstream task behavior against non-neural controllers that do not train a world model. |
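
For readers unfamiliar with LOS (line-of-sight) guidance, the following is a minimal textbook-style sketch of a lookahead LOS heading law with no current compensation. It is not taken from this repository: the function name `los_heading`, its arguments, and the default `lookahead` value are all hypothetical, and the actual controllers may differ in guidance law, gains, and how the estimated or oracle flow is used.

```python
import math


def los_heading(p0, p1, pos, lookahead=5.0):
    """Return a desired heading (rad) that steers `pos` back onto segment p0 -> p1.

    Hypothetical helper for illustration only; not the repository's controller.
    """
    # Path-tangential angle of the segment.
    alpha = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    # Signed cross-track error: positive when the boat is to the left of the path.
    dx, dy = pos[0] - p0[0], pos[1] - p0[1]
    cross_track = -dx * math.sin(alpha) + dy * math.cos(alpha)
    # Lookahead LOS law: aim at a point `lookahead` units ahead on the path.
    heading = alpha + math.atan2(-cross_track, lookahead)
    # Wrap the result to (-pi, pi].
    return math.atan2(math.sin(heading), math.cos(heading))
```

A heading command like this would typically be tracked by an inner PID loop on yaw. A flow-aware variant would additionally correct the heading for the known or estimated current, which this sketch deliberately omits.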
|
|
| Baseline details are documented in `BASELINES.md`; the full experiment protocol is documented in `docs/EXPERIMENT_PROTOCOL.md`. |
|
|
| Design principles: |
|
|
| - Shared simulator, datasets, planning utilities, metrics, and visualization live in `shared/`. |
| - Each method has its own directory with `src/`, `checkpoint/`, and `result/`. |
| - Paper artifacts are collected under `reports/`. |
| - Method names should be explicit and readable. Avoid cryptic suffixes in paper-facing file names. |
|
|
| Standard method interface: |
|
|
| ```text |
| src/model.py # build_model(), load_model() |
| src/train.py # train(config) |
| src/predict.py # rollout(model, batch) |
| src/config.py # default_config() |
| ``` |
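
To make the interface concrete, here is a minimal sketch of how a method might satisfy it, using a toy scalar model in place of a real image world model. Only the function names `build_model`, `rollout`, and `default_config` come from the listing above. The `Config` fields, the `ToyModel` class, the `config` argument to `build_model`, and the batch layout are assumptions made for illustration; `train(config)` and `load_model()` are omitted.

```python
from dataclasses import dataclass


@dataclass
class Config:
    horizon: int = 3   # hypothetical field: number of rollout steps
    gain: float = 0.5  # hypothetical field: parameter of the toy model


def default_config():
    """Return the method's default configuration (would live in src/config.py)."""
    return Config()


class ToyModel:
    """Stand-in for a learned world model: next_state = state + gain * action."""

    def __init__(self, gain):
        self.gain = gain

    def step(self, state, action):
        return state + self.gain * action


def build_model(config):
    """Construct an untrained model (would live in src/model.py)."""
    return ToyModel(config.gain)


def rollout(model, batch):
    """Predict a state sequence from one initial state and an action sequence."""
    state = batch["state"]
    predictions = []
    for action in batch["actions"]:
        state = model.step(state, action)
        predictions.append(state)
    return predictions
```

Keeping every method behind the same small set of entry points is what lets shared evaluation code call any model without method-specific branches.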
|
|
| Closed-loop planning for learned world models is implemented once in `evaluate_image_planning.py` so every learned method is evaluated through the same CEM interface. |
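
As a rough illustration of what a shared CEM (cross-entropy method) planner does, the sketch below plans over scalar actions with a generic one-step model. It is not the code in `evaluate_image_planning.py`: the name `cem_plan`, the `step_fn` and `cost_fn` callbacks, and all hyperparameter defaults are hypothetical, and the real planner operates on image-based world models rather than scalars.

```python
import random
import statistics


def cem_plan(step_fn, cost_fn, state, horizon=5, samples=64, elites=8,
             iters=5, seed=0):
    """Return the first action of the refined mean action sequence."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            # Sample one candidate action sequence from the current Gaussian.
            actions = [rng.gauss(m, s) for m, s in zip(mean, std)]
            # Roll the model forward and accumulate the cost of visited states.
            s, cost = state, 0.0
            for a in actions:
                s = step_fn(s, a)
                cost += cost_fn(s)
            scored.append((cost, actions))
        # Refit the sampling distribution to the lowest-cost sequences.
        scored.sort(key=lambda item: item[0])
        elite = [actions for _, actions in scored[:elites]]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean[0]
```

Because the planner only needs a one-step prediction function and a cost, any learned model exposing a rollout can be plugged in, which is the property the single shared implementation relies on.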
|
|
| Traditional controllers use: |
|
|
| ```text |
| src/controller.py or src/mpc.py |
| src/evaluate.py |
| src/config.py |
| ``` |
|
|
| Formal clean-image configuration: |
|
|
| ```text |
| image_size=160 |
| visual_scale=2.5 |
| train=data/paper/train.npz |
| test=data/paper/test.npz |
| flow_families=noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier |
| ``` |
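
One way to carry these settings in code is a frozen dataclass, sketched below. The values are copied from the configuration above; the class name `CleanImageConfig`, the constant `FLOW_FAMILIES`, and the field layout are assumptions, not the repository's actual `config.py`.

```python
from dataclasses import dataclass

# Flow families exactly as listed in the formal clean-image configuration.
FLOW_FAMILIES = (
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
)


@dataclass(frozen=True)
class CleanImageConfig:
    """Hypothetical container for the formal clean-image settings."""

    image_size: int = 160
    visual_scale: float = 2.5
    train: str = "data/paper/train.npz"
    test: str = "data/paper/test.npz"
    flow_families: tuple = FLOW_FAMILIES


config = CleanImageConfig()
```

Freezing the dataclass guards against a method silently mutating a shared setting, which matters when every learned model is meant to see identical data.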
|
|
| Full paper-facing image pipeline: |
|
|
| ```bash |
| python -m experiments.run_paper_image_pipeline |
| ``` |
|
|
| The default command runs the paper configuration end to end: train all learned world models, evaluate long rollout prediction, run FlowMo latent probes, evaluate closed-loop planning against traditional controllers, generate GIFs, and write the final report. Images are rendered online from simulator states, so no separate image-cache preparation step is required. |
| All flow fields are static (time-invariant). Localized flow structures are sampled near task routes so that boat trajectories encounter non-uniform currents in the shared train/test/final protocol. |
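
The two properties described here, static fields and localized structures placed near the route, can be illustrated with a small sketch. Everything in it is an assumption for illustration: the function names, the Gaussian-decay vortex, and the uniform jitter are not taken from the simulator in `shared/`, whose flow families may be defined differently.

```python
import math
import random


def sample_center_near_route(p0, p1, jitter, rng):
    """Pick a point on segment p0 -> p1, then offset it by at most `jitter` per axis."""
    t = rng.random()
    x = p0[0] + t * (p1[0] - p0[0]) + rng.uniform(-jitter, jitter)
    y = p0[1] + t * (p1[1] - p0[1]) + rng.uniform(-jitter, jitter)
    return (x, y)


def vortex_velocity(pos, center, strength=1.0, radius=2.0):
    """Static localized vortex: velocity depends on position only, never on time."""
    dx, dy = pos[0] - center[0], pos[1] - center[1]
    decay = math.exp(-(dx * dx + dy * dy) / (2.0 * radius * radius))
    # Counter-clockwise rotation for positive strength, fading away from the center.
    return (-strength * dy * decay, strength * dx * decay)
```

Sampling the center along the start-to-goal segment means a boat following the route passes through the region where the current is strongest, rather than through near-zero flow far from the structure.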
|
|
| Manual step-by-step run (training, evaluation, and summary): |
|
|
| ```bash |
| python -m experiments.train_image_world_models |
| python -m experiments.evaluate_image_world_models |
| python -m experiments.evaluate_flowmo_latent_probes |
| python -m experiments.evaluate_image_planning --task reach_target --boat twin |
| python -m experiments.summarize_paper_image_results |
| ``` |
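
If the manual steps need to be scripted, a small driver along the following lines could run them in order and stop at the first failure. The module names and flags are copied from the commands above; the helper names `build_commands` and `run_all` and the dry-run default are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Module names and arguments, in the order given for the manual run.
MANUAL_STEPS = [
    ["experiments.train_image_world_models"],
    ["experiments.evaluate_image_world_models"],
    ["experiments.evaluate_flowmo_latent_probes"],
    ["experiments.evaluate_image_planning", "--task", "reach_target", "--boat", "twin"],
    ["experiments.summarize_paper_image_results"],
]


def build_commands(python=sys.executable):
    """Return one argv list per manual step."""
    return [[python, "-m", *step] for step in MANUAL_STEPS]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never run
            # on top of a failed earlier step.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them from the repository root.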
|
|