# Experiments
This directory contains the paper-facing experiment code, checkpoints, results, GIFs, tables, and reports.
The experiments fall into two formal categories:
- **A. Learned world models**: trainable image-input WMs evaluated on rollout prediction and WM-based planning.
- **B. Traditional non-WM controllers**: hand-designed control baselines evaluated on the same downstream tasks.
Main method:
- **FlowMo**: Flow-Momentum World Model, the proposed drift-aware world model for surface vehicles.
Category A learned WM comparisons:
- **LeWorldModel**: JEPA-style latent predictor under the shared clean-image protocol.
- **PlaNet RSSM**: recurrent state-space world-model baseline under the shared clean-image protocol.
- **TD-MPC2 Dynamics**: task-oriented latent dynamics baseline under the shared clean-image protocol.
Purpose of Category A: compare world-model architectures under identical image data, optimizer budget, rollout target, and evaluation protocol.
Category B traditional controllers:
- **PID/LOS controller**
- **No-Flow LOS Controller**
- **Current-Estimator LOS Controller**
- **Oracle-Flow LOS Controller**
Purpose of Category B: compare downstream task behavior against non-neural controllers that do not train a world model.
Baseline details are documented in `BASELINES.md`; the full experiment protocol is documented in `docs/EXPERIMENT_PROTOCOL.md`.
Design principles:
- Shared simulator, datasets, planning utilities, metrics, and visualization live in `shared/`.
- Each method has its own directory with `src/`, `checkpoint/`, and `result/`.
- Paper artifacts are collected under `reports/`.
- Method names should be explicit and readable. Avoid cryptic suffixes in paper-facing file names.
Standard method interface:
```text
src/model.py # build_model(), load_model()
src/train.py # train(config)
src/predict.py # rollout(model, batch)
src/config.py # default_config()
```
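As a rough illustration of how a method module might satisfy this interface, the sketch below collapses the four files into one toy example in plain Python. Only the function names `build_model`, `load_model`, `train`, `rollout`, and `default_config` come from the listing above. The argument lists beyond those shown, the `ToyModel` class, the config keys, and the JSON checkpoint format are all assumptions made for illustration, not the repository's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in for a world model: x_next = x + gain * action."""
    gain: float = 0.0


def default_config():
    # Assumed keys; a real config.py would expose the method's own fields.
    return {
        "gain_init": 0.0,
        "lr": 0.1,
        "epochs": 200,
        # Toy (state, action, next_state) transitions generated with gain 0.5.
        "transitions": [(0.0, 1.0, 0.5), (0.5, -1.0, 0.0), (0.0, 2.0, 1.0)],
    }


def build_model(config):
    return ToyModel(gain=config["gain_init"])


def load_model(path):
    # Assumes the checkpoint is a JSON file such as {"gain": 0.5}.
    with open(path) as f:
        return ToyModel(**json.load(f))


def rollout(model, batch):
    """Predict future states from an initial state and an action sequence."""
    x, preds = batch["x0"], []
    for a in batch["actions"]:
        x = x + model.gain * a
        preds.append(x)
    return preds


def train(config):
    """Fit the gain by gradient descent on one-step squared prediction error."""
    model = build_model(config)
    data = config["transitions"]
    for _ in range(config["epochs"]):
        grad = sum(2.0 * (x + model.gain * a - x_next) * a
                   for x, a, x_next in data) / len(data)
        model.gain -= config["lr"] * grad
    return model


if __name__ == "__main__":
    trained = train(default_config())
    print(rollout(trained, {"x0": 0.0, "actions": [1.0, 1.0]}))
```

Keeping every method behind the same function names is what would let shared evaluation code call `rollout(model, batch)` without knowing which architecture produced `model`.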
Closed-loop planning for learned world models is implemented once in `evaluate_image_planning.py` so every learned method is evaluated through the same CEM interface.
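For readers unfamiliar with the cross-entropy method (CEM), the following is a minimal sketch of the general technique on a toy 1-D problem. It is not the code in `evaluate_image_planning.py`: the function names (`cem_plan`, `toy_rollout`, `toy_cost`, `run_closed_loop`), the hyperparameters, and the toy dynamics with constant drift are all hypothetical.

```python
import random
import statistics


def cem_plan(rollout_fn, cost_fn, state, horizon=5, population=128,
             n_elite=16, iterations=6, seed=0):
    """Cross-entropy method over action sequences; returns the final mean plan."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences from a per-step Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Score each candidate by rolling it out through the model.
        samples.sort(key=lambda acts: cost_fn(rollout_fn(state, acts)))
        elite = samples[:n_elite]
        # Refit the sampling distribution to the elite set, step by step.
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean


def clip(a):
    return max(-1.0, min(1.0, a))


def toy_rollout(state, actions):
    # Hypothetical 1-D dynamics with a constant drift of -0.2 per step.
    traj, x = [], state
    for a in actions:
        x = x + clip(a) - 0.2
        traj.append(x)
    return traj


def toy_cost(traj, target=3.0):
    return sum((x - target) ** 2 for x in traj)


def run_closed_loop(steps=10):
    """Replan at every step and execute only the first planned action."""
    x = 0.0
    for _ in range(steps):
        plan = cem_plan(toy_rollout, toy_cost, x)
        x = x + clip(plan[0]) - 0.2  # the environment uses the same toy dynamics
    return x


if __name__ == "__main__":
    print(run_closed_loop())
```

Because the planner only needs a `rollout_fn` and a `cost_fn`, any model exposing a rollout function can be swapped in without changing the planning loop, which is the property a single shared planner relies on.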
Traditional controllers use:
```text
src/controller.py or src/mpc.py
src/evaluate.py
src/config.py
```
Formal clean-image configuration:
```text
image_size=160
visual_scale=2.5
train=data/paper/train.npz
test=data/paper/test.npz
flow_families=noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
```
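One way to carry these settings in code is a frozen dataclass with a small sanity check, sketched below. The values are copied from the listing above; the class name `CleanImageConfig` and the `validate` helper are hypothetical and may not match how the repository stores its configuration.

```python
from dataclasses import dataclass

FLOW_FAMILIES = (
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch",
    "random_fourier",
)


@dataclass(frozen=True)
class CleanImageConfig:
    """Hypothetical container for the formal clean-image settings."""
    image_size: int = 160
    visual_scale: float = 2.5
    train: str = "data/paper/train.npz"
    test: str = "data/paper/test.npz"
    flow_families: tuple = FLOW_FAMILIES


def validate(cfg):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if cfg.image_size <= 0:
        problems.append("image_size must be positive")
    if cfg.visual_scale <= 0:
        problems.append("visual_scale must be positive")
    if len(set(cfg.flow_families)) != len(cfg.flow_families):
        problems.append("flow_families contains duplicates")
    if cfg.train == cfg.test:
        problems.append("train and test must be different files")
    return problems


if __name__ == "__main__":
    print(validate(CleanImageConfig()))
```

Freezing the dataclass prevents a method's training script from silently changing a shared setting after the config has been constructed.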
Full paper-facing image pipeline:
```bash
python -m experiments.run_paper_image_pipeline
```
The default command runs the paper configuration end to end: train all learned world models, evaluate long rollout prediction, run FlowMo latent probes, evaluate closed-loop planning against traditional controllers, generate GIFs, and write the final report. Images are rendered online from simulator states, so no separate image-cache preparation step is required.
All flow fields are static. Localized flow structures are sampled near task routes so that boat trajectories encounter non-uniform current in the shared train/test/final protocol.
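To make "static" and "sampled near task routes" concrete, here is a minimal sketch for a single vortex. It assumes a straight start-to-goal route and a Gaussian-decaying vortex. The function names, the sampling ranges, and the vortex profile are illustrative assumptions, not the simulator's actual flow definitions.

```python
import math
import random


def sample_center_near_route(start, goal, max_offset=1.0, rng=None):
    """Pick a point along the start-goal segment, offset perpendicular to it."""
    rng = rng or random.Random()
    t = rng.uniform(0.2, 0.8)  # stay away from the endpoints
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    length = math.hypot(dx, dy) or 1.0
    nx, ny = -dy / length, dx / length  # unit normal to the route
    off = rng.uniform(-max_offset, max_offset)
    return (start[0] + t * dx + off * nx, start[1] + t * dy + off * ny)


def vortex_flow(center, strength=0.5, radius=2.0):
    """Return a time-independent flow function (x, y) -> (ux, uy)."""
    def flow(x, y):
        rx, ry = x - center[0], y - center[1]
        w = strength * math.exp(-(rx * rx + ry * ry) / (radius * radius))
        return (-w * ry, w * rx)  # purely tangential, decaying with distance
    return flow


if __name__ == "__main__":
    center = sample_center_near_route((0.0, 0.0), (10.0, 0.0),
                                      rng=random.Random(0))
    flow = vortex_flow(center)
    print(center, flow(5.0, 0.0))
```

The returned `flow` closure takes no time argument, so the current at a given position is identical at every simulation step, and placing the center within `max_offset` of the route means a boat following that route passes through the non-uniform region.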
Manual image training and evaluation, one stage at a time:
```bash
python -m experiments.train_image_world_models
python -m experiments.evaluate_image_world_models
python -m experiments.evaluate_flowmo_latent_probes
python -m experiments.evaluate_image_planning --task reach_target --boat twin
python -m experiments.summarize_paper_image_results
```