
	# Paper Task Plan

	This is the execution plan for the public FlowMo experiments. It has two parts: Part A evaluates the learned world models (WMs) directly, and Part B evaluates downstream control behavior against traditional non-WM reference controllers.

	## A. Learned World Models

	Purpose: test whether the FlowMo world-model architecture improves image-based prediction under hidden flow, boat momentum, actuator delay, and drag.

	Shared setup:

	```text
	Input: clean top-down boat images plus action history
	No image cues: no flow arrows, no velocity vector, no goal marker
	Training data: data/paper/train.npz
	Evaluation data: data/paper/test.npz
	Flow families: noflow, uniform, vortex_center, double_gyre, source_sink, source_sink_pair, gradient, shear, turbulent_patch, random_fourier
	All flow fields are static. Localized flow structures are sampled near common
	task routes so the boat encounters non-uniform current during rollout.
	Training budget: shared optimizer, batch size, rollout horizon, step count, and checkpoint schedule
	Training precision: BF16 model autocast, FP32 losses and metrics
	Prediction precision: BF16 model autocast, FP32 metrics
	```
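The four effects named in the purpose statement (hidden flow, momentum, actuator delay, drag) can be illustrated with a toy point-mass model. This is a minimal sketch, not the FlowMo simulator: `ToyBoat`, `uniform_flow`, `no_flow`, the linear drag law, and all constants are assumptions chosen only to make each effect visible.

```python
from collections import deque


def uniform_flow(x, y, u=0.3, v=0.0):
    """Static uniform current (hypothetical magnitude)."""
    return (u, v)


def no_flow(x, y):
    """The 'noflow' case: still water everywhere."""
    return (0.0, 0.0)


class ToyBoat:
    """Point mass with linear drag, delayed thrust, and a static ambient flow."""

    def __init__(self, flow, delay_steps=2, drag=0.5, dt=0.1):
        self.flow = flow
        self.drag = drag
        self.dt = dt
        self.pos = [0.0, 0.0]
        self.vel = [0.0, 0.0]  # velocity relative to the water
        # Actuator delay: commands wait in a queue before taking effect.
        self.pending = deque([(0.0, 0.0)] * delay_steps)

    def step(self, thrust):
        self.pending.append(thrust)
        ax, ay = self.pending.popleft()  # command issued delay_steps ago
        # Momentum and drag: velocity persists and decays without thrust.
        self.vel[0] += (ax - self.drag * self.vel[0]) * self.dt
        self.vel[1] += (ay - self.drag * self.vel[1]) * self.dt
        # Hidden flow: the current moves the boat but is never drawn in the image.
        fu, fv = self.flow(self.pos[0], self.pos[1])
        self.pos[0] += (self.vel[0] + fu) * self.dt
        self.pos[1] += (self.vel[1] + fv) * self.dt
        return tuple(self.pos)
```

Under this toy model, a zero-action rollout in `uniform_flow` isolates drift, while a rollout in `no_flow` after thrust stops isolates momentum decay, which mirrors two of the diagnostic metrics listed for Part A.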

	Compared methods:

	| Method | Purpose |
	|---|---|
	| `flowmo` | Proposed flow-momentum WM. Tests explicit separation of short object-motion state and long ambient-drift context. |
	| `leworldmodel` | JEPA-style latent predictor. Tests whether a simple current-image latent transition is sufficient. |
	| `planet` | RSSM recurrent state-space WM. Tests whether generic recurrent latent memory can absorb momentum and flow effects without FlowMo's explicit context. |
	| `tdmpc2` | Compact latent-dynamics WM. Tests whether a task-oriented latent transition architecture matches FlowMo under the same rollout supervision. |

	Primary A metrics:

	```text
	pos@1, pos@5, pos@10, pos@20, pos@40, pos@60
	heading@20, heading@60
	zero-action drift prediction error
	no-flow momentum decay prediction error
	same-action different-flow prediction error
	FlowMo inferred-context vs c=0 vs shuffled-context error
	```
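The plan does not spell out how `pos@k` and `heading@k` are computed. One plausible reading, sketched below, is the Euclidean position error and the wrapped absolute heading error at the k-th predicted rollout step, averaged over episodes. The function names and the 1-indexed step convention are assumptions.

```python
import math


def position_error_at(pred_xy, true_xy, k):
    """Euclidean position error at rollout step k (1-indexed, assumed convention)."""
    (px, py), (tx, ty) = pred_xy[k - 1], true_xy[k - 1]
    return math.hypot(px - tx, py - ty)


def heading_error_at(pred_heading, true_heading, k):
    """Absolute heading error in radians at step k, wrapped into [0, pi]."""
    diff = pred_heading[k - 1] - true_heading[k - 1]
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


def mean_position_error_at(pred_batch, true_batch, k):
    """Average pos@k over a batch of rollouts."""
    errors = [position_error_at(p, t, k) for p, t in zip(pred_batch, true_batch)]
    return sum(errors) / len(errors)
```

Wrapping the heading difference matters near the ±π boundary: headings of 3.1 rad and -3.1 rad differ by about 0.083 rad, not 6.2 rad.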

	Required A outputs:

	```text
	experiments/<method>/checkpoint/paper.pt
	experiments/<method>/checkpoint/paper_step_*.pt
	experiments/<method>/result/parameter_count.json
	experiments/<method>/result/paper_training.json
	experiments/reports/paper_prediction.json
	experiments/reports/paper_flowmo_latent_probes.json
	```
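A small completeness check can confirm that a run produced every required Part A artifact. The sketch below assumes that `<method>` expands to the four method names in the comparison table and that paths are relative to an `experiments` root; `missing_outputs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumption: <method> takes the four names from the comparison table.
METHODS = ["flowmo", "leworldmodel", "planet", "tdmpc2"]
PER_METHOD = [
    "checkpoint/paper.pt",
    "checkpoint/paper_step_*.pt",
    "result/parameter_count.json",
    "result/paper_training.json",
]
REPORTS = [
    "reports/paper_prediction.json",
    "reports/paper_flowmo_latent_probes.json",
]


def missing_outputs(root="experiments"):
    """Return every required pattern that matches no file under root."""
    root = Path(root)
    patterns = [f"{m}/{p}" for m in METHODS for p in PER_METHOD] + REPORTS
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```

An empty return value means every pattern matched at least one file; the wildcard checkpoint pattern is satisfied by any single intermediate checkpoint.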

	Core A conclusions:

	```text
	1. Whether FlowMo has lower long-horizon rollout error.
	2. Whether the gain holds across the full paper flow-family set.
	3. Whether explicit drift context helps beyond ordinary recurrent history.
	4. Whether the same architecture works for both twin and triangle boats.
	5. Whether frozen linear probes recover object momentum from `z_t` and ambient drift from `c_t`.
	```
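Conclusion 3 depends on the context ablation listed in the primary metrics: inferred context versus `c=0` versus shuffled context. The sketch below builds the three conditions for a batch of per-episode context vectors. It is an illustration under assumptions: `context_variants` is a hypothetical helper, contexts are plain lists of floats, and a random cyclic shift stands in for a full random permutation so that no episode keeps its own context.

```python
import random


def context_variants(contexts, seed=0):
    """Build inferred, zeroed, and shuffled context batches (needs >= 2 episodes)."""
    n = len(contexts)
    if n < 2:
        raise ValueError("shuffling needs at least two episodes")
    zeroed = [[0.0] * len(c) for c in contexts]
    # Cyclic shift by a nonzero offset: every episode receives another episode's context.
    shift = random.Random(seed).randrange(1, n)
    shuffled = [contexts[(i + shift) % n] for i in range(n)]
    return {"inferred": contexts, "zero": zeroed, "shuffled": shuffled}
```

If the inferred context carries real drift information, rollout error should rise under both the `zero` and `shuffled` conditions; a model that ignores its context would show no gap.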

	## B. Traditional Non-WM Controllers

	Purpose: evaluate downstream control behavior and provide classical, non-neural reference points. These methods do not train a world model.

	Shared setup:

	```text
	Input: clean top-down images converted to pose for classical control
	Tasks: same simulator, same boats, same goals, same flow settings
	Metrics: success, final distance, successful-episode trajectory length, successful-episode thrust energy, successful-episode time-to-goal
	Planning precision: FP32
	```
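Three of the Part B metrics are conditioned on success, so they must be averaged over successful episodes only, while success rate and final distance use all episodes. A minimal sketch of that aggregation follows; the episode dictionary keys are assumptions, not the report schema.

```python
from statistics import mean


def summarize(episodes):
    """Aggregate episode records (hypothetical keys) into the Part B metrics."""
    wins = [e for e in episodes if e["success"]]

    def win_mean(key):
        # Success-conditioned metrics are undefined when nothing succeeded.
        return mean(e[key] for e in wins) if wins else None

    return {
        "success": len(wins) / len(episodes),
        "final_distance": mean(e["final_distance"] for e in episodes),
        "trajectory_length": win_mean("path_length"),
        "thrust_energy": win_mean("thrust_energy"),
        "time_to_goal": win_mean("time_to_goal"),
    }
```

Returning `None` instead of zero when no episode succeeds keeps a total failure from looking like a very efficient run.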

	Compared methods:

	| Method | Purpose |
	|---|---|
	| `pid_los_controller` | Classical line-of-sight waypoint tracking. Tests a simple hand-designed controller. |
	| `no_flow_los_controller` | No-flow LOS controller without external-current compensation. Tests how much hidden flow hurts a nominal dynamics controller. |
	| `current_estimator_los_controller` | LOS controller with recent-drift current estimation. Tests a strong classical current-compensation baseline. |
	| `oracle_flow_los_controller` | LOS controller with simulator true local flow feed-forward. Tests whether local flow feed-forward helps a simple geometric controller; it is not a full dynamics-MPC upper bound. |
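The controller variants above differ mainly in what they assume about the current. The sketch below shows one way line-of-sight guidance with optional current feed-forward could look; it is not the repository's implementation. `los_heading`, `estimate_current`, and `HeadingPID` are hypothetical names, the gains are arbitrary, and the feed-forward simply subtracts the current from the desired ground velocity without renormalizing the through-water speed.

```python
import math


def wrap_angle(a):
    """Wrap an angle into (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def los_heading(pos, waypoint, speed, current=(0.0, 0.0)):
    """Heading whose water-relative velocity plus the current points at the waypoint."""
    dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    gx, gy = speed * dx / dist, speed * dy / dist  # desired ground velocity
    return math.atan2(gy - current[1], gx - current[0])


def estimate_current(ground_velocities, water_velocities):
    """Recent-drift estimate: mean gap between observed and through-water velocity."""
    n = len(ground_velocities)
    u = sum(g[0] - w[0] for g, w in zip(ground_velocities, water_velocities)) / n
    v = sum(g[1] - w[1] for g, w in zip(ground_velocities, water_velocities)) / n
    return (u, v)


class HeadingPID:
    """PID on wrapped heading error; the output is a turn command."""

    def __init__(self, kp=2.0, ki=0.0, kd=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, desired, actual, dt):
        error = wrap_angle(desired - actual)
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In this sketch, passing `current=(0.0, 0.0)` corresponds to no compensation, passing the output of `estimate_current` corresponds to recent-drift estimation, and passing the simulator's true local flow corresponds to the oracle feed-forward.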

	Planning tasks:

	```text
	reach_target
	station_keeping
	waypoint_square
	waypoint_zigzag
	```

	Boats:

	```text
	twin
	triangle
	```

	Flow families:

	```text
	noflow
	uniform
	vortex_center
	double_gyre
	source_sink
	source_sink_pair
	gradient
	shear
	turbulent_patch
	random_fourier
	```
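Taken together, the controllers, tasks, boats, and flow families define the Part B evaluation grid. The sketch below enumerates it under the assumption that every combination is run, which the plan does not state explicitly; `evaluation_grid` is a hypothetical helper.

```python
from itertools import product

CONTROLLERS = [
    "pid_los_controller",
    "no_flow_los_controller",
    "current_estimator_los_controller",
    "oracle_flow_los_controller",
]
TASKS = ["reach_target", "station_keeping", "waypoint_square", "waypoint_zigzag"]
BOATS = ["twin", "triangle"]
FLOWS = [
    "noflow", "uniform", "vortex_center", "double_gyre", "source_sink",
    "source_sink_pair", "gradient", "shear", "turbulent_patch", "random_fourier",
]


def evaluation_grid():
    """Full cross product of controller, task, boat, and flow family (assumed)."""
    return list(product(CONTROLLERS, TASKS, BOATS, FLOWS))
```

Under the full-cross-product assumption this yields 4 × 4 × 2 × 10 = 320 configurations, before multiplying by the number of episodes per configuration.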

	Required B outputs:

	```text
	experiments/reports/paper_planning/*.json
	experiments/reports/paper_planning/gifs/*.gif
	```

	Core B conclusions:

	```text
	1. Whether WM-based planning is competitive with classical non-WM control.
	2. Whether FlowMo improves success, final distance, energy, and path length versus other learned WMs.
	3. How far FlowMo remains from the oracle-flow classical reference.
	```