JEPA-style image latent predictor under the shared clean-image boat drift benchmark.
This method encodes clean image observations, predicts future latent/pose targets from actions, and uses the same optimizer, data, and evaluation budget as FlowMo.
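
The encode-then-predict flow described above can be sketched roughly as follows. This is a minimal, forward-pass-only illustration in plain Python, not the benchmark's actual implementation: the class name `LatentPredictor`, the linear layers, the dimensions, and the summed MSE loss are all hypothetical. It also assumes that both a latent target and a pose target are predicted, and that the target latent comes from the same encoder applied to the next image; a real JEPA-style setup often uses a separate or gradient-stopped target encoder, which the source does not specify.

```python
import random


def init_weights(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    """Bias-free dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class LatentPredictor:
    """Hypothetical JEPA-style model: encoder, action-conditioned predictor, pose head."""

    def __init__(self, pixel_dim, action_dim, latent_dim, pose_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = init_weights(latent_dim, pixel_dim, rng)
        self.predictor = init_weights(latent_dim, latent_dim + action_dim, rng)
        self.pose_head = init_weights(pose_dim, latent_dim, rng)

    def encode(self, image):
        """Map a flattened clean image to a latent vector."""
        return linear(self.encoder, image)

    def predict(self, image, action):
        """Predict the next latent and next pose from the current image and action."""
        z = self.encode(image)
        z_next = linear(self.predictor, z + action)  # list concatenation of latent and action
        pose_next = linear(self.pose_head, z_next)
        return z_next, pose_next

    def loss(self, image, action, next_image, next_pose):
        """Latent prediction error plus pose error (equal weighting is an assumption)."""
        z_pred, pose_pred = self.predict(image, action)
        z_target = self.encode(next_image)  # would be held fixed (no gradient) in training
        return mse(z_pred, z_target) + mse(pose_pred, next_pose)
```

The prediction is compared against the encoded next image rather than the raw pixels, which is the property that makes the predictor "latent" rather than generative. Training, the shared optimizer settings, and the evaluation protocol are outside the scope of this sketch.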