TeleopWM

Aws Khalil, Jaerock Kwon
Bio-Inspired Machine Intelligence (BIMI) Lab
University of Michigan–Dearborn

TeleopWM is a lightweight predictive latent world model for latency-resilient vision-based teleoperation. Given recent RGB observations and teleoperation control history, it predicts short-horizon future visual observations and future longitudinal/steering trends for predictive display.

TeleopWM targets short-horizon predictive display and future action forecasting under teleoperation latency, while keeping inference lightweight enough for real-time use.
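
Predictive display can be made concrete as follows. Given a measured latency, the operator is shown the predicted frame whose time offset best covers that latency. The sketch below takes the 15 FPS frame rate and 8-frame horizon from the Results Summary. The helper `select_display_frame` and its round-and-clamp policy are hypothetical illustrations, not the selection logic used in the TeleopWM code.

```python
FPS = 15.0    # frame rate stated in the Results Summary
HORIZON = 8   # number of predicted future frames


def select_display_frame(latency_ms: float, fps: float = FPS, horizon: int = HORIZON) -> int:
    """Return the index of the predicted frame that best compensates latency_ms.

    Hypothetical policy: 0 means "show the latest received frame"; k in 1..horizon
    means "show the k-th predicted frame". The latency is rounded to the nearest
    whole frame period and clamped to the prediction horizon.
    """
    if latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")
    steps = round(latency_ms * fps / 1000.0)  # latency expressed in frame periods
    return max(0, min(horizon, steps))
```

For example, a 200 ms round trip maps to the third predicted frame (3 × 66.7 ms ≈ 200 ms). Latencies beyond the roughly 533 ms horizon fall back to the last predicted frame.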

Paper and Project Links

Project page: https://bimilab.github.io/paper-TeleopWM/
GitHub repository: https://github.com/bimilab/paper-TeleopWM
Paper: TBD
Dataset: https://huggingface.co/datasets/bimilab/TeleopWM-Dataset
YouTube demo: https://youtu.be/WeKqqZuwBl0

Model Description

TeleopWM predicts 8 future RGB frames and future longitudinal/steering trends from recent visual observations and teleoperation control history. The model uses a SimVP visual backbone together with a TeleopWM latent dynamics branch, and is designed for real-time predictive display under teleoperation latency.
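
The input/output contract implied by this description can be written down as tensor shapes. In the sketch below, the 8-frame horizon, the RGB channels, and the 320x512 resolution come from this card. The following are assumptions: the number of context frames, the reading of 320x512 as height x width, and the two-value action layout (longitudinal, steering). `TeleopWMShapes` is a hypothetical helper, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeleopWMShapes:
    context_frames: int = 4   # ASSUMPTION: number of past frames; not stated on this card
    horizon: int = 8          # future frames predicted (from this card)
    channels: int = 3         # RGB
    height: int = 320         # ASSUMPTION: "320x512" read as height x width
    width: int = 512
    action_dims: int = 2      # ASSUMPTION: (longitudinal, steering)

    def input_frames(self, batch: int) -> tuple:
        # Past RGB observations: (B, T_in, C, H, W)
        return (batch, self.context_frames, self.channels, self.height, self.width)

    def control_history(self, batch: int) -> tuple:
        # ASSUMPTION: one control vector per context frame: (B, T_in, A)
        return (batch, self.context_frames, self.action_dims)

    def predicted_frames(self, batch: int) -> tuple:
        # Future RGB frames: (B, T_out, C, H, W)
        return (batch, self.horizon, self.channels, self.height, self.width)

    def predicted_actions(self, batch: int) -> tuple:
        # Future longitudinal/steering trends: (B, T_out, A)
        return (batch, self.horizon, self.action_dims)
```

Under these assumptions, one rollout at batch size 1 produces 8 × 3 × 320 × 512 = 3,932,160 pixel values plus 8 × 2 action values.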

The checkpoint was trained and evaluated on CARLA/MILE-style driving rollouts. TeleopWM is intended as a compact research model for short-horizon predictive continuity, not as an open-ended video generation or autonomous-driving foundation model.

Architecture

TeleopWM combines a SimVP visual backbone with a lightweight latent dynamics module and a motion-aware future action prediction head. The model jointly predicts future visual observations and future driving actions within a unified predictive framework designed for latency-resilient teleoperation.

Intended Use

Research on latency-resilient vision-based teleoperation
Predictive display under communication latency
Short-horizon future observation prediction
Future action trend prediction
CARLA/MILE-style driving rollout analysis

Out-of-Scope Use

Safety-critical autonomous driving deployment without validation
Open-ended video generation
Direct real-vehicle deployment without additional testing
General-purpose world modeling outside the evaluated driving domain

Files

best.pt — final TeleopWM paper checkpoint
config.json — training/evaluation configuration associated with the checkpoint
benchmark.json — runtime benchmark summary
future_action_eval.png — future action evaluation figure
main_rollout_action_figure_final.png — qualitative rollout/action alignment figure

Results Summary

Category	Metric	Value
Rollout prediction	Horizon	8 frames / approximately 533 ms at 15 FPS
Future action prediction	Outputs	longitudinal and steering trends
Runtime	Inference latency	38.9 ms / rollout
Runtime	Prediction rate	205.5 FPS
Runtime	Peak VRAM	1.24 GB
Resolution	Input/output	320x512

Runtime values are reference measurements from the final paper configuration and should be re-measured on target hardware.
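
The rows of the table are related by simple arithmetic. The horizon in milliseconds is the frame count divided by the frame rate. The prediction rate, assuming it counts predicted frames per second of inference, is the frame count divided by the per-rollout latency. The sketch below checks that reading against the reported values. The interpretation of the prediction-rate row is an assumption.

```python
def horizon_ms(frames: int, fps: float) -> float:
    # Time span covered by the predicted rollout.
    return 1000.0 * frames / fps


def predicted_frames_per_second(frames: int, latency_ms: float) -> float:
    # ASSUMPTION: prediction rate = frames produced per rollout / rollout latency.
    return 1000.0 * frames / latency_ms


print(f"horizon: {horizon_ms(8, 15):.1f} ms")                                 # horizon: 533.3 ms
print(f"prediction rate: {predicted_frames_per_second(8, 38.9):.1f} FPS")     # prediction rate: 205.7 FPS
print(f"frame period at 15 FPS: {1000.0 / 15:.1f} ms")                         # frame period at 15 FPS: 66.7 ms
```

The small gap to the reported 205.5 FPS is consistent with the latency having been rounded to 38.9 ms (8 frames in 38.93 ms gives 205.5 FPS). Under this reading, a full 38.9 ms rollout fits within one 66.7 ms input frame interval.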

Qualitative Rollout Example

Representative 8-step future RGB rollouts and action alignment across straight, mild-turn, sharp-turn, and intersection scenarios.

Future Action Prediction

Per-step future action error and correlation for longitudinal and steering predictions.

Usage

Download the checkpoint and config:

huggingface-cli download bimilab/TeleopWM \
  best.pt config.json \
  --local-dir checkpoints/TeleopWM
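
After the download, a quick standard-library check can confirm that both files landed in the expected directory and that `config.json` parses. The default directory matches the command above. `check_download` is a hypothetical helper, and the sketch makes no assumption about which keys the config contains.

```python
import json
from pathlib import Path


def check_download(local_dir="checkpoints/TeleopWM"):
    """Verify that best.pt and config.json exist; return the parsed config."""
    root = Path(local_dir)
    missing = [name for name in ("best.pt", "config.json") if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    with open(root / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    # Report checkpoint size and top-level config keys for a quick sanity check.
    print(f"best.pt: {(root / 'best.pt').stat().st_size / 1e6:.1f} MB")
    print(f"config.json keys: {sorted(config)}")
    return config
```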

Then run the evaluation from a clone of the TeleopWM GitHub repository:

python scripts/evaluate_teleopwm.py \
  --checkpoint checkpoints/TeleopWM/best.pt \
  --data-root /path/to/mile_action_diverse/test/Town05 \
  --split test \
  --sample-strategy uniform \
  --max-samples 64 \
  --device cuda

Runtime benchmarking:

python scripts/benchmark_teleopwm.py \
  --checkpoint checkpoints/TeleopWM/best.pt \
  --device cuda \
  --batch-size 1 \
  --warmup 20 \
  --iters 200
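
The `--warmup` and `--iters` flags suggest the usual benchmarking pattern. A number of untimed iterations run first, so that one-time costs such as memory allocation and kernel selection do not skew the result. The benchmark then averages over many timed iterations. The generic sketch below illustrates that pattern with a placeholder callable. It is not the implementation of `benchmark_teleopwm.py`. On a GPU, the timed function would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before returning, so the timer measures finished work.

```python
import statistics
import time


def benchmark(fn, warmup=20, iters=200, frames_per_call=8):
    """Time fn() after warm-up; return latency statistics and predicted frames/s.

    fn: zero-argument callable standing in for one rollout (hypothetical).
    frames_per_call: frames produced per call (8 for a TeleopWM rollout).
    iters must be at least 1.
    """
    for _ in range(warmup):  # untimed warm-up iterations
        fn()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # for CUDA work, synchronize inside fn before returning
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        "frames_per_s": 1000.0 * frames_per_call / mean_ms,
    }
```

With the defaults (`warmup=20`, `iters=200`), this mirrors the flag values in the command above. Reporting the median alongside the mean makes occasional slow iterations visible.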

Citation

If you use TeleopWM, please cite:

@misc{khalil2026teleopwm,
  title={TeleopWM: A Real-Time Predictive World Model for Latency-Resilient Vision-Based Teleoperation},
  author={Khalil, Aws and Kwon, Jaerock},
  year={2026},
  note={ResearchGate Preprint},
  doi={10.13140/RG.2.2.15259.84002}
}

License

This model is released under the MIT License.
