TeleopWM
Aws Khalil, Jaerock Kwon
Bio-Inspired Machine Intelligence (BIMI) Lab
University of Michigan–Dearborn
TeleopWM is a lightweight predictive latent world model for latency-resilient vision-based teleoperation. Given recent RGB observations and teleoperation control history, it predicts short-horizon future visual observations and future longitudinal and steering trends for predictive display, while staying light enough for real-time inference.
Paper and Project Links
- Project page: https://bimilab.github.io/paper-TeleopWM/
- GitHub repository: https://github.com/bimilab/paper-TeleopWM
- Paper: TBD
- Dataset: https://huggingface.co/datasets/bimilab/TeleopWM-Dataset
- YouTube demo: https://youtu.be/WeKqqZuwBl0
Model Description
TeleopWM predicts 8 future RGB frames and future longitudinal/steering trends from recent visual observations and teleoperation control history. The model uses a SimVP visual backbone together with a TeleopWM latent dynamics branch, and is designed for real-time predictive display under teleoperation latency.
The checkpoint was trained and evaluated on CARLA/MILE-style driving rollouts. TeleopWM is intended as a compact research model for short-horizon predictive continuity, not as an open-ended video generation or autonomous-driving foundation model.
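As a rough illustration of how an 8-frame rollout could drive a predictive display, the sketch below picks which predicted frame best matches a measured latency. The 8-frame horizon and the 15 FPS frame rate come from this model card; the function name `select_predicted_frame`, the round-half-up rule, and the clamping at the horizon are assumptions made for illustration and are not taken from the TeleopWM repository.

```python
def select_predicted_frame(latency_ms, fps=15.0, horizon=8):
    """Return the 1-based index of the predicted frame closest to `latency_ms` ahead.

    A return value of 0 means the latency is under half a frame period, so the
    most recent real observation is the better thing to show.
    NOTE: hypothetical helper, not part of the TeleopWM codebase.
    """
    if latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")
    frame_period_ms = 1000.0 / fps
    # Round to the nearest whole frame step (half rounds up).
    steps_ahead = int(latency_ms / frame_period_ms + 0.5)
    # Latencies beyond the rollout horizon fall back to the last predicted frame.
    return min(steps_ahead, horizon)


if __name__ == "__main__":
    for latency in (0, 100, 200, 400, 1000):
        print(latency, "ms ->", select_predicted_frame(latency))
```

Under these assumptions, any latency above roughly 533 ms maps to the last predicted frame, which is one way to see why the model is described as a short-horizon predictor.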
Architecture
TeleopWM combines a SimVP visual backbone with a lightweight latent dynamics module and a motion-aware future action prediction head. The model jointly predicts future visual observations and future driving actions within a unified predictive framework designed for latency-resilient teleoperation.
Intended Use
- Research on latency-resilient vision-based teleoperation
- Predictive display under communication latency
- Short-horizon future observation prediction
- Future action trend prediction
- CARLA/MILE-style driving rollout analysis
Out-of-Scope Use
- Safety-critical autonomous driving deployment without validation
- Open-ended video generation
- Direct real-vehicle deployment without additional testing
- General-purpose world modeling outside the evaluated driving domain
Files
- best.pt: final TeleopWM paper checkpoint
- config.json: training/evaluation configuration associated with the checkpoint
- benchmark.json: runtime benchmark summary
- future_action_eval.png: future action evaluation figure
- main_rollout_action_figure_final.png: qualitative rollout/action alignment figure
Results Summary
| Category | Metric | Value |
|---|---|---|
| Rollout prediction | Horizon | 8 frames / approximately 533 ms at 15 FPS |
| Future action prediction | Outputs | longitudinal and steering trends |
| Runtime | Inference latency | 38.9 ms / rollout |
| Runtime | Prediction rate | 205.5 FPS |
| Runtime | Peak VRAM | 1.24 GB |
| Resolution | Input/output | 320x512 |
Runtime values are reference measurements from the final paper configuration and should be re-measured on target hardware.
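The horizon and runtime figures in the table can be related with a little arithmetic. The sketch below assumes that the 205.5 FPS prediction rate counts predicted frames per second (8 frames per rollout) rather than rollouts per second; that reading is an inference from the numbers, not something the table states.

```python
HORIZON_FRAMES = 8          # frames per rollout (from the table)
CAMERA_FPS = 15.0           # frame rate used for the horizon (from the table)
ROLLOUT_LATENCY_MS = 38.9   # inference latency per rollout (from the table)

# Time span covered by one rollout: 8 frames at 15 FPS, about 533 ms.
horizon_ms = HORIZON_FRAMES / CAMERA_FPS * 1000.0

# Predicted frames per second if each 38.9 ms rollout yields 8 frames, about 205.7.
frames_per_second = HORIZON_FRAMES / (ROLLOUT_LATENCY_MS / 1000.0)

# Rollouts per second, about 25.7, which exceeds the 15 FPS camera rate.
rollouts_per_second = 1000.0 / ROLLOUT_LATENCY_MS

print(f"horizon: {horizon_ms:.1f} ms")
print(f"predicted frames/s: {frames_per_second:.1f}")
print(f"rollouts/s: {rollouts_per_second:.1f}")
```

The small gap between the computed value and the reported 205.5 FPS is what one would expect if the two table entries were measured or rounded independently.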
Qualitative Rollout Example
Representative 8-step future RGB rollouts and action alignment across straight, mild-turn, sharp-turn, and intersection scenarios.
Future Action Prediction
Per-step future action error and correlation for longitudinal and steering predictions.
Usage
Download the checkpoint and config:
huggingface-cli download bimilab/TeleopWM \
best.pt config.json \
--local-dir checkpoints/TeleopWM
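The same download can also be scripted from Python. The sketch below uses `hf_hub_download` from the `huggingface_hub` package with the repository ID and file names shown in the command above; the wrapper function `download_checkpoint` is a hypothetical convenience, and the package must be installed separately.

```python
from pathlib import Path

REPO_ID = "bimilab/TeleopWM"
FILES = ("best.pt", "config.json")


def download_checkpoint(local_dir="checkpoints/TeleopWM"):
    """Fetch the checkpoint and config, returning their local paths.

    NOTE: hypothetical wrapper; requires `pip install huggingface_hub`
    and network access when called.
    """
    # Imported lazily so this module can be loaded without the package installed.
    from huggingface_hub import hf_hub_download

    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in FILES
    ]
```

Calling `download_checkpoint()` should leave `best.pt` and `config.json` under `checkpoints/TeleopWM`, matching the paths used by the commands in this section.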
Then evaluate using the TeleopWM repository:
python scripts/evaluate_teleopwm.py \
--checkpoint checkpoints/TeleopWM/best.pt \
--data-root /path/to/mile_action_diverse/test/Town05 \
--split test \
--sample-strategy uniform \
--max-samples 64 \
--device cuda
Runtime benchmarking:
python scripts/benchmark_teleopwm.py \
--checkpoint checkpoints/TeleopWM/best.pt \
--device cuda \
--batch-size 1 \
--warmup 20 \
--iters 200
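To show what the `--warmup` and `--iters` flags typically control, here is a generic timing harness. It is a sketch of the usual warmup-then-measure pattern and is not the code in `scripts/benchmark_teleopwm.py`; the `benchmark` function and its return keys are assumptions.

```python
import statistics
import time


def benchmark(fn, warmup=20, iters=200):
    """Time `fn` over `iters` calls after `warmup` untimed calls.

    Returns a dict with mean and median latency in milliseconds.
    NOTE: illustrative harness, not the repository's benchmark script.
    """
    # Warmup calls absorb one-time costs (lazy initialization, caches, autotuning).
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "iters": iters,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with one model forward pass.
    print(benchmark(lambda: sum(range(10_000)), warmup=5, iters=50))
```

When timing a PyTorch model on a GPU, `fn` should end with `torch.cuda.synchronize()` so that asynchronous kernels have finished before the timer stops; otherwise the measured latency will be too low.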
Citation
If you use TeleopWM, please cite:
@misc{teleopwm2026,
title={TeleopWM: A Real-Time Predictive World Model for Latency-Resilient Vision-Based Teleoperation},
author={Khalil, Aws and Kwon, Jaerock},
year={2026},
note={Under review}
}
License
This model is released under the MIT License.


