---
license: apache-2.0
pipeline_tag: robotics
tags:
- robotics
- world-model
- video-generation
- transformer
---
LingBot-VA: Causal World Modeling for Robot Control
LingBot-VA is an autoregressive diffusion framework that performs world modeling and robot action execution at the same time. By modeling the causal link between actions and visual dynamics, it can imagine the near future and plan actions accordingly.
- Autoregressive Video-Action World Modeling: Architecturally unifies visual dynamics prediction and action inference within a single interleaved sequence.
- High-efficiency Execution: Uses a dual-stream Mixture-of-Transformers (MoT) architecture with Asynchronous Execution and KV Cache support.
- Long-Horizon Performance: Demonstrates significant promise in long-horizon manipulation and strong generalizability to novel configurations.
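The interleaved layout named in the first bullet can be pictured with a small sketch. It is an illustration only, assuming each step contributes a fixed number of video tokens followed by a fixed number of action tokens, and that attention is block-causal (a token sees its own chunk and every earlier chunk). The token counts, the function names `build_interleaved_layout` and `block_causal_mask`, and the masking rule itself are hypothetical and are not taken from the LingBot-VA code.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (modality, step index); hypothetical representation


def build_interleaved_layout(num_steps: int, video_tokens: int,
                             action_tokens: int) -> List[Token]:
    """Lay out one sequence as video(0), action(0), video(1), action(1), ..."""
    layout: List[Token] = []
    for step in range(num_steps):
        layout.extend([("video", step)] * video_tokens)
        layout.extend([("action", step)] * action_tokens)
    return layout


def chunk_ids(layout: List[Token]) -> List[int]:
    """Give every token the index of the contiguous chunk it belongs to."""
    ids: List[int] = []
    current = -1
    previous = None
    for token in layout:
        if token != previous:  # a new modality or a new step starts a chunk
            current += 1
            previous = token
        ids.append(current)
    return ids


def block_causal_mask(layout: List[Token]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Assumed rule: a token attends to its own chunk and all earlier chunks.
    """
    ids = chunk_ids(layout)
    return [[key_id <= query_id for key_id in ids] for query_id in ids]


if __name__ == "__main__":
    layout = build_interleaved_layout(num_steps=2, video_tokens=2, action_tokens=1)
    for token, row in zip(layout, block_causal_mask(layout)):
        print(token, "".join("x" if allowed else "." for allowed in row))
```

Under this assumed rule, the keys and values of earlier chunks never change once computed, which is the property a KV cache relies on when the sequence is extended one chunk at a time.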
Model Sources
- Repository: https://github.com/robbyant/lingbot-va
- Paper: Causal World Modeling for Robot Control (arXiv:2601.21998)
- Project Page: https://technology.robbyant.com/lingbot-va
Quick Start
Installation
pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0 --index-url https://download.pytorch.org/whl/cu126
pip install websockets einops diffusers==0.36.0 transformers==5.0.0 accelerate msgpack opencv-python matplotlib ftfy easydict
pip install flash-attn --no-build-isolation
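After installing, it can help to confirm that the pinned packages resolved to the versions listed above. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical, and the check assumes that a local build suffix such as `+cu126` on the installed version should be ignored when comparing.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in the install commands above.
PINNED = {
    "torch": "2.9.0",
    "torchvision": "0.24.0",
    "torchaudio": "2.9.0",
    "diffusers": "0.36.0",
    "transformers": "5.0.0",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each package that is missing or mismatched to a short description."""
    problems: Dict[str, str] = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        if found is None:
            problems[package] = "not installed"
        elif found.split("+")[0] != wanted:  # drop a local suffix like "+cu126"
            problems[package] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    if not issues:
        print("All pinned packages match.")
    for package, problem in issues.items():
        print(f"{package}: {problem}")
```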
Run Image to Video-Action Generation
You can use the following command to generate video-action sequences from images:
NGPU=1 CONFIG_NAME='robotwin_i2av' bash script/run_launch_va_server_sync.sh
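If the server needs to be started from a Python process rather than a shell, the same command can be wrapped with `subprocess`. This is a sketch under assumptions: the helpers `launch_env` and `launch_server` are hypothetical, the script path is the one shown above, and `repo_dir` is assumed to be the root of a local checkout of the repository.

```python
import os
import subprocess
from typing import Dict, Optional


def launch_env(ngpu: int = 1, config_name: str = "robotwin_i2av",
               base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set the two variables used by the launch command."""
    env = dict(os.environ if base is None else base)
    env["NGPU"] = str(ngpu)
    env["CONFIG_NAME"] = config_name
    return env


def launch_server(repo_dir: str, ngpu: int = 1,
                  config_name: str = "robotwin_i2av") -> subprocess.Popen:
    """Start the launch script in the background from the repository root."""
    return subprocess.Popen(
        ["bash", "script/run_launch_va_server_sync.sh"],
        cwd=repo_dir,  # assumed: path to a local checkout of the repository
        env=launch_env(ngpu, config_name),
    )

# Example (not run here): server = launch_server("/path/to/lingbot-va")
```

Building the environment in a separate function keeps the variable handling testable without starting the server.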
Performance
LingBot-VA achieves state-of-the-art performance on benchmarks such as RoboTwin 2.0 and LIBERO, with particular strength in long-horizon tasks and sample efficiency. For detailed evaluation results in simulation and real-world scenarios, please refer to the paper or the GitHub README.
Citation
@article{lingbot-va2026,
title={Causal World Modeling for Robot Control},
author={Li, Lin and Zhang, Qihang and Luo, Yiming and Yang, Shuai and Wang, Ruilin and Han, Fei and Yu, Mingrui and Gao, Zelin and Xue, Nan and Zhu, Xing and Shen, Yujun and Xu, Yinghao},
journal={arXiv preprint arXiv:2601.21998},
year={2026}
}
License
This project is released under the Apache License 2.0.
Acknowledgments
This work builds upon several excellent open-source projects: