---
license: apache-2.0
library_name: diffusers
tags:
- robotics
- world-model
- diffusion
- step-distillation
- lingbot-va
pipeline_tag: robotics
---

# Flash-WAM — RoboTwin (distilled)

Single-step distilled checkpoint for **Flash-WAM: Modality-Aware Distillation for World Action Models**, applied to LingBot-VA and evaluated on RoboTwin 2.0.

Flash-WAM distills each modality with a consistency function matched to its noise regime (linear-gradient-scaling for the action stream, variance-preserving for the video stream), compressing inference to a single step per modality for up to a **23× speedup** while preserving teacher-level task success.

This repository contains the **complete model** (distilled transformer + encoders):

| Component | Description |
| :--- | :--- |
| `transformer/` | Distilled Flash-WAM student |
| `vae/` | VAE (from the LingBot-VA teacher) |
| `text_encoder/` | UMT5-XXL text encoder (from the teacher) |
| `tokenizer/` | T5 tokenizer |

## Links

- 📄 Paper: https://arxiv.org/abs/2606.05254
- 🌐 Project page: https://flashwam.github.io
- 💻 Code: https://github.com/NU-World-Model-Embodied-AI/Flash-WAM

## Usage

For environment setup and evaluation, follow the [Flash-WAM repository](https://github.com/NU-World-Model-Embodied-AI/Flash-WAM) and [LingBot-VA](https://github.com/Robbyant/lingbot-va). Point the inference server at this checkpoint directory.

## Citation

```bibtex
@misc{akbari2026flashwammodalityawaredistillationworld,
      title={Flash-WAM: Modality-Aware Distillation for World Action Models},
      author={Arman Akbari and Ci Zhang and Arash Akbari and Lin Zhao and Yixiao Chen and Weiwei Chen and Xuan Zhang and Geng Yuan and Yanzhi Wang},
      year={2026},
      eprint={2606.05254},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.05254},
}
```

License: Apache-2.0.
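
Before pointing the inference server at a local copy of this checkpoint, it can help to confirm that the four component folders listed in the table are present. The following is a minimal sketch, not part of the Flash-WAM or LingBot-VA code: the function name `missing_components` and the constant `EXPECTED_COMPONENTS` are hypothetical, and the check only looks for the subfolder names given in this card, not for the files inside them.

```python
from pathlib import Path

# Subfolder names taken from the component table in this model card.
EXPECTED_COMPONENTS = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(checkpoint_dir):
    """Return the expected component subfolders that are absent from checkpoint_dir.

    Hypothetical helper for illustration: it checks directory names only and
    does not validate weights or config files.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

Calling `missing_components("/path/to/checkpoint")` (a placeholder path) returns an empty list when all four folders exist, and otherwise returns the names that are missing, in the order they appear in the table.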