---
license: apache-2.0
library_name: diffusers
tags:
- robotics
- world-model
- diffusion
- step-distillation
- lingbot-va
pipeline_tag: robotics
---

# Flash-WAM - RoboTwin (distilled)

Single-step distilled checkpoint for **Flash-WAM: Modality-Aware Distillation for World Action Models**, applied to LingBot-VA and evaluated on RoboTwin 2.0. Flash-WAM distills each modality with a consistency function matched to its noise regime (linear-gradient-scaling for the action stream, variance-preserving for the video stream), compressing inference to a single step per modality for up to a **23× speedup** while preserving teacher-level task success.

This repository contains the **complete model** (distilled transformer + encoders):

| Component | Description |
| :--- | :--- |
| `transformer/` | Distilled Flash-WAM student |
| `vae/` | VAE (from the LingBot-VA teacher) |
| `text_encoder/` | UMT5-XXL text encoder (from the teacher) |
| `tokenizer/` | T5 tokenizer |

## Links

- Paper: https://arxiv.org/abs/2606.05254
- Project page: https://flashwam.github.io
- Code: https://github.com/NU-World-Model-Embodied-AI/Flash-WAM

## Usage

For environment setup and evaluation, follow the [Flash-WAM repository](https://github.com/NU-World-Model-Embodied-AI/Flash-WAM) and [LingBot-VA](https://github.com/Robbyant/lingbot-va). Point the inference server at this checkpoint directory.

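
Before pointing the inference server at a local copy, it can help to confirm that the checkpoint directory has the layout described in the component table. The following is a minimal sketch, not part of the Flash-WAM codebase: it assumes the checkpoint is fetched with `huggingface_hub.snapshot_download`, and the helper names `download_checkpoint` and `missing_components` are hypothetical. How the inference server itself is launched is documented in the repositories linked above and is not shown here.

```python
from pathlib import Path

# Repository ID of this model card on the Hugging Face Hub.
REPO_ID = "NU-World-Model-Embodied-AI/FlashWAM-RoboTwin"

# Subfolders listed in the component table of this model card.
EXPECTED_COMPONENTS = ("transformer", "vae", "text_encoder", "tokenizer")


def download_checkpoint(local_dir: str) -> Path:
    """Download the full repository snapshot into local_dir (needs network access)."""
    # Imported lazily so the layout check below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def missing_components(checkpoint_dir) -> list[str]:
    """Return the expected component subfolders that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

For example, `missing_components(download_checkpoint("./FlashWAM-RoboTwin"))` (the local path is an arbitrary choice) should return an empty list if the download completed and the repository matches the table; any names it returns are subfolders to re-download before starting the server.
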
## Citation

```bibtex
@misc{akbari2026flashwammodalityawaredistillationworld,
  title={Flash-WAM: Modality-Aware Distillation for World Action Models},
  author={Arman Akbari and Ci Zhang and Arash Akbari and Lin Zhao and Yixiao Chen and Weiwei Chen and Xuan Zhang and Geng Yuan and Yanzhi Wang},
  year={2026},
  eprint={2606.05254},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2606.05254},
}
```

License: Apache-2.0.