---
license: apache-2.0
library_name: diffusers
tags:
- robotics
- world-model
- diffusion
- step-distillation
- lingbot-va
pipeline_tag: robotics
---
# Flash-WAM: RoboTwin (distilled)
Single-step distilled checkpoint for **Flash-WAM: Modality-Aware Distillation for World Action Models**, applied to LingBot-VA and evaluated on RoboTwin 2.0. Flash-WAM distills each modality with a consistency function matched to its noise regime (linear-gradient-scaling for the action stream, variance-preserving for the video stream), compressing inference to a single step per modality for up to a **23× speedup** while preserving teacher-level task success.
This repository contains the **complete model** (distilled transformer + encoders):
| Component | Description |
| :--- | :--- |
| `transformer/` | Distilled Flash-WAM student |
| `vae/` | VAE (from the LingBot-VA teacher) |
| `text_encoder/` | UMT5-XXL text encoder (from the teacher) |
| `tokenizer/` | T5 tokenizer |
## Links
- 📄 Paper: https://arxiv.org/abs/2606.05254
- 🌐 Project page: https://flashwam.github.io
- 💻 Code: https://github.com/NU-World-Model-Embodied-AI/Flash-WAM
## Usage
For environment setup and evaluation, follow the [Flash-WAM repository](https://github.com/NU-World-Model-Embodied-AI/Flash-WAM) and [LingBot-VA](https://github.com/Robbyant/lingbot-va). Point the inference server at this checkpoint directory.
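Before pointing the inference server at a local copy, it may help to confirm that the directory contains the four components listed in the table above. The snippet below is a minimal sketch using only the Python standard library; the helper name `missing_components` and the default path `./FlashWAM-RoboTwin` are hypothetical and not part of the Flash-WAM or LingBot-VA code.

```python
from pathlib import Path

# Subfolder names taken from the component table in this model card.
EXPECTED_SUBDIRS = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(checkpoint_dir):
    """Return the expected subfolders that are absent from checkpoint_dir.

    Hypothetical helper for illustration; it only checks that the
    directories exist, not that the weights inside them are valid.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass your own as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "./FlashWAM-RoboTwin"
    missing = missing_components(path)
    if missing:
        print("Missing components:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```

An empty result means all four subfolders are present; anything else usually points to an incomplete download.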
## Citation
```bibtex
@misc{akbari2026flashwammodalityawaredistillationworld,
title={Flash-WAM: Modality-Aware Distillation for World Action Models},
author={Arman Akbari and Ci Zhang and Arash Akbari and Lin Zhao and Yixiao Chen and Weiwei Chen and Xuan Zhang and Geng Yuan and Yanzhi Wang},
year={2026},
eprint={2606.05254},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2606.05254},
}
```
License: Apache-2.0.