license: apache-2.0
library_name: diffusers
tags:
  - robotics
  - world-model
  - diffusion
  - step-distillation
  - lingbot-va
pipeline_tag: robotics

Flash-WAM — RoboTwin (distilled)

Single-step distilled checkpoint for "Flash-WAM: Modality-Aware Distillation for World Action Models", applied to LingBot-VA and evaluated on RoboTwin 2.0. Flash-WAM distills each modality with a consistency function matched to its noise regime: linear-gradient-scaling for the action stream and variance-preserving for the video stream. This compresses inference to a single step per modality, giving up to a 23× speedup while preserving teacher-level task success.

This repository contains the complete model (distilled transformer + encoders):

| Component | Description |
| --- | --- |
| `transformer/` | Distilled Flash-WAM student |
| `vae/` | VAE (from the LingBot-VA teacher) |
| `text_encoder/` | UMT5-XXL text encoder (from the teacher) |
| `tokenizer/` | T5 tokenizer |

Usage

For environment setup and evaluation, follow the instructions in the Flash-WAM and LingBot-VA repositories, then point the inference server at this checkpoint directory.
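Before starting the inference server, it can help to confirm that the local checkpoint directory has the layout described in the component table. The following is a minimal sketch using only the Python standard library. The helper name `missing_components` and the example path are hypothetical and are not part of Flash-WAM or LingBot-VA. The subfolder names are taken from the table above.

```python
from pathlib import Path

# Subfolder names as listed in this card's component table.
EXPECTED_COMPONENTS = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(checkpoint_dir):
    """Return the expected subfolders that are absent from checkpoint_dir.

    Hypothetical helper for illustration only. It checks that each
    subfolder exists, not that its contents are valid.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]


# Example usage (the path is an assumption; use your own download location):
# missing = missing_components("./FlashWAM-RoboTwin")
# if missing:
#     raise SystemExit(f"Checkpoint is incomplete, missing: {missing}")
```

An empty list means all four subfolders are present. If the checkpoint was fetched with `huggingface_hub.snapshot_download`, the local path that function returns can be passed in directly.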

Citation

@misc{akbari2026flashwammodalityawaredistillationworld,
      title={Flash-WAM: Modality-Aware Distillation for World Action Models}, 
      author={Arman Akbari and Ci Zhang and Arash Akbari and Lin Zhao and Yixiao Chen and Weiwei Chen and Xuan Zhang and Geng Yuan and Yanzhi Wang},
      year={2026},
      eprint={2606.05254},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.05254}, 
}

License: Apache-2.0.
