Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
This repository contains the weights for MIGM-Shortcut, a method to accelerate Masked Image Generation Models (MIGMs).
Paper | Project Page | Code
Introduction
Masked Image Generation Models (MIGMs) are often slow at inference because they require many sampling steps, each running full bi-directional attention. MIGM-Shortcut addresses this by learning a lightweight model that conditions on both the previous features and the newly sampled tokens to regress the average velocity field of the feature evolution.
By replacing the heavy base-model computation with this lightweight shortcut model during inference, the method achieves over 4x acceleration on state-of-the-art models such as Lumina-DiMOO for text-to-image generation while maintaining high image quality.
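As a rough illustration only (not the authors' implementation), the shortcut idea can be sketched as follows: a small network predicts the average velocity of the feature trajectory from the current features and the tokens sampled at this step, and the features are advanced with one Euler-style jump instead of re-running the base model. All names, dimensions, and the linear stand-in network below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N tokens, feature dim D, codebook size V.
N, D, V = 16, 32, 64

# Toy "lightweight shortcut model": a single linear map standing in for
# the small network that regresses the average velocity field.
W_feat = rng.standard_normal((D, D)) * 0.01
W_tok = rng.standard_normal((V, D)) * 0.01

def shortcut_velocity(features, token_ids):
    """Predict the average velocity of feature evolution from the
    current features and the tokens sampled at this step (toy stand-in)."""
    tok_onehot = np.eye(V)[token_ids]              # (N, V)
    return features @ W_feat + tok_onehot @ W_tok  # (N, D)

def shortcut_step(features, token_ids, dt):
    """Advance the features by one Euler-style jump of size dt,
    replacing a full forward pass of the heavy base model."""
    return features + dt * shortcut_velocity(features, token_ids)

features = rng.standard_normal((N, D))
token_ids = rng.integers(0, V, size=N)
next_features = shortcut_step(features, token_ids, dt=0.25)
print(next_features.shape)  # (16, 32)
```

In the actual method, the learned shortcut model replaces several expensive bi-directional-attention passes of the base model; see the paper and repository for the real architecture and training objective.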
Performance
Applying MIGM-Shortcut to architectures like MaskGIT and Lumina-DiMOO significantly pushes the Pareto frontier of the quality-speed trade-off, maintaining visual fidelity even at aggressive acceleration rates.
Usage
Please refer to the official GitHub repository for environment setup and instructions on how to use the weights for both MaskGIT-Shortcut and DiMOO-Shortcut.
BibTeX
@misc{migm-shortcut,
      title={Accelerating Masked Image Generation by Learning Latent Controlled Dynamics},
      author={Kaiwen Zhu and Quansheng Zeng and Yuandong Pu and Shuo Cao and Xiaohui Li and Yi Xin and Qi Qin and Jiayang Li and Yu Qiao and Jinjin Gu and Yihao Liu},
      year={2026},
      eprint={2602.23996},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.23996}
}
Model tree for Kaiwen-Zhu/MIGM-Shortcut
Base model: Alpha-VLLM/Lumina-DiMOO