Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

This repository contains the weights for MIGM-Shortcut, a method to accelerate Masked Image Generation Models (MIGMs).

Paper | Project Page | Code

Introduction

Masked Image Generation Models (MIGMs) are slow at inference because sampling requires many steps, each a full bidirectional-attention pass through the base model. MIGM-Shortcut addresses this by learning a lightweight model that takes both the previous features and the newly sampled tokens as input and regresses the average velocity field of the feature evolution.

By replacing heavy base model computation with this lightweight shortcut model during inference, the method achieves over 4x acceleration on state-of-the-art models like Lumina-DiMOO for text-to-image generation while maintaining high image quality.
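As a rough illustration of the idea above (this is not the authors' implementation; all names, shapes, and the tiny MLP are hypothetical), the shortcut can be sketched as a small network that conditions on the previous feature state and the embeddings of newly sampled tokens, predicts an average velocity over the step interval, and advances the features with a single Euler update instead of re-running the heavy base model:

```python
import numpy as np

rng = np.random.default_rng(0)

def shortcut_step(feats, token_emb, W1, W2, dt):
    """Toy shortcut update: predict the average velocity of the feature
    trajectory from (previous features, sampled-token embeddings), then
    take one Euler step. Weights and shapes are purely illustrative."""
    x = np.concatenate([feats, token_emb], axis=-1)   # condition on both inputs
    h = np.maximum(x @ W1, 0.0)                       # small MLP with ReLU
    v_avg = h @ W2                                    # predicted average velocity
    return feats + dt * v_avg                         # Euler step over the interval

# Hypothetical sizes: 16 tokens, 32-dim features and token embeddings.
n, d, hidden = 16, 32, 64
feats = rng.standard_normal((n, d))
token_emb = rng.standard_normal((n, d))
W1 = rng.standard_normal((2 * d, hidden)) * 0.1
W2 = rng.standard_normal((hidden, d)) * 0.1

next_feats = shortcut_step(feats, token_emb, W1, W2, dt=0.25)
print(next_feats.shape)  # (16, 32)
```

In the actual method, this cheap update replaces a full forward pass of the base model at most sampling steps, which is where the reported speedup comes from.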

Performance

Applying MIGM-Shortcut to architectures like MaskGIT and Lumina-DiMOO significantly pushes the Pareto frontier of the quality-speed trade-off, maintaining visual fidelity even at aggressive acceleration rates.

Usage

Please refer to the official GitHub repository for environment setup and instructions on how to use the weights for both MaskGIT-Shortcut and DiMOO-Shortcut.

BibTeX

@misc{migm-shortcut,
  title={Accelerating Masked Image Generation by Learning Latent Controlled Dynamics}, 
  author={Kaiwen Zhu and Quansheng Zeng and Yuandong Pu and Shuo Cao and Xiaohui Li and Yi Xin and Qi Qin and Jiayang Li and Yu Qiao and Jinjin Gu and Yihao Liu},
  year={2026},
  eprint={2602.23996},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.23996}
}