---
license: apache-2.0
pipeline_tag: image-to-image
---

# Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

This repository contains **FlowMo**, a transformer-based diffusion autoencoder that achieves state-of-the-art performance for image tokenization at multiple compression rates. It is introduced in the paper [Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization](https://huggingface.co/papers/2503.11056).

FlowMo does not use convolutions, adversarial losses, spatially-aligned two-dimensional latent codes, or distillation from other tokenizers. Its key insight is that training should be split into two stages: a mode-matching pre-training stage followed by a mode-seeking post-training stage.

<p align="center">
<img src="https://github.com/kylesargent/FlowMo/raw/main/demo.gif" alt="FlowMo demo GIF" />
</p>
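
To make "compression rate" concrete, here is a minimal sketch of how the bits-per-pixel figure of a discrete image tokenizer is commonly computed. It assumes each token indexes a codebook and therefore carries `log2(codebook_size)` bits. The function name `bits_per_pixel` and the numbers below are hypothetical and chosen only for illustration; they are not FlowMo's API or its actual configurations.

```python
import math


def bits_per_pixel(num_tokens: int, codebook_size: int, height: int, width: int) -> float:
    """Return the bits per pixel of a discrete tokenizer.

    Assumes each of the `num_tokens` tokens selects one entry from a
    codebook with `codebook_size` entries, i.e. log2(codebook_size) bits.
    """
    bits_per_token = math.log2(codebook_size)
    return num_tokens * bits_per_token / (height * width)


# Hypothetical configuration for illustration only (not a FlowMo setting):
# 256 tokens, a 4096-entry codebook, and a 256x256 input image.
example_bpp = bits_per_pixel(num_tokens=256, codebook_size=4096, height=256, width=256)
print(f"{example_bpp:.6f} bits per pixel")
```

Under this definition, a lower value means stronger compression, so tokenizers are usually compared at matched bits per pixel.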

## Links

* **Project Page:** [https://kylesargent.github.io/flowmo](https://kylesargent.github.io/flowmo)
* **Code Repository:** [https://github.com/kylesargent/FlowMo](https://github.com/kylesargent/FlowMo)

## Usage

The official GitHub repository provides comprehensive instructions for installation, data preparation, training, and evaluation. A Jupyter notebook, `example.ipynb`, is available to demonstrate how to use the FlowMo tokenizer for image reconstruction.

## Citation

If you find FlowMo useful, please cite our paper:

```bibtex
@misc{sargent2025flowmodemodeseekingdiffusion,
  title={Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization},
  author={Kyle Sargent and Kyle Hsu and Justin Johnson and Li Fei-Fei and Jiajun Wu},
  year={2025},
  eprint={2503.11056},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11056},
}
```