---
license: apache-2.0
pipeline_tag: image-to-image
---

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

This repository contains FlowMo, a transformer-based diffusion autoencoder that achieves state-of-the-art image tokenization performance at multiple compression rates. It was introduced in the paper "Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization" (arXiv:2503.11056).
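Compression rate for a discrete tokenizer is often expressed in bits per pixel (BPP). Each token indexes a codebook, so it carries log2(codebook size) bits, and the total is divided by the image's pixel count. The sketch below shows only that arithmetic. The function name `bits_per_pixel`, the token count, and the codebook size are hypothetical values chosen for illustration, not FlowMo's published configuration.

```python
import math


def bits_per_pixel(num_tokens: int, codebook_size: int, height: int, width: int) -> float:
    """Bits per pixel for an image encoded as `num_tokens` discrete codes."""
    # Each token selects one of `codebook_size` entries: log2(codebook_size) bits.
    bits_per_token = math.log2(codebook_size)
    total_bits = num_tokens * bits_per_token
    return total_bits / (height * width)


# Hypothetical setting for illustration only: 256 tokens,
# a 2**14-entry codebook, and a 256x256 image.
bpp = bits_per_pixel(num_tokens=256, codebook_size=2**14, height=256, width=256)
print(f"{bpp:.4f} bits per pixel")
```

Lowering the token count or the codebook size lowers BPP, which is how a tokenizer moves between higher and lower compression rates.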

FlowMo uses no convolutions, adversarial losses, or spatially aligned two-dimensional latent codes, and it does not distill from other tokenizers. Its key insight is that training should be split into two stages: a mode-matching pre-training stage followed by a mode-seeking post-training stage.
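As a rough illustration of what a mode-matching objective for a diffusion decoder can look like, the sketch below computes a conditional rectified-flow (flow matching) loss for a decoder conditioned on latent tokens. It assumes the decoder predicts a velocity field along a straight path from image to noise. `flow_matching_loss`, `velocity_model`, and `code` are hypothetical placeholders, and FlowMo's actual objective, architecture, and noise schedule may differ. The mode-seeking post-training stage is not shown.

```python
import numpy as np


def flow_matching_loss(velocity_model, x, code, rng):
    """Conditional rectified-flow loss for one batch (illustrative sketch).

    x:    clean images, shape (batch, ...), e.g. scaled to [-1, 1].
    code: latent tokens the decoder is conditioned on (hypothetical).
    velocity_model(x_t, t, code) must return an array shaped like x.
    """
    batch = x.shape[0]
    # One diffusion time per example, drawn uniformly from [0, 1).
    t = rng.uniform(0.0, 1.0, size=batch)
    # Reshape t so it broadcasts over all non-batch dimensions.
    t_b = t.reshape((batch,) + (1,) * (x.ndim - 1))
    noise = rng.standard_normal(x.shape)
    # Straight-line path from the clean image (t=0) to pure noise (t=1).
    x_t = (1.0 - t_b) * x + t_b * noise
    # Time derivative of x_t along that path.
    target_velocity = noise - x
    pred = velocity_model(x_t, t, code)
    return float(np.mean((pred - target_velocity) ** 2))


# Usage with a stand-in model that always predicts zero velocity.
rng = np.random.default_rng(0)
images = rng.uniform(-1.0, 1.0, size=(4, 3, 8, 8))
tokens = rng.integers(0, 16, size=(4, 32))  # hypothetical 1D token sequence
zero_model = lambda x_t, t, code: np.zeros_like(x_t)
print(flow_matching_loss(zero_model, images, tokens, rng))
```

Minimizing a regression objective of this kind trains the decoder to model the distribution of images consistent with `code`, rather than committing to a single reconstruction.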


Links

Paper: https://arxiv.org/abs/2503.11056

Usage

The official GitHub repository provides instructions for installation, data preparation, training, and evaluation. The Jupyter notebook example.ipynb demonstrates how to use the FlowMo tokenizer for image reconstruction.
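The tokenizer's own loading and inference calls are documented in example.ipynb and are not reproduced here. As a hedged sketch of how a reconstruction might be scored, the code below round-trips an image through placeholder `encode`/`decode` functions and computes peak signal-to-noise ratio (PSNR), a standard reconstruction metric. The helper names `psnr` and `reconstruct_and_score`, the toy quantizer, and the [-1, 1] pixel range are assumptions for illustration.

```python
import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 2.0) -> float:
    """Peak signal-to-noise ratio in dB; data_range=2.0 assumes pixels in [-1, 1]."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * float(np.log10(data_range ** 2 / mse))


def reconstruct_and_score(encode, decode, image: np.ndarray):
    """Round-trip `image` through a tokenizer and return (reconstruction, PSNR).

    `encode` and `decode` are hypothetical stand-ins for the tokenizer calls
    shown in example.ipynb; they are not FlowMo API names.
    """
    tokens = encode(image)
    reconstruction = decode(tokens)
    return reconstruction, psnr(image, reconstruction)


# Usage with a toy "tokenizer" that snaps pixels to a coarse grid.
rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(3, 32, 32))
toy_encode = lambda img: np.round(img * 4)  # integer codes
toy_decode = lambda codes: codes / 4.0
recon, score = reconstruct_and_score(toy_encode, toy_decode, image)
print(f"PSNR: {score:.2f} dB")
```

Swapping the toy functions for the real tokenizer's encode and decode steps would give a per-image reconstruction score; higher PSNR means a closer reconstruction.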

Citation

If you find FlowMo useful, please cite our paper:

@misc{sargent2025flowmodemodeseekingdiffusion,
      title={Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization},
      author={Kyle Sargent and Kyle Hsu and Justin Johnson and Li Fei-Fei and Jiajun Wu},
      year={2025},
      eprint={2503.11056},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11056},
}