---
license: apache-2.0
pipeline_tag: image-to-image
---

# Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

This repository contains FlowMo, a transformer-based diffusion autoencoder for image tokenization that achieves state-of-the-art performance at multiple compression rates. It is introduced in the paper [Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization](https://huggingface.co/papers/2503.11056).

FlowMo uses no convolutions, no adversarial losses, no spatially aligned two-dimensional latent codes, and no distillation from other tokenizers. Its key insight is that training should be split into two stages: a mode-matching pre-training stage followed by a mode-seeking post-training stage.
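The compression rate of a discrete image tokenizer is commonly reported in bits per pixel (BPP): the total number of bits in the token sequence divided by the number of pixels. The sketch below shows that arithmetic only. The function names and the token counts, bit widths, and image size are hypothetical illustrations, not FlowMo's published configurations or part of its codebase.

```python
import math


def bits_per_token_from_codebook(codebook_size: int) -> float:
    """Bits needed to index one entry of a codebook with `codebook_size` entries."""
    if codebook_size < 1:
        raise ValueError("codebook_size must be at least 1")
    return math.log2(codebook_size)


def bits_per_pixel(num_tokens: int, bits_per_token: float, height: int, width: int) -> float:
    """Bits spent per pixel when an image is encoded as `num_tokens` discrete tokens."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return num_tokens * bits_per_token / (height * width)


# Hypothetical settings (assumptions for illustration, not FlowMo's actual ones):
# a 256x256 image encoded as 256 tokens at two different bit widths.
low_rate = bits_per_pixel(num_tokens=256, bits_per_token=16, height=256, width=256)
high_rate = bits_per_pixel(num_tokens=256, bits_per_token=64, height=256, width=256)

print(f"low-rate setting:  {low_rate:.4f} BPP")   # 0.0625
print(f"high-rate setting: {high_rate:.4f} BPP")  # 0.2500
```

Under this definition, reconstructions from two tokenizers are only comparable when both operate at a similar BPP.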

<p align="center">
<img src="https://github.com/kylesargent/FlowMo/raw/main/demo.gif" alt="FlowMo demo GIF" />
</p>

## Links

* Project Page: [https://kylesargent.github.io/flowmo](https://kylesargent.github.io/flowmo)
* Code Repository: [https://github.com/kylesargent/FlowMo](https://github.com/kylesargent/FlowMo)

## Usage

For installation, data preparation, training, and evaluation instructions, see the official [GitHub repository](https://github.com/kylesargent/FlowMo). Its Jupyter notebook, `example.ipynb`, demonstrates how to use the FlowMo tokenizer for image reconstruction.

## Citation

If you find FlowMo useful, please cite our paper:

```bibtex
@misc{sargent2025flowmodemodeseekingdiffusion,
  title={Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization},
  author={Kyle Sargent and Kyle Hsu and Justin Johnson and Li Fei-Fei and Jiajun Wu},
  year={2025},
  eprint={2503.11056},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11056},
}
```