---
license: apache-2.0
datasets:
- CSU-JPG/VisPrompt5M
- CSU-JPG/VPBench
language:
- en
metrics:
- code_eval
pipeline_tag: image-to-image
---

# FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

TL;DR: The first vision-centric image-in, image-out image generation model.

## About

We present FlowInOne, a framework that reformulates multimodal generation as a **purely visual flow**: all inputs are converted into visual prompts, enabling a clean **image-in, image-out** pipeline governed by a single flow matching model. This vision-centric formulation eliminates cross-modal alignment bottlenecks, noise scheduling, and task-specific architectural branches, **unifying text-to-image generation, layout-guided editing, and visual instruction following under one coherent paradigm**. Extensive experiments show that FlowInOne achieves **state-of-the-art performance across all unified generation tasks**, surpassing both open-source models and competitive commercial systems. This establishes a new foundation for fully vision-centric generative modeling, in which perception and creation coexist within a single continuous visual space.

## 🧪 Usage

Download the model weights and the model preparation files:

```bash
# model weights (-P saves the file into the given directory)
wget -P /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/flowinone_256px.pth
# model preparation
wget -P /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/preparation.tar.gz
# extract (tar -C requires the target directory to exist)
mkdir -p /path/to/preparation
tar -xzvf /path/to/download/preparation.tar.gz -C /path/to/preparation
```

Download the dataset examples:

```bash
wget -P /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/flowinone_demo_dataset.tar.gz
# extract (tar -C requires the target directory to exist)
mkdir -p /path/to/flowinone_demo_dataset
tar -xzvf /path/to/download/flowinone_demo_dataset.tar.gz -C /path/to/flowinone_demo_dataset
```

Our training and inference scripts are now available on [GitHub](https://github.com/CSU-JPG/FlowInOne)!

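
The download and extraction steps above can also be wrapped in a small re-runnable script. The following is a minimal sketch, not part of the official FlowInOne tooling: the function names (`resolve_url`, `extract_archive`, `fetch_all`) and the choice to unpack each archive into a directory named after it are illustrative assumptions. Only the base URL and the three file names come from the commands above.

```bash
#!/usr/bin/env sh
# Sketch only: function names and directory layout are illustrative assumptions.
set -eu

# Base URL and file names are copied from the wget commands in this README.
BASE_URL="https://huggingface.co/CSU-JPG/FlowInOne/resolve/main"
FILES="flowinone_256px.pth preparation.tar.gz flowinone_demo_dataset.tar.gz"

# Print the direct download URL for one file in the model repository.
resolve_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Extract a .tar.gz archive ($1) into a directory ($2). The directory is
# created first because tar -C fails when the target does not exist.
extract_archive() {
  mkdir -p "$2"
  tar -xzf "$1" -C "$2"
}

# Download every file into directory $1 and unpack the archives beside them.
fetch_all() {
  dir="$1"
  mkdir -p "$dir"
  for name in $FILES; do
    target="$dir/$name"
    if [ ! -f "$target" ]; then
      # Download under a temporary name so an interrupted transfer is not
      # mistaken for a complete file on the next run.
      wget -O "$target.part" "$(resolve_url "$name")"
      mv "$target.part" "$target"
    fi
    case "$name" in
      *.tar.gz) extract_archive "$target" "$dir/${name%.tar.gz}" ;;
    esac
  done
}
```

Calling `fetch_all /path/to/download` after defining these functions would fetch the three files once and skip any that already exist, so the script can be re-run safely after a failed download.
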
## Citation

If you find our work useful, please consider citing:

```
@article{yi2026flowinoneunifyingmultimodalgenerationimagein,
  title={FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching},
  author={Junchao Yi and Rui Zhao and Jiahao Tang and Weixian Lei and Linjie Li and Qisheng Su and Zhengyuan Yang and Lijuan Wang and Xiaofeng Zhu and Alex Jinpeng Wang},
  journal={arXiv preprint arXiv:2604.06757},
  year={2026}
}
```