
Improve model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +43 -20
README.md CHANGED
@@ -1,18 +1,22 @@
  ---
- license: apache-2.0
  datasets:
  - CSU-JPG/VisPrompt5M
  - CSU-JPG/VPBench
  language:
  - en
- metrics:
- - code_eval
  pipeline_tag: image-to-image
  ---
  <div align="center">
  <h2 align="center" style="margin-top: 0; margin-bottom: 15px;">
- <span style="color:#0052CC">F</span><span style="color:#135FD0">l</span><span style="color:#266CD4">o</span><span style="color:#3979D7">w</span><span style="color:#4C86DB">I</span><span style="color:#6093DF">n</span><span style="color:#73A0E3">O</span><span style="color:#86ADE7">n</span><span style="color:#99BAEB">e</span>: Unifying Multimodal Generation as
- <span style="color:#0052CC">I</span><span style="color:#0958CE">m</span><span style="color:#125ED0">a</span><span style="color:#1B64D2">g</span><span style="color:#246AD4">e</span><span style="color:#2D70D6">-</span><span style="color:#3676D8">i</span><span style="color:#3F7CDA">n</span><span style="color:#4882DC">,</span>&nbsp;<span style="color:#5188DE">I</span><span style="color:#5A8EE0">m</span><span style="color:#6394E2">a</span><span style="color:#6C9AE4">g</span><span style="color:#75A0E6">e</span><span style="color:#7EA6E8">-</span><span style="color:#87ACEA">o</span><span style="color:#90B2EC">u</span><span style="color:#99B8EE">t</span> Flow Matching
  </h2>
  <p align="center" style="font-size: 15px;">
  <span style="color:#E74C3C; font-weight: bold;">TL;DR:</span> <strong>The first vision-centric image-in, image-out image generation model.</strong>
@@ -20,40 +24,59 @@ pipeline_tag: image-to-image
  <p align="center" style="font-size: 16px;">
  <a href="https://csu-jpg.github.io/FlowInOne.github.io/" style="text-decoration: none;">🌐 Homepage</a> |
  <a href="https://github.com/CSU-JPG/FlowInOne" style="text-decoration: none;">💻 Code</a> |
- <a href="https://arxiv.org/pdf/2604.06757" style="text-decoration: none;">📄 Paper</a> |
  <a href="https://huggingface.co/datasets/CSU-JPG/VisPrompt5M" style="text-decoration: none;">📝 Dataset</a> |
  <a href="https://huggingface.co/datasets/CSU-JPG/VPBench" style="text-decoration: none;">🌏 Benchmark</a> |
  <a href="https://huggingface.co/CSU-JPG/FlowInOne" style="text-decoration: none;">🤗 Model</a>
  </p>
  </div>

  ## About
- We present FlowInOne, a framework that reformulates multimodal generation as a **purely visual flow**, converting all inputs into visual prompts and enabling a clean **image-in, image-out** pipeline governed by a single flow matching model.
  This vision-centric formulation naturally eliminates cross-modal alignment bottlenecks, noise scheduling, and task-specific architectural branches, **unifying text-to-image generation, layout-guided editing, and visual instruction following under one coherent paradigm**.
- Extensive experiments demonstrate that FlowInOne achieves **state-of-the-art performance across all unified generation tasks**, surpassing both open-source models and competitive commercial systems, establishing a new foundation for fully vision-centric generative modeling where perception and creation coexist within a single continuous visual space.

- ## 🧪 Usage
- you can download the model weights and model preparation
  ```bash
  # model weights
- wget -O /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/flowinone_256px.pth
  # model preparation
- wget -O /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/preparation.tar.gz
- # unzip
- tar -xzvf "preparation.tar.gz" -C "/path/to/preparation"
  ```
- you can download the dataset examples
  ```bash
- wget -O /path/to/download https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/flowinone_demo_dataset.tar.gz
- # unzip
- tar -xzvf "flowinone_demo_dataset.tar.gz" -C "/path/to/flowinone_demo_dataset"
  ```
- Our training and inference scripts are now available on [GitHub](https://github.com/CSU-JPG/FlowInOne)!

  ## Citation

  If you find our work useful, please consider citing:
- ```
  @article{yi2026flowinoneunifyingmultimodalgenerationimagein,
  title={FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching},
  author={Junchao Yi and Rui Zhao and Jiahao Tang and Weixian Lei and Linjie Li and Qisheng Su and Zhengyuan Yang and Lijuan Wang and Xiaofeng Zhu and Alex Jinpeng Wang},
 
  ---
  datasets:
  - CSU-JPG/VisPrompt5M
  - CSU-JPG/VPBench
  language:
  - en
+ license: apache-2.0
  pipeline_tag: image-to-image
+ tags:
+ - flow-matching
+ - image-generation
+ - image-editing
+ - vision-centric
  ---
+
  <div align="center">
  <h2 align="center" style="margin-top: 0; margin-bottom: 15px;">
+ <span style="color:#0052CC">F</span><span style="color:#135FD0">l</span><span style="color:#266CD4">o</span><span style="color:#3979D7">w</span><span style="color:#4C86DB">I</span><span style="color:#6093DF">n</span><span style="color:#73A0E3">O</span><span style="color:#86ADE7">n</span><span style="color:#99BAEB">e</span>: Unifying Multimodal Generation as
+ <span style="color:#0052CC">I</span><span style="color:#0958CE">m</span><span style="color:#125ED0">a</span><span style="color:#1B64D2">g</span><span style="color:#246AD4">e</span><span style="color:#2D70D6">-</span><span style="color:#3676D8">i</span><span style="color:#3F7CDA">n</span><span style="color:#4882DC">,</span>&nbsp;<span style="color:#5188DE">I</span><span style="color:#5A8EE0">m</span><span style="color:#6394E2">a</span><span style="color:#6C9AE4">g</span><span style="color:#75A0E6">e</span><span style="color:#7EA6E8">-</span><span style="color:#87ACEA">o</span><span style="color:#90B2EC">u</span><span style="color:#99B8EE">t</span> Flow Matching
  </h2>
  <p align="center" style="font-size: 15px;">
  <span style="color:#E74C3C; font-weight: bold;">TL;DR:</span> <strong>The first vision-centric image-in, image-out image generation model.</strong>

  <p align="center" style="font-size: 16px;">
  <a href="https://csu-jpg.github.io/FlowInOne.github.io/" style="text-decoration: none;">🌐 Homepage</a> |
  <a href="https://github.com/CSU-JPG/FlowInOne" style="text-decoration: none;">💻 Code</a> |
+ <a href="https://huggingface.co/papers/2604.06757" style="text-decoration: none;">📄 Paper</a> |
  <a href="https://huggingface.co/datasets/CSU-JPG/VisPrompt5M" style="text-decoration: none;">📝 Dataset</a> |
  <a href="https://huggingface.co/datasets/CSU-JPG/VPBench" style="text-decoration: none;">🌏 Benchmark</a> |
  <a href="https://huggingface.co/CSU-JPG/FlowInOne" style="text-decoration: none;">🤗 Model</a>
  </p>
  </div>

+ ## Authors
+ Junchao Yi, Rui Zhao, Jiahao Tang, Weixian Lei, Linjie Li, Qisheng Su, Zhengyuan Yang, Lijuan Wang, Xiaofeng Zhu, Alex Jinpeng Wang.
+
  ## About
+ FlowInOne is a framework that reformulates multimodal generation as a **purely visual flow**, converting all inputs into visual prompts and enabling a clean **image-in, image-out** pipeline governed by a single flow matching model.
+
  This vision-centric formulation naturally eliminates cross-modal alignment bottlenecks, noise scheduling, and task-specific architectural branches, **unifying text-to-image generation, layout-guided editing, and visual instruction following under one coherent paradigm**.

+ ## 🚀 Setup
+
+ ```bash
+ # Create conda environment
+ conda create -n flowinone python=3.10 -y
+ conda activate flowinone
+
+ # Install required packages
+ git clone https://github.com/CSU-JPG/FlowInOne.git
+ cd FlowInOne/scripts
+ sh setup.sh
+ ```
+
+ ## ✨ Usage
+
+ ### 1. Download Weights
+ You can download the model weights and model preparation files using the following commands:
  ```bash
  # model weights
+ wget -O checkpoints/flowinone_256px.pth https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/flowinone_256px.pth
+
  # model preparation
+ wget https://huggingface.co/CSU-JPG/FlowInOne/resolve/main/preparation.tar.gz
+ tar -xzvf "preparation.tar.gz"
  ```
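If you prefer scripting the download, the `resolve/main` URLs used by the `wget` commands above can also be fetched from Python with only the standard library. This is a sketch, not part of the FlowInOne codebase: `hf_resolve_url` and `fetch` are helper names introduced here, and the destination directory `checkpoints` is an assumption.

```python
import tarfile
import urllib.request
from pathlib import Path

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the Hugging Face 'resolve' download URL used by the wget commands above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

def fetch(repo_id: str, filename: str, dest_dir: str = "checkpoints") -> Path:
    """Download one repo file into dest_dir; .tar.gz archives are extracted in place."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename
    urllib.request.urlretrieve(hf_resolve_url(repo_id, filename), target)
    if filename.endswith(".tar.gz"):
        with tarfile.open(target) as archive:
            archive.extractall(dest)
    return target
```

For example, `fetch("CSU-JPG/FlowInOne", "preparation.tar.gz")` mirrors the `wget` + `tar` pair above.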
+
+ ### 2. Inference
+ Run inference with the provided script in the repository:
  ```bash
+ sh scripts/inference.sh
  ```
+
+ Our training and inference scripts are fully available on [GitHub](https://github.com/CSU-JPG/FlowInOne).

  ## Citation

  If you find our work useful, please consider citing:
+ ```bibtex
  @article{yi2026flowinoneunifyingmultimodalgenerationimagein,
  title={FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching},
  author={Junchao Yi and Rui Zhao and Jiahao Tang and Weixian Lei and Linjie Li and Qisheng Su and Zhengyuan Yang and Lijuan Wang and Xiaofeng Zhu and Alex Jinpeng Wang},