CheeseStar committed · Commit a5d3e05 · verified · 1 Parent(s): 0bebd9b

Update README.md

Files changed (1)
  1. README.md +1 -247
README.md CHANGED
@@ -18,256 +18,10 @@ _**[Harold Haodong Chen](https://haroldchen19.github.io/)<sup>1,2*</sup>, [Xinxi
18
  <h5 align="center"> If you like our project, please give us a star ⭐ on huggingface for latest update. </h2>
19
 
20
  <a href='https://arxiv.org/abs/2602.02227'><img src='https://img.shields.io/badge/arXiv-2602.02227-b31b1b.svg'></a>
 
21
  <br>
22
 
23
  </div>
24
 
25
  ![latentmorph](https://cdn-uploads.huggingface.co/production/uploads/69748068683fa304ce0f7368/HuNmWtBO2-eu8kkMHDONj.png)
26
 
27
-
28
- <!-- <table class="center">
29
- <tr>
30
- <td><img src="assets/latentmorph.png"></td>
31
- </tr>
32
- </table> -->
33
-
34
-
35
-
36
- <!-- ## 🧰 TODO
37
-
38
- - [x] Release training code.
39
- - [x] Release inference code.
40
- - [ ] Release Paper.
41
- - [ ] Release model weights.
42
-
43
- --- -->
44
-
45
-
46
-
47
- <a name="installation"></a>
48
-
49
- ## 🚀 Installation
50
-
51
- ### 1. Clone this repository and navigate to source folder
52
-
53
- ```bash
54
- cd LatentMorph
55
- ```
56
-
57
- ### 2. Build Environment
58
-
59
- This repo ships `environment.yml`.
60
-
61
- ```bash
62
- conda env create -f environment.yml
63
- conda activate latentmorph
64
- ```
65
-
66
- If you don't use conda, make sure you can run:
67
-
68
- ```bash
69
- python -c "import torch; import transformers; print(torch.__version__)"
70
- ```
71
-
72
- ---
73
-
74
-
75
-
76
- <a name="data&model"></a>
77
-
78
- ## 🌏 Data & Model
79
-
80
- This repo does not ship training datasets under `data/`. Please download them locally via Hugging Face.
81
-
82
- ### 1. Create the local data layout
83
-
84
- ```bash
85
- mkdir -p data/.cache/huggingface data/.cache/torch data/hps_ckpt outputs_sft/checkpoints_control outputs/rl_result
86
- ```
87
-
88
- ### 2. Download model weights into the local cache
89
-
90
- We store Hugging Face cache inside the repo:
91
-
92
- ```bash
93
- export HF_HOME="$(pwd)/data/.cache/huggingface"
94
- export TORCH_HOME="$(pwd)/data/.cache/torch"
95
- python -m pip install huggingface_hub
96
- ```
97
-
98
- Download Janus and CLIP:
99
-
100
- ```bash
101
- python -m huggingface_hub.cli download deepseek-ai/Janus-Pro-7B --local-dir "$HF_HOME"
102
- python -m huggingface_hub.cli download openai/clip-vit-large-patch14 --local-dir "$HF_HOME"
103
- ```
104
-
105
- Download HPS v2.1 reward weights:
106
-
107
- ```bash
108
- bash scripts/download_required_assets.sh
109
- python -m pip install "git+https://github.com/tgxs002/HPSv2.git"
110
- ```
111
-
112
- ### 3. Datasets / prompts (download from Hugging Face)
113
-
114
- We expect the following local layout:
115
-
116
- - **SFT dataset**: `data/midjourney-prompts/data/*.zstd.parquet`
117
- - **RL prompts**: `data/T2I-CompBench/examples/dataset/*.txt`
118
-
119
- Download with Hugging Face (replace the repo ids):
120
-
121
- ```bash
122
- # Midjourney prompts (parquet shards) -> data/midjourney-prompts/data/*.zstd.parquet
123
- huggingface-cli download --repo-type dataset vivym/midjourney-prompts \
124
- --local-dir data/midjourney-prompts --resume-download
125
-
126
- # T2I-CompBench prompts (.txt) -> data/T2I-CompBench/examples/dataset/*.txt
127
- huggingface-cli download --repo-type dataset NinaKarine/t2i-compbench \
128
- --include "examples/dataset/*.txt" \
129
- --local-dir data/T2I-CompBench --resume-download
130
- ```
131
-
132
- Quick sanity checks:
133
-
134
- ```bash
135
- ls -lh data/midjourney-prompts/data | head
136
- ls -lh data/T2I-CompBench/examples/dataset | head
137
- ```
138
-
139
- ---
140
-
141
-
142
-
143
- <a name="inference_Suite"></a>
144
-
145
- ## 📍 Inference Suite
146
-
147
- LatentMorph provides two inference parts:
148
-
149
- - **SFT Inference Part (`inference_sft`)**
150
-
151
- - **RL Inference Part (`inference_rl`)**
152
-
153
- Before running inference, ensure you have activated the environment:
154
-
155
- ```bash
156
- conda activate latentmorph
157
- ```
158
-
159
- ### 1. Prepare Model Weights
160
-
161
- You can download our pre-trained checkpoints from [Hugging Face](https://huggingface.co/CheeseStar/LatentMorph):
162
-
163
-
164
- | Weight Type | Filename | Download Command |
165
- | -------------------------- | ------------------ | ------------------------------------------------------------ |
166
- | **SFT Controller** | `ckpt_sft.pt` | `huggingface-cli download CheeseStar/LatentMorph sft.pt --local-dir .` |
167
- | **RL Policy** | `ckpt_rl.pt` | `huggingface-cli download CheeseStar/LatentMorph rl.pt --local-dir .` |
168
- | **SFT Controller w/ LoRA** | `ckpt_sft_LoRA.pt` | (User Trained) |
169
- | **RL Policy w/ LoRA** | `ckpt_rl_LoRA.pt` | (User Trained) |
170
-
171
- ---
172
-
173
-
174
- ### 2. Run Inference
175
-
176
- We provide two modes for both **SFT** and **RL** stages. Choose the corresponding script folder (`inference_sft` or `inference_rl`).
177
-
178
- #### **Option A: Single Prompt (Quick Test)**
179
-
180
- Generate an image from a specific text prompt.
181
-
182
- ```bash
183
- # Example for SFT
184
- bash inference_sft/run_infer_one.bash
185
- ```
186
-
187
- > **Customization:** Open `run_infer_one.bash` to modify the `prompt` string and `output` path.
188
- > **Result:** View your image at `inference_[sft/rl]_out/single.png`.
189
-
190
- #### **Option B: Batch Processing (Group of Prompts)**
191
-
192
- Generate multiple images using a `.txt` file (one prompt per line).
193
-
194
- ```bash
195
- # Example for RL
196
- bash inference_rl/run_infer.bash
197
- ```
198
-
199
- > **Setup:** Ensure your `prompts_file` path in the bash script points to your text file.
200
- > **Result:** All generated images will be saved in `inference_[sft/rl]_out/batch/`.
201
-
202
- ---
203
-
204
-
205
- <a name="training_suite"></a>
206
-
207
- ## ▶️ Training Suite
208
-
209
- LatentMorph has two training stages:
210
-
211
- - **SFT (`latent_sft`)**: train lightweight control modules (controller) with teacher-forcing while freezing the large Janus model.
212
- - **RL (`latent_rl`)**: train a trigger policy + condenser with CLIP/HPS rewards (the rest of Janus/control stack stays frozen).
213
-
214
-
215
- ### SFT: train controller (teacher-forcing)
216
-
217
- ```bash
218
- bash sft_train.sh
219
- ```
220
-
221
- > You can control the training depth using the `--lora_control` flag in the training script:
222
- >
223
- > * `--lora_control 0`: Trains **only** the control modules (Backbone remains frozen).
224
- > * `--lora_control 1`: Fine-tunes the **Backbone** and control modules together via LoRA.
225
-
226
-
227
- **Outputs:**
228
-
229
- - `outputs_sft/checkpoints_control/ckpt_latest.pt`
230
- - `outputs_sft/checkpoints_control/ckpt_step_*.pt`
231
-
232
- ### RL: train trigger policy (policy gradient)
233
-
234
- Ensure your SFT checkpoint exists at `outputs_sft/checkpoints_control/ckpt_latest.pt`.
235
-
236
- ```bash
237
- bash rl_train.sh
238
- ```
239
-
240
- **Outputs:**
241
-
242
- - `outputs/rl_result/ckpt_latest.pt`
243
- - `outputs/rl_result/ckpt_step_*.pt`
244
- - `outputs/rl_result/logs/`
245
-
246
- ---
247
-
248
- <a name="citation"></a>
249
-
250
- ## 📝 Citation
251
-
252
- Please consider citing our paper if you find LatentMorph useful:
253
-
254
- ```bib
255
- @article{chen2026show,
256
- title={Show, Don't Tell: Morphing Latent Reasoning into Image Generation},
257
- author={Chen, Harold Haodong and Yin, Xinxiang and Shu, Wen-Jie and Zhang, Hongfei and Zhang, Zixin and Liao, Chenfei and Guo, Litao and Chen, Qifeng and Chen, Ying-Cong},
258
- journal={arXiv preprint arXiv:2602.02227},
259
- year={2026}
260
- }
261
- ```
262
-
263
- ---
264
-
265
- ## 🍗 Acknowledgement
266
-
267
- LatentMorph builds on the codebases of [Janus-Pro](https://github.com/deepseek-ai/Janus), [Janus-Pro-R1](https://github.com/wendell0218/Janus-Pro-R1), and [DanceGRPO](https://github.com/XueZeyue/DanceGRPO); we thank their developers.
268
-
269
- ---
270
-
271
- ## 📪 Contact
272
-
273
- For any questions, feel free to open an issue or email `haroldchen328@gmail.com`.
 
18
  <h5 align="center"> If you like our project, please give us a star ⭐ on huggingface for latest update. </h2>
19
 
20
  <a href='https://arxiv.org/abs/2602.02227'><img src='https://img.shields.io/badge/arXiv-2602.02227-b31b1b.svg'></a>
21
+ <a href='https://github.com/EnVision-Research/LatentMorph'><img src='https://img.shields.io/badge/GitHub-LatentMorph-181717.svg'></a>
22
  <br>
23
 
24
  </div>
25
 
26
  ![latentmorph](https://cdn-uploads.huggingface.co/production/uploads/69748068683fa304ce0f7368/HuNmWtBO2-eu8kkMHDONj.png)
27