Ephemeral182 committed
Commit 8a71ae4 · verified · 1 Parent(s): ed0927d

Update README.md

Files changed (1)
  1. README.md +321 -44
README.md CHANGED
@@ -3,85 +3,362 @@ license: apache-2.0
3
  library_name: diffusers
4
  ---
5
 
6
  ---
7
 
8
- ## 🎨 PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
9
 
10
- [![demo](https://img.shields.io/badge/%F0%9F%96%8C%EF%B8%8F-Demo-000000?style=flat-square\&logo=vercel)](https://ephemeral182.github.io/PosterCraft/)
11
- [![arxiv](https://img.shields.io/badge/arXiv-Preprint-b31b1b?style=flat-square)](https://arxiv.org/abs/xxxx.xxxxx)
12
- [![github](https://img.shields.io/badge/%F0%9F%92%BB-Code-181717?style=flat-square\&logo=github)](https://github.com/ephemeral182/PosterCraft)
13
- [![video](https://img.shields.io/badge/%F0%9F%8E%A5-Video-ff0000?style=flat-square\&logo=youtube)](https://www.youtube.com/watch?v=XXXXXX)
14
- [![HF Spaces](https://img.shields.io/badge/%F0%9F%A4%96-HuggingFace%20Demo-orange?style=flat-square\&logo=huggingface)](https://huggingface.co/spaces/ephemeral182/postercraft-demo)
15
 
16
  ---
17
 
18
- ### ✨ Overview
19
 
20
- **PosterCraft-v1\_RL** is a high-quality aesthetic poster generation model trained with reinforcement learning on multi-stage synthetic and curated datasets. It takes a structured prompt and produces posters with strong visual composition, artistic balance, and accurate typography rendering.
21
 
22
- This version is inference-only and ideal for **design inspiration**, **AI-driven visual layout generation**, and **creative content production**.
23
 
24
- > 🔗 **Live Demo**: [ephemeral182.github.io/PosterCraft](https://ephemeral182.github.io/PosterCraft/)
25
- > 🧠 **Model Type**: Custom Diffusion / Poster Generation
26
- > **Inference Only** (weights available, training pipeline not included)
27
 
28
  ---
29
 
30
- ### 🖼️ Features
31
 
32
- * ✅ **Text-to-Poster generation** with multi-element structured layout
33
- * 🎯 **Fine-tuned with RL** using aesthetic feedback and layout score
34
- * ✍️ **Typographic region control** using Gemini-generated masks
35
- * 🧠 **Multi-modal reward design** (text, layout, mask)
36
- * ⚡ Fast inference with Hugging Face Spaces (check demo)
37
 
38
  ---
39
 
40
- ### 🚀 How to Use
41
 
42
- ```python
43
- from diffusers import StableDiffusionPipeline
44
 
45
- pipe = StableDiffusionPipeline.from_pretrained("ephemeral182/PosterCraft-v1_RL").to("cuda")
46
- prompt = "A cinematic poster for 'The Future is Now': futuristic city, neon glow, lone figure, dramatic skyline"
47
- image = pipe(prompt).images[0]
48
- image.save("poster.png")
49
- ```
50
 
51
- Alternatively, try the [Hugging Face Web Demo](https://huggingface.co/spaces/ephemeral182/postercraft-demo) or use the [official site](https://ephemeral182.github.io/PosterCraft/) for richer UI support and mask upload functionality.
52
 
53
  ---
54
 
55
- ### 📦 Resources
56
 
57
- | Resource | Link |
58
- | ---------------- | ---------------------------------------------------------------------------------- |
59
- | 🧪 Paper (arXiv) | [arxiv.org/abs/xxxx.xxxxx](https://arxiv.org/abs/xxxx.xxxxx) |
60
- | 🧠 Code (GitHub) | [github.com/ephemeral182/PosterCraft](https://github.com/ephemeral182/PosterCraft) |
61
- | 🎬 Video Intro | [YouTube Walkthrough](https://www.youtube.com/watch?v=XXXXXX) |
62
- | 🌐 Website | [PosterCraft WebApp](https://ephemeral182.github.io/PosterCraft/) |
63
 
64
  ---
65
 
66
- ### 📌 License
67
 
68
- This model is released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
69
 
70
  ---
71
 
72
- ### 🧩 Citation
73
 
74
- ```
75
- @misc{chen2025postercraft,
76
  title={PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework},
77
- author={Chen, Sixiang and others},
78
- year={2025},
79
- eprint={xxxx.xxxxx},
80
- archivePrefix={arXiv},
81
- primaryClass={cs.CV}
82
  }
83
  ```
84
 
85
  ---
86
 
87
  ---
3
  library_name: diffusers
4
  ---
5
 
6
+ <div align="center">
7
+ <h1>🎨 PosterCraft:<br/>Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework</h1>
8
+
9
+ [![arXiv](https://img.shields.io/badge/arXiv-2025.XXXX-red)](https://arxiv.org/abs/XXXX)
10
+ [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/ephemeral182/PosterCraft)
11
+ [![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/PosterCraft)
12
+ [![Website](https://img.shields.io/badge/🌐-Website-green)](https://ephemeral182.github.io/PosterCraft/)
13
+ [![Demo](https://img.shields.io/badge/🎥-Live_Demo-purple)](https://ephemeral182.github.io/PosterCraft/)
14
+
15
+ <img src="assets/teaser-1.png" alt="PosterCraft Logo" width="1000"/>
16
+
17
+ ### [**🌐 Website**](https://ephemeral182.github.io/PosterCraft/) | [**🎯 Demo**](https://ephemeral182.github.io/PosterCraft/) | [**📄 Paper**](https://arxiv.org/abs/XXXX) | [**🤗 Models**](https://huggingface.co/PosterCraft) | [**📚 Datasets**](https://huggingface.co/datasets/PosterCraft) | [**🎥 Video**](#)
18
+
19
+ </div>
20
+
21
+ ---
22
+
23
+ ## News & Updates
24
+
25
+
26
+ - 🚀 **[2025.06]** Our live demo and inference code are now available!
27
+ - 📊 **[2025.06]** We have released partial datasets and model weights on HuggingFace.
28
+
29
  ---
30
 
31
+ ## 👥 Authors
32
 
33
+ > [**Sixiang Chen**](https://ephemeral182.github.io/)<sup>1</sup>\*, [**Jianyu Lai**](https://openreview.net/profile?id=~Jianyu_Lai1)<sup>1</sup>\*, [**Jialin Gao**](https://scholar.google.com/citations?user=sj4FqEgAAAAJ&hl=zh-CN)<sup>2</sup>\*, [**Tian Ye**](https://owen718.github.io/)<sup>1</sup>, [**Haoyu Chen**](https://haoyuchen.com/)<sup>1</sup>, [**Hengyu Shi**](https://openreview.net/profile?id=%7EHengyu_Shi1)<sup>2</sup>, [**Shitong Shao**](https://shaoshitong.github.io/)<sup>1</sup>, [**Yunlong Lin**](https://scholar.google.com.hk/citations?user=5F3tICwAAAAJ&hl=zh-CN)<sup>3</sup>, [**Song Fei**](https://openreview.net/profile?id=~Song_Fei1)<sup>1</sup>, [**Zhaohu Xing**](https://ge-xing.github.io/)<sup>1</sup>, [**Yeying Jin**](https://jinyeying.github.io/)<sup>4</sup>, **Junfeng Luo**<sup>2</sup>, [**Xiaoming Wei**](https://scholar.google.com/citations?user=JXV5yrZxj5MC&hl=zh-CN)<sup>2</sup>, [**Lei Zhu**](https://sites.google.com/site/indexlzhu/home)<sup>1,5</sup>†
34
+ >
35
+ > <sup>1</sup>The Hong Kong University of Science and Technology (Guangzhou)
36
+ > <sup>2</sup>Meituan
37
+ > <sup>3</sup>Xiamen University
38
+ > <sup>4</sup>National University of Singapore
39
+ > <sup>5</sup>The Hong Kong University of Science and Technology
40
+ >
41
+ > \*Equal Contribution, †Corresponding Author
42
 
43
  ---
44
 
45
+ ## 🌟 What is PosterCraft?
46
+
47
+ <div align="center">
48
+ <img src="images/demo/demo2.png" alt="What is PosterCraft - Quick Prompt Demo" width="1000"/>
49
+ <br>
50
+ </div>
51
+
52
+ PosterCraft is a unified framework for **high-quality aesthetic poster generation** that excels in **precise text rendering**, **seamless integration of abstract art**, **striking layouts**, and **stylistic harmony**.
53
+
54
+
55
+ ## 🚀 Quick Start
56
+
57
+ ### 🔧 Installation
58
+
59
+ ```bash
60
+ # Clone the repository
61
+ git clone https://github.com/ephemeral182/PosterCraft.git
62
+ cd PosterCraft
63
+
64
+ # Create conda environment
65
+ conda create -n postercraft python=3.11
66
+ conda activate postercraft
67
+
68
+ # Install dependencies
69
+ pip install -r requirements.txt
70
+
71
+ # Install PosterCraft
72
+ pip install -e .
73
+ ```
74
+
75
+ ### 🚀 Quick Generation
76
+
77
+ Generate high-quality aesthetic posters from your prompt with `BF16` precision:
78
+
79
+ ```bash
80
+ python inference.py \
81
+ --prompt "Urban Canvas Street Art Expo poster with bold graffiti-style lettering and dynamic colorful splashes" \
82
+ --enable_recap \
83
+ --num_inference_steps 28 \
84
+ --guidance_scale 3.5 \
85
+ --seed 42 \
86
+ --pipeline_path "black-forest-labs/FLUX.1-dev" \
87
+ --custom_transformer_path "PosterCraft/PosterCraft-v1_RL" \
88
+ --qwen_model_path "Qwen/Qwen3-8B"
89
+ ```
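For scripted or batch use, the command above can also be assembled from Python. The sketch below is a minimal illustration that reuses only the flags shown in that command. `build_inference_command` is a hypothetical helper name, not part of the PosterCraft repository, and the sketch assumes `inference.py` sits in the current working directory.

```python
import shlex
import sys


def build_inference_command(prompt, seed=42, steps=28, guidance_scale=3.5,
                            enable_recap=True):
    """Return the argument list for the inference.py call shown in the README.

    Hypothetical helper: it only mirrors the flags from the README command.
    """
    cmd = [
        sys.executable, "inference.py",
        "--prompt", prompt,
        "--num_inference_steps", str(steps),
        "--guidance_scale", str(guidance_scale),
        "--seed", str(seed),
        "--pipeline_path", "black-forest-labs/FLUX.1-dev",
        "--custom_transformer_path", "PosterCraft/PosterCraft-v1_RL",
        "--qwen_model_path", "Qwen/Qwen3-8B",
    ]
    if enable_recap:
        cmd.append("--enable_recap")
    return cmd


if __name__ == "__main__":
    command = build_inference_command(
        "Urban Canvas Street Art Expo poster with bold graffiti-style lettering"
    )
    # Print a copy-pastable shell line; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the call as a list rather than a single string avoids shell-quoting problems when prompts contain quotes or other special characters.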
90
+
91
+ ### 💻 Gradio Web UI
92
+
93
+ We provide a Gradio web UI for PosterCraft.
94
+
95
+ ```bash
96
+ python -m gradio poster_craft_web_ui.py
97
+ ```
98
+
99
+
100
+ ## 📊 Performance Benchmarks
101
+
102
+ <div align="center">
103
 
104
+ ### 📈 Quantitative Results
105
 
106
+ <table>
107
+ <thead>
108
+ <tr>
109
+ <th>Method</th>
110
+ <th>Text Recall ↑</th>
111
+ <th>Text F-score ↑</th>
112
+ <th>Text Accuracy ↑</th>
113
+ </tr>
114
+ </thead>
115
+ <tbody>
116
+ <tr>
117
+ <td style="white-space: nowrap;">OpenCOLE (Open)</td>
118
+ <td>0.082</td>
119
+ <td>0.076</td>
120
+ <td>0.061</td>
121
+ </tr>
122
+ <tr>
123
+ <td style="white-space: nowrap;">Playground-v2.5 (Open)</td>
124
+ <td>0.157</td>
125
+ <td>0.146</td>
126
+ <td>0.132</td>
127
+ </tr>
128
+ <tr>
129
+ <td style="white-space: nowrap;">SD3.5 (Open)</td>
130
+ <td>0.565</td>
131
+ <td>0.542</td>
132
+ <td>0.497</td>
133
+ </tr>
134
+ <tr>
135
+ <td style="white-space: nowrap;">FLUX.1-dev (Open)</td>
136
+ <td>0.723</td>
137
+ <td>0.707</td>
138
+ <td>0.667</td>
139
+ </tr>
140
+ <tr>
141
+ <td style="white-space: nowrap;">Ideogram-v2 (Closed)</td>
142
+ <td>0.711</td>
143
+ <td>0.685</td>
144
+ <td>0.680</td>
145
+ </tr>
146
+ <tr>
147
+ <td style="white-space: nowrap;">BAGEL (Open)</td>
148
+ <td>0.543</td>
149
+ <td>0.536</td>
150
+ <td>0.463</td>
151
+ </tr>
152
+ <tr>
153
+ <td style="white-space: nowrap;">Gemini2.0-Flash-Gen (Closed)</td>
154
+ <td>0.798</td>
155
+ <td>0.786</td>
156
+ <td>0.746</td>
157
+ </tr>
158
+ <tr>
159
+ <td style="white-space: nowrap;"><b>PosterCraft (ours)</b></td>
160
+ <td><b>0.787</b></td>
161
+ <td><b>0.774</b></td>
162
+ <td><b>0.735</b></td>
163
+ </tr>
164
+ </tbody>
165
+ </table>
166
 
167
+
168
+ <img src="images/user_study/hpc.png" alt="User Study Results" width="1000"/>
169
+
170
+ </div>
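The README does not define how the text metrics in the table are computed. As a rough illustration only, the sketch below uses one common word-level formulation: recall is the share of target words found in the OCR output, precision is the share of OCR words that belong to the target, and the F-score is their harmonic mean. The function name, the case-insensitive matching, and the whitespace tokenisation are all assumptions and may differ from the paper's evaluation protocol. Text Accuracy is not covered here.

```python
from collections import Counter


def text_scores(target_text, ocr_text):
    """Word-level recall, precision and F-score (assumed definitions)."""
    target = Counter(target_text.lower().split())
    found = Counter(ocr_text.lower().split())
    # Multiset intersection: a target word is matched at most as often as it occurs.
    matched = sum((target & found).values())
    recall = matched / sum(target.values()) if target else 0.0
    precision = matched / sum(found.values()) if found else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f_score


# Hypothetical example: one misspelled word and one extra word in the OCR output.
recall, precision, f_score = text_scores(
    "urban canvas street art expo", "urban canvas stret art expo 2025"
)
print(f"recall={recall:.3f} precision={precision:.3f} f_score={f_score:.3f}")
```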
171
 
172
  ---
173
 
174
+ ## 🎭 Gallery & Examples
175
+
176
+ <div align="center">
177
+
178
+ ### 🎨 PosterCraft Gallery
179
 
180
+ <table>
181
+ <tr>
182
+ <td align="center"><img src="images/gallery/gallery_demo1.png" width="250"><br><b>Adventure Travel</b></td>
183
+ <td align="center"><img src="images/gallery/gallery_demo2.png" width="250"><br><b>Post-Apocalyptic</b></td>
184
+ <td align="center"><img src="images/gallery/gallery_demo3.png" width="250"><br><b>Sci-Fi Drama</b></td>
185
+ </tr>
186
+ <tr>
187
+ <td align="center"><img src="images/gallery/gallery_demo4.png" width="250"><br><b>Space Thriller</b></td>
188
+ <td align="center"><img src="images/gallery/gallery_demo5.png" width="250"><br><b>Cultural Event</b></td>
189
+ <td align="center"><img src="images/gallery/gallery_demo6.png" width="250"><br><b>Luxury Product</b></td>
190
+ </tr>
191
+ <tr>
192
+ <td align="center"><img src="images/gallery/gallery_demo7.png" width="250"><br><b>Concert Show</b></td>
193
+ <td align="center"><img src="images/gallery/gallery_demo8.png" width="250"><br><b>Children's Book</b></td>
194
+ <td align="center"><img src="images/gallery/gallery_demo9.png" width="250"><br><b>Movie Poster</b></td>
195
+ </tr>
196
+ </table>
197
+
198
+
199
+ </div>
200
 
201
  ---
202
 
203
+ ## 🏗️ Model Architecture
204
 
205
+ <div align="center">
206
+ <img src="images/overview/framework_fig.png" alt="PosterCraft Framework Overview" width="1000"/>
207
+ <br>
208
+ <em><strong>A unified framework for high-quality aesthetic poster generation</strong></em>
209
+ </div>
210
 
211
+ Our unified framework consists of **four critical optimization stages in the training workflow**:
212
+
213
+ ### 🔤 Stage 1: Text Rendering Optimization
214
+ Targets accurate text generation by rendering diverse text precisely on high-quality backgrounds while keeping the background itself faithful, which establishes the foundational fidelity and robustness for poster generation.
215
 
216
+ ### 🎨 Stage 2: High-quality Poster Fine-tuning
217
+ Shifts focus to overall poster style and text-background harmony using Region-aware Calibration. This fine-tuning stage preserves text accuracy while strengthening the artistic integrity of the aesthetic poster.
218
+
219
+ ### 🎯 Stage 3: Aesthetic-Text RL
220
+ Employs Aesthetic-Text Preference Optimization to capture higher-order aesthetic trade-offs. This reinforcement learning stage prioritizes outputs that satisfy holistic aesthetic criteria and mitigates defects in font rendering.
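The README names the Stage 3 objective but does not spell it out. Purely as an illustration of preference optimization in general, the sketch below implements the standard DPO-style pairwise loss on scalar log-likelihoods. It is not the PosterCraft training code: the function name, the `beta` value, and the use of plain scalar log-probabilities in place of diffusion-model likelihood terms are all assumptions.

```python
import math


def preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """DPO-style loss for one (preferred, rejected) sample pair.

    logp_*     : log-likelihoods under the model being trained
    ref_logp_* : log-likelihoods under a frozen reference model
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), written as softplus(-margin) for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical numbers: the trained model favours the preferred poster more
# strongly than the reference does, so the loss drops below log(2).
print(preference_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss equals log(2) when the trained model and the reference agree, and it decreases as the trained model assigns relatively more likelihood to the preferred sample.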
221
+
222
+ ### 🔄 Stage 4: Vision-Language Feedback
223
+ Introduces a Joint Vision-Language Conditioning mechanism. This iterative feedback combines visual information with targeted text suggestions for multi-modal corrections, progressively refining aesthetic content and background harmony.
224
 
225
  ---
226
 
227
+ ## 💾 Model Zoo
228
 
229
+ We provide the weights for our core models, fine-tuned at different stages of the PosterCraft pipeline.
230
+
231
+ <div align="center">
232
+ <table>
233
+ <tr>
234
+ <th>Model</th>
235
+ <th>Stage</th>
236
+ <th>Description</th>
237
+ <th>Download</th>
238
+ </tr>
239
+ <tr>
240
+ <td>🎯 <b>PosterCraft-v1_RL</b></td>
241
+ <td>Stage 3: Aesthetic-Text RL</td>
242
+ <td>Optimized via Aesthetic-Text Preference Optimization for higher-order aesthetic trade-offs.</td>
243
+ <td><a href="https://huggingface.co/PosterCraft/PosterCraft-v1_RL">🤗 HF</a></td>
244
+ </tr>
245
+ <tr>
246
+ <td>🔄 <b>PosterCraft-v1_Reflect</b></td>
247
+ <td>Stage 4: Vision-Language Feedback</td>
248
+ <td>Iteratively refined using vision-language feedback for further harmony and content accuracy.</td>
249
+ <td><a href="https://huggingface.co/PosterCraft/PosterCraft-v1_Reflect">🤗 HF</a></td>
250
+ </tr>
251
+ </table>
252
+ </div>
253
 
254
  ---
255
 
256
+ ## 📚 Datasets
257
+
258
+
259
+ We provide **four specialized datasets** for training the PosterCraft workflow:
260
+
261
+ ### 🔤 Text-Render-2M
262
+ <div align="center">
263
+ <img src="images/dataset/dataset1.png" alt="Text-Render-2M Dataset" width="1000"/>
264
+ <br>
265
+ <em><strong>Text-Render-2M: Multi-instance text rendering with diverse selections</strong></em>
266
+ </div>
267
+
268
+ A comprehensive text rendering dataset containing **2 million high-quality examples**. Features multi-instance text rendering, diverse text selections (varying in size, count, placement, and rotation), and dynamic content generation through both template-based and random string approaches.
269
 
270
+ ### 🎨 HQ-Poster-100K
271
+ <div align="center">
272
+ <img src="images/dataset/dataset2.png" alt="HQ-Poster-100K Dataset" width="1000"/>
273
+ <br>
274
+ <em><strong>HQ-Poster-100K: Curated high-quality aesthetic posters</strong></em>
275
+ </div>
276
+
277
+ **100,000** meticulously curated high-quality posters with advanced filtering techniques and multi-modal scoring. Features Gemini-powered mask generation with detailed captions for comprehensive poster understanding.
278
+
279
+ ### 👍 Poster-Preference-100K
280
+ <div align="center">
281
+ <img src="images/dataset/dataset3.png" alt="Poster-Preference-100K Dataset" width="1000"/>
282
+ <br>
283
+ <em><strong>Poster-Preference-100K: Preference learning pairs for aesthetic optimization</strong></em>
284
+ </div>
285
+
286
+ **100,000** preference learning poster pairs with comprehensive evaluation by Gemini and aesthetic evaluators. Designed for human-aligned poster generation training through reinforcement learning.
287
+
288
+ ### 🔄 Poster-Reflect-120K
289
+ <div align="center">
290
+ <img src="images/dataset/dataset4.png" alt="Poster-Reflect-120K Dataset" width="1000"/>
291
+ <br>
292
+ <em><strong>Poster-Reflect-120K: Vision-language feedback pairs for iterative refinement</strong></em>
293
+ </div>
294
+
295
+ **120,000** vision-language feedback pairs with comprehensive evaluation by Gemini and aesthetic evaluators. This dataset captures the iterative refinement process and provides detailed feedback for further improvements.
296
+
297
+ <div align="center">
298
+ <table>
299
+ <tr>
300
+ <th>Dataset</th>
301
+ <th>Size</th>
302
+ <th>Description</th>
303
+ <th>Download</th>
304
+ </tr>
305
+ <tr>
306
+ <td>🔤 <b>Text-Render-2M</b></td>
307
+ <td>2M samples</td>
308
+ <td>High-quality text rendering examples with multi-instance support</td>
309
+ <td><a href="https://huggingface.co/datasets/PosterCraft/Text-Render-2M">🤗 HF</a></td>
310
+ </tr>
311
+ <tr>
312
+ <td>🎨 <b>HQ-Poster-100K</b></td>
313
+ <td>100K samples</td>
314
+ <td>Curated high-quality posters with aesthetic evaluation</td>
315
+ <td><a href="https://huggingface.co/datasets/PosterCraft/HQ-Poster-100K">🤗 HF</a></td>
316
+ </tr>
317
+ <tr>
318
+ <td>👍 <b>Poster-Preference-100K</b></td>
319
+ <td>100K pairs</td>
320
+ <td>Preference learning poster pairs for RL training</td>
321
+ <td><a href="https://huggingface.co/datasets/PosterCraft/Poster-Preference-100K">🤗 HF</a></td>
322
+ </tr>
323
+ <tr>
324
+ <td>🔄 <b>Poster-Reflect-120K</b></td>
325
+ <td>120K pairs</td>
326
+ <td>Vision-language feedback pairs for iterative refinement</td>
327
+ <td><a href="https://huggingface.co/datasets/PosterCraft/Poster-Reflect-120K">🤗 HF</a></td>
328
+ </tr>
329
+ </table>
330
+ </div>
331
 
332
  ---
333
 
334
+ ## 📝 Citation
335
 
336
+ If you find PosterCraft useful for your research, please cite our paper:
337
+
338
+ ```bibtex
339
+ @article{chen2024postercraft,
340
  title={PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework},
341
+ author={Chen, Sixiang and Lai, Jianyu and Gao, Jialin and Ye, Tian and Chen, Haoyu and Shi, Hengyu and Shao, Shitong and Lin, Yunlong and Fei, Song and Xing, Zhaohu and Jin, Yeying and Luo, Junfeng and Wei, Xiaoming and Zhu, Lei},
342
+ journal={arXiv preprint arXiv:XXXX.XXXXX},
343
+ year={2024}
344
  }
345
  ```
346
 
347
  ---
348
 
349
+ ## 🙏 Acknowledgments
350
+
351
+ - 🏛️ Thanks to our affiliated institutions for their support.
352
+ - 🤝 Special thanks to the open-source community for inspiration.
353
+
354
  ---
355
+
356
+ ## 📬 Contact
357
+
358
+ For any questions or inquiries, please reach out to us:
359
+
360
+ - **Sixiang Chen**: `schen691@connect.hkust-gz.edu.cn`
361
+ - **Jianyu Lai**: `jlai218@connect.hkust-gz.edu.cn`
362
+
363
+