TribeRinb committed 56330e7 (parent da9fec0): Upload README.md with huggingface_hub

<h1 align="center">GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering</h1>

<div align="center">
<a href=''><img src='https://img.shields.io/badge/arXiv-2603.02138-b31b1b.svg'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://henghuiding.com/GlyphPrinter'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://huggingface.co/FudanCVL/GlyphPrinter"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Weights-HF-orange"></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://huggingface.co/datasets/FudanCVL/GlyphCorrector"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-HF-orange"></a> &nbsp;&nbsp;&nbsp;&nbsp;
<!--
<a href=""><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Bench-HF-orange"></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href=""><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Demo%20-HF-orange"></a> &nbsp;&nbsp;&nbsp;&nbsp; -->
</div>
<p align="center"><b>Xincheng Shuai<sup>1,*</sup>, Ziye Li<sup>1,*</sup>, Henghui Ding<sup>1,✉</sup>, Dacheng Tao<sup>2</sup></b></p>
<p align="center">* Equal Contribution, ✉ Corresponding Author</p>
<p align="center"><sup>1</sup>Fudan University, <sup>2</sup>Nanyang Technological University</p>

## 🔥🔥🔥 News

- [2026/03/15] Released the **training code** and the **GlyphCorrector dataset**. 🤗 [GlyphCorrector](https://huggingface.co/datasets/FudanCVL/GlyphCorrector).
- [2026/03/13] Released the **inference code** and **model weights**. 🤗 [Model Weights](https://huggingface.co/FudanCVL/GlyphPrinter).
- [2026/02/21] GlyphPrinter is accepted to **CVPR 2026**.

---

## 😊 Introduction

![teaser](assets/teaser.png)

**GlyphPrinter** is a preference-based text rendering framework that removes the need for an explicit reward model in visual text generation. It targets common failure cases of existing text-to-image (T2I) models, such as distorted strokes and incorrect glyphs, which are especially frequent when rendering complex Chinese characters, multilingual text, or out-of-domain symbols.

---
## 🔧 Key Features

- **GlyphCorrector Dataset:** A specialized dataset with region-level glyph preference annotations that lets the model learn localized glyph correctness.
- **R-GDPO (Region-Grouped Direct Preference Optimization):** Unlike standard DPO, which models global image-level preferences, R-GDPO focuses on the local regions where glyph errors typically occur. It optimizes both inter- and intra-sample preferences over annotated regions, which substantially improves glyph accuracy.
- **Regional Reward Guidance (RRG):** A novel inference strategy that samples from an optimal distribution with controllable glyph accuracy.
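
The README does not say how RRG exposes its accuracy knob. Purely as an illustration, and not the paper's formulation, here is one way such a knob can work under DPO-style training: if the preference-tuned model approximates p_ref · exp(r/β), then scaling the implicit reward r by a factor `lam` targets p_ref^(1 - lam) · p_tuned^lam, whose score (and therefore noise prediction) is a linear blend of the two models' predictions. The hypothetical `regional_guidance` helper applies that blend only inside a glyph-region mask. As with classifier-free guidance, applying it at every noise level is an approximation, and plain Python lists stand in for tensors.

```python
def regional_guidance(eps_tuned, eps_ref, mask, lam):
    """Blend noise predictions of a preference-tuned and a reference model.

    Hypothetical sketch, not the RRG implementation. Inside the region
    (mask == 1) the result is eps_ref + lam * (eps_tuned - eps_ref), the
    noise-space analogue of sampling from p_ref^(1 - lam) * p_tuned^lam.
    Outside the region (mask == 0) the tuned prediction is kept unchanged.
    lam = 0 recovers the reference model, lam = 1 the tuned model,
    and lam > 1 extrapolates further along the preference direction.
    """
    if not (len(eps_tuned) == len(eps_ref) == len(mask)):
        raise ValueError("eps_tuned, eps_ref and mask must have the same length")
    blended = []
    for t, r, m in zip(eps_tuned, eps_ref, mask):
        guided = r + lam * (t - r)  # interpolate (0 <= lam <= 1) or extrapolate (lam > 1)
        blended.append(m * guided + (1 - m) * t)  # a soft mask in [0, 1] blends linearly
    return blended


# Usage: double the preference direction on the first two elements only.
print(regional_guidance([0.2, 0.4, 1.0], [0.0, 0.0, 0.0], [1, 1, 0], lam=2.0))
# → [0.4, 0.8, 1.0]
```

Keeping the tuned prediction outside the mask means the knob only changes how strongly the glyph regions are pushed, while the rest of the image follows the tuned model as usual.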

---

## 👷 Pipeline

![pipeline](assets/pipeline.png)

GlyphPrinter is trained in two stages:
1. **Stage 1 (Fine-Tuning):** The model is first fine-tuned on multilingual synthetic and realistic text images to establish a strong text-rendering baseline.
2. **Stage 2 (Region-Level Preference Optimization):** The model is then optimized with the R-GDPO objective on the GlyphCorrector dataset. This stage pushes outputs toward the accurately rendered glyph regions and away from the incorrect ones, resulting in higher glyph fidelity.
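
The exact R-GDPO objective, including how preferences are grouped across and within samples, is defined in the paper and the training code. For intuition only, the sketch below shows a simpler relative that we substitute here: the Diffusion-DPO loss with denoising errors averaged over an annotated region instead of the full image. Everything in it is an illustrative assumption: the function names are hypothetical, `beta` absorbs the timestep weighting used in Diffusion-DPO, and per-pixel squared errors arrive as plain Python lists rather than tensors.

```python
import math


def masked_mean(values, mask):
    """Mean of per-pixel `values` over the pixels where `mask` is 1."""
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no pixels")
    return sum(v * m for v, m in zip(values, mask)) / count


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def region_dpo_loss(err_policy_w, err_ref_w, mask_w,
                    err_policy_l, err_ref_l, mask_l, beta=1.0):
    """Diffusion-DPO-style loss restricted to annotated regions (sketch only).

    err_* are per-pixel squared denoising errors ||eps - eps_hat||^2 on the
    preferred (w) and rejected (l) sample, from the trainable policy and the
    frozen reference model; mask_* mark the annotated region of each sample.
    """
    # Negative delta = the policy denoises this region better than the reference.
    delta_w = masked_mean(err_policy_w, mask_w) - masked_mean(err_ref_w, mask_w)
    delta_l = masked_mean(err_policy_l, mask_l) - masked_mean(err_ref_l, mask_l)
    # Large logit when the policy improves more on the preferred region
    # than on the rejected one.
    logit = -beta * (delta_w - delta_l)
    # -log(sigmoid(logit)) == softplus(-logit)
    return softplus(-logit)


loss = region_dpo_loss(
    err_policy_w=[0.1, 0.1, 5.0], err_ref_w=[0.5, 0.5, 5.0], mask_w=[1, 1, 0],
    err_policy_l=[0.9, 0.9], err_ref_l=[0.5, 0.5], mask_l=[1, 1],
)
print(round(loss, 4))  # → 0.3711 (below log 2, the value when policy == reference)
```

Because the third pixel of the preferred sample lies outside `mask_w`, its error does not affect the loss; this is the property that lets a region-level objective ignore parts of an image that were never annotated.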

---
## 💻 Quick Start

### Configuration

1. Environment setup

```bash
cd GlyphPrinter
conda create -n GlyphPrinter python=3.11.10 -y
conda activate GlyphPrinter
```

2. Requirements installation

```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```

3. Inference with the Gradio app

```bash
python app.py
```

Default server port: `7897`.

4. CLI inference without Gradio: `inference.py` loads conditions directly from the `saved_conditions` directory. You can construct an npz-format condition manually through `app.py`.

```bash
# List the available saved conditions
python3 inference.py --list-conditions

# Run inference using the latest condition in saved_conditions/
python3 inference.py \
  --prompt "The colorful graffiti font <sks1> printed on the street wall" \
  --save-mask

# Run inference using a specific condition file
python3 inference.py \
  --condition condition_1.npz \
  --output-dir outputs_inference
```
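
`inference.py --list-conditions` is the supported way to see what has been saved. To peek inside a condition file before running it, note that an `.npz` file written by NumPy's `np.savez` or `np.savez_compressed` is a ZIP archive with one `<name>.npy` member per array, so the standard library is enough to list the array names. The helpers below are hypothetical and rest on two assumptions: that condition files are standard `.npz` archives, and that "latest" means most recently modified, which may not match how `inference.py` orders `saved_conditions/`. The array names themselves depend on what `app.py` saves.

```python
import zipfile
from pathlib import Path


def npz_array_names(path):
    """List the array names stored in an .npz archive without importing NumPy."""
    with zipfile.ZipFile(path) as zf:
        # np.savez stores each array as "<name>.npy" inside the ZIP container.
        return sorted(n[: -len(".npy")] for n in zf.namelist() if n.endswith(".npy"))


def latest_condition(directory="saved_conditions"):
    """Return the most recently modified .npz file in `directory`, or None.

    Assumption: "latest" = newest modification time; inference.py may differ.
    """
    directory = Path(directory)
    candidates = list(directory.glob("*.npz")) if directory.is_dir() else []
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    cond = latest_condition()
    if cond is None:
        print("no .npz files found in saved_conditions/")
    else:
        print(cond.name, npz_array_names(cond))
```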

## 🏃 R-GDPO Training

### 1. Prepare the GlyphCorrector dataset

First download our regional preference dataset [GlyphCorrector](https://huggingface.co/datasets/FudanCVL/GlyphCorrector), then extract it so that it ends up under `dataset/GlyphCorrector`:

```bash
mkdir -p dataset
huggingface-cli download FudanCVL/GlyphCorrector GlyphCorrector.zip \
  --repo-type dataset \
  --local-dir dataset \
  --local-dir-use-symlinks False
unzip -q dataset/GlyphCorrector.zip -d dataset
```

After extraction, verify the folder structure:
```text
dataset/GlyphCorrector/
├── annotated_mask/
│   ├── batch_0/
│   │   ├── generated_0_mask.jpg
│   │   └── ...
│   └── batch_1/
└── inference_results/
    ├── batch_0/
    │   ├── generated_0.png
    │   ├── glyph_0.png
    │   ├── mask_0.png
    │   ├── prompt.txt
    │   └── ...
    └── batch_1/
```
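
The structure check can also be scripted. The helper below is hypothetical (it is not part of the repository) and encodes two assumptions read off the tree rather than from the training code: every `inference_results/batch_*` folder contains a `prompt.txt`, and a folder with the same batch name exists under `annotated_mask/`.

```python
from pathlib import Path


def check_glyphcorrector(root="dataset/GlyphCorrector"):
    """Return a list of layout problems under `root` (an empty list means it looks OK)."""
    root = Path(root)
    masks_dir = root / "annotated_mask"
    results_dir = root / "inference_results"
    problems = [f"missing directory: {d}" for d in (masks_dir, results_dir) if not d.is_dir()]
    if problems:
        return problems
    batches = sorted(p for p in results_dir.iterdir() if p.is_dir())
    if not batches:
        problems.append(f"no batch folders in {results_dir}")
    for batch in batches:
        # Assumption: each batch ships one prompt.txt, as in the tree above.
        if not (batch / "prompt.txt").is_file():
            problems.append(f"{batch.name}: missing prompt.txt")
        # Assumption: annotated masks live in a folder with the same batch name.
        if not (masks_dir / batch.name).is_dir():
            problems.append(f"{batch.name}: no matching folder in annotated_mask/")
    return problems


if __name__ == "__main__":
    issues = check_glyphcorrector()
    print("layout looks OK" if not issues else "\n".join(issues))
```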

### 2. Run R-GDPO training

Launch R-GDPO training with the provided script:

```bash
bash dpo/train_dpo_group.bash
```

## ⚙️ Default Model Settings

- Base FLUX model: `black-forest-labs/FLUX.1-dev`
- Stage 1 transformer path: `pretrained/pretrained_stage1_attn_mask_transformer-stage-1-2`
- Stage 2 LoRA path: `pretrained/dpo-checkpoint`
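
These defaults mix a Hugging Face Hub ID (the base model) with two local folders. A hypothetical pre-flight check like the one below can confirm that the two local paths exist relative to the repository root before you launch `app.py` or `inference.py`. It only tests that the folders are present, not what they contain, and it leaves the base model out of scope.

```python
from pathlib import Path

# Local checkpoint folders listed under "Default Model Settings".
DEFAULT_LOCAL_CHECKPOINTS = {
    "stage1_transformer": "pretrained/pretrained_stage1_attn_mask_transformer-stage-1-2",
    "stage2_lora": "pretrained/dpo-checkpoint",
}


def missing_checkpoints(repo_root=".", paths=None):
    """Return {name: relative_path} for every expected checkpoint folder that is absent."""
    paths = DEFAULT_LOCAL_CHECKPOINTS if paths is None else paths
    root = Path(repo_root)
    return {name: p for name, p in paths.items() if not (root / p).is_dir()}


if __name__ == "__main__":
    missing = missing_checkpoints()
    for name, p in missing.items():
        print(f"missing {name}: {p}")
    if not missing:
        print("all local checkpoint folders found")
```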

---
## 💗 Citation
```bibtex
@article{shuai2026glyphprinter,
  title={GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering},
  author={Xincheng Shuai and Ziye Li and Henghui Ding and Dacheng Tao},
  journal={CVPR},
  year={2026}
}
```