metadata
pipeline_tag: text-to-image
library_name: diffusers
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
Paper | Project Page | GitHub
GlyphPrinter is a preference-based framework for visual text rendering that removes the need for an explicit reward model. It targets common failure cases of existing text-to-image models, such as stroke distortions and incorrect glyphs, especially when rendering complex Chinese characters, multilingual text, or out-of-domain symbols.
Key Features
- R-GDPO (Region-Grouped Direct Preference Optimization): A region-based objective that optimizes inter- and intra-sample preferences over annotated regions, substantially enhancing glyph accuracy.
- GlyphCorrector Dataset: A specialized dataset with region-level glyph preference annotations.
- Regional Reward Guidance (RRG): An inference strategy that samples from an optimal distribution with controllable glyph accuracy.
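For background, R-GDPO builds on Direct Preference Optimization. The sketch below computes the standard DPO loss for a single preferred/rejected pair from log-likelihoods under the trained and reference models. It is not the R-GDPO objective itself, whose region grouping is not described in this card. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_pair_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    All names here are hypothetical; this is background on plain DPO,
    not GlyphPrinter's region-grouped variant.
    """
    # Implicit reward margin: how much more the trained model favors the
    # preferred sample over the rejected one, relative to the reference model.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), evaluated in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the trained model matches the reference model, the margin is zero and the loss equals log 2. The loss falls as the trained model shifts probability toward the preferred sample.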
Usage
To use this model, please follow the installation instructions in the official GitHub repository.
CLI Inference
You can run inference using the provided inference.py script:
# list available saved conditions
python3 inference.py --list-conditions
# run inference using a prompt
python3 inference.py \
  --prompt "The colorful graffiti font <sks1> printed on the street wall" \
  --save-mask
# run inference using a specific condition file
python3 inference.py \
  --condition condition_1.npz \
  --output-dir outputs_inference
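To call the same commands from Python (for example, to loop over several prompts), a thin wrapper around `subprocess` is enough. The sketch below uses only the flags shown above. The helper names `build_inference_command` and `run_inference` are hypothetical, and it assumes the script is run from the repository root where `inference.py` lives. Flag combinations other than those shown above are untested assumptions.

```python
import subprocess
import sys
from typing import List, Optional


def build_inference_command(
    prompt: Optional[str] = None,
    condition: Optional[str] = None,
    output_dir: Optional[str] = None,
    save_mask: bool = False,
    list_conditions: bool = False,
    script: str = "inference.py",
) -> List[str]:
    """Build the argument list for inference.py (hypothetical helper)."""
    cmd = [sys.executable, script]
    if list_conditions:
        # Mirrors: python3 inference.py --list-conditions
        cmd.append("--list-conditions")
        return cmd
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if condition is not None:
        cmd += ["--condition", condition]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if save_mask:
        cmd.append("--save-mask")
    return cmd


def run_inference(**kwargs) -> None:
    """Run inference.py and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_inference_command(**kwargs), check=True)
```

For instance, `run_inference(condition="condition_1.npz", output_dir="outputs_inference")` reproduces the last command above. Passing arguments as a list avoids shell quoting problems with prompts that contain characters such as `<` and `>`.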
Gradio Demo
Alternatively, you can run the interactive Gradio app:
python3 app.py
Citation
@inproceedings{GlyphPrinter,
title={{GlyphPrinter}: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering},
author={Shuai, Xincheng and Li, Ziye and Ding, Henghui and Tao, Dacheng},
booktitle={CVPR},
year={2026}
}