---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Calligrapher: Freestyle Text Image Customization

<div align="center">
<img src="./assets/teaser.jpg" width="850px" alt="Calligrapher Teaser">
</div>

<div align="center">
<h3>🌐 <a href="https://ezioby.github.io/Calligrapher/">Project Page</a> | 📦 <a href="https://github.com/Calligrapher2025/Calligrapher">Code</a> | 🎥 <a href="https://youtu.be/FLSPphkylQE">Video</a> | 🤗 <a href="https://huggingface.co/spaces/Calligrapher2025/Calligrapher">HF Demo</a></h3>
</div>

## 🎯 Overview

**Calligrapher** is a diffusion-based framework that integrates text customization with artistic typography for digital calligraphy and design applications. Our framework supports text customization under various settings, including self-reference, cross-reference, and non-text reference customization.

## ✨ Key Features

- **🎨 Freestyle Text Customization**: Generate text guided by diverse stylized reference images and text prompts
- **🔄 Various Reference Modes**: Support for self-reference, cross-reference, and non-text reference customization
- **🏆 High-Quality Results**: Photorealistic text image customization with consistent typography
- **🌍 Multi-Language Support**: Style-centric text customization across diverse languages (see <a href="https://github.com/Calligrapher2025/Calligrapher/issues/1">this issue</a>)

<div align="center">
<img src="./assets/multilingual_samples.png" width="900px" alt="Multilingual Samples">
</div>

## 📦 Repository Contents

This Hugging Face repository contains:

- **`calligrapher.bin`**: Pre-trained Calligrapher model weights.
- **`Calligrapher_bench_testing.zip`**: Test dataset covering both self-reference and cross-reference customization scenarios, with additional reference images for testing. A small portion of samples is omitted due to IP concerns.

## 🛠️ Quick Start

### Installation

We provide two ways to set up the environment (Python 3.10, PyTorch 2.5.0, and CUDA are required):

#### Using pip

```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Install dependencies
pip install -r requirements.txt
```

#### Using Conda

```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Create and activate the conda environment
conda env create -f env.yml
conda activate calligrapher
```

### Download Models & Testing Data

```python
from huggingface_hub import snapshot_download

# Download the Calligrapher model and test data
snapshot_download("Calligrapher2025/Calligrapher")

# Download the required base models (gated access must be granted for FLUX.1-Fill-dev)
snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token")
snapshot_download("google/siglip-so400m-patch14-384")
```
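
After downloading, the benchmark archive still needs to be unpacked. A minimal sketch using only the standard library (the archive path below is illustrative; `snapshot_download` returns the actual local snapshot directory):

```python
import zipfile
from pathlib import Path

def extract_bench(archive: str, dest: str) -> list[str]:
    """Extract the benchmark zip and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Illustrative path -- point this at wherever the snapshot was downloaded.
archive_path = Path("Calligrapher_bench_testing.zip")
if archive_path.exists():
    print(extract_bench(str(archive_path), "Calligrapher_bench_testing"))
```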

### Configuration

Before running the models, configure the paths in `path_dict.json`:

```json
{
    "data_dir": "path/to/Calligrapher_bench_testing",
    "cli_save_dir": "path/to/cli_results",
    "gradio_save_dir": "path/to/gradio_results",
    "gradio_temp_dir": "path/to/gradio_tmp",
    "base_model_path": "path/to/FLUX.1-Fill-dev",
    "image_encoder_path": "path/to/siglip-so400m-patch14-384",
    "calligrapher_path": "path/to/calligrapher.bin"
}
```

Configuration parameters:

- `data_dir`: Path to the test dataset
- `cli_save_dir`: Path for saving results from command-line experiments
- `gradio_save_dir`: Path for saving results from Gradio interface experiments
- `gradio_temp_dir`: Path for Gradio temporary files
- `base_model_path`: Path to the FLUX.1-Fill-dev base model
- `image_encoder_path`: Path to the SigLIP image encoder model
- `calligrapher_path`: Path to the Calligrapher model weights
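
A small helper (not part of the repository) can sanity-check this file before launching the demos; it simply warns about entries that do not point to an existing path yet:

```python
import json
from pathlib import Path

def load_paths(config_file: str = "path_dict.json") -> dict:
    """Load path_dict.json and warn about entries that do not exist yet."""
    with open(config_file) as f:
        paths = json.load(f)
    for key, value in paths.items():
        if not Path(value).exists():
            print(f"[warn] {key} -> {value} does not exist")
    return paths
```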

### Run Gradio Demo

```bash
# Basic Gradio demo
python gradio_demo.py

# Demo with custom mask upload (recommended for first-time users)
# This version includes pre-configured examples that illustrate how to use the model --
# please consider trying those examples first
python gradio_demo_upload_mask.py
```

Below is a preview of the Gradio demo interfaces:

<div align="center">
<img src="./assets/gradio_preview.png" width="900px" alt="Gradio Demo Preview">
</div>

We also provide a Gradio demo for multilingual freestyle text customization (e.g., Chinese), supported by [TextFLUX](https://github.com/yyyyyxie/textflux). To use it, first download the [TextFLUX weights](https://huggingface.co/yyyyyxie/textflux-lora/blob/main/pytorch_lora_weights.safetensors) and configure the `textflux_path` entry in `path_dict.json`. Then download [the font resource](https://github.com/yyyyyxie/textflux/blob/main/resource/font/Arial-Unicode-Regular.ttf) to `./resources/` and run:

```bash
python gradio_demo_multilingual.py
```

**✨ User Tips:**

1. **Speed vs. Quality Trade-off.** Use fewer steps for faster generation (e.g., 10 steps takes ~4 s/image on a single A6000 GPU), at some cost in quality.

2. **Inpainting Position Freedom.** Inpainting positions are flexible; they do not need to match the original text locations in the input image.

3. **Iterative Editing.** Drag outputs from the gallery to the Image Editing Panel (clear the panel first) for quick refinements.

4. **Mask Optimization.** Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the mask and harmonizes the generation with the background in terms of color and lighting.

5. **Reference Image Tip.** White-background references improve style consistency, since the encoder also considers the background context of the reference image.

6. **Resolution Balance.** Very high-resolution generation sometimes triggers spelling errors; 512/768 px is recommended, as the model was trained at a resolution of 512.

## 🎨 Command Line Usage Examples

### Self-reference Customization

```bash
python infer_calligrapher_self_custom.py
```

### Cross-reference Customization

```bash
python infer_calligrapher_cross_custom.py
```

**Note:** Image files whose names start with `result` are the customization outputs, while files starting with `vis_result` are concatenated results showing the source image, reference image, and model output together.
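
This naming convention makes it easy to separate the two kinds of outputs programmatically. A minimal sketch (the `.png` extension is an assumption about the saved format):

```python
from pathlib import Path

def collect_outputs(save_dir: str) -> tuple[list[str], list[str]]:
    """Split saved images into customization outputs and concatenated visualizations."""
    files = sorted(p.name for p in Path(save_dir).glob("*.png"))  # extension assumed
    results = [f for f in files if f.startswith("result")]
    vis_results = [f for f in files if f.startswith("vis_result")]
    return results, vis_results
```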

## 📐 Framework

<div align="center">
<img src="./assets/framework.jpg" width="900px" alt="Calligrapher Framework">
</div>

Our framework integrates localized style injection and diffusion-based learning, featuring:

- **Self-distillation mechanism** for automatic typography benchmark construction.
- **Localized style injection** via a trainable style encoder.
- **In-context generation** for enhanced style alignment.

## 🖼️ Results Gallery

<div align="center">
<img src="./assets/application.jpg" width="900px" alt="Calligrapher Applications">
</div>