---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Calligrapher: Freestyle Text Image Customization

<div align="center">
<img src="./assets/teaser.jpg" width="850px" alt="Calligrapher Teaser">
</div>

<div align="center">
<h3>🌐 <a href="https://ezioby.github.io/Calligrapher/">Project Page</a> | 📦 <a href="https://github.com/Calligrapher2025/Calligrapher">Code</a> | 🎥 <a href="https://youtu.be/FLSPphkylQE">Video</a> | 🤗 <a href="https://huggingface.co/spaces/Calligrapher2025/Calligrapher">HF Demo</a></h3>
</div>

## 🎯 Overview

**Calligrapher** is a diffusion-based framework that integrates text customization with artistic typography for digital calligraphy and design applications. Our framework supports text customization under various settings, including self-reference, cross-reference, and non-text reference customization.

## ✨ Key Features

- **🎨 Freestyle Text Customization**: Generate text guided by diverse stylized reference images and text prompts
- **🔄 Various Reference Modes**: Support for self-reference, cross-reference, and non-text reference customization
- **🏆 High-Quality Results**: Photorealistic text image customization with consistent typography
- **🌍 Multi-Language Support**: Style-centric text customization across diverse languages (see <a href="https://github.com/Calligrapher2025/Calligrapher/issues/1">this issue</a>)

<div align="center">
<img src="./assets/multilingual_samples.png" width="900px" alt="Multilingual Samples">
</div>

## 📦 Repository Contents

This Hugging Face repository contains:

- **`calligrapher.bin`**: Pre-trained Calligrapher model weights.
- **`Calligrapher_bench_testing.zip`**: Test dataset covering both self-reference and cross-reference customization scenarios, with additional reference images for testing. A small portion of samples is omitted due to IP concerns.

## 🛠️ Quick Start

### Installation

We provide two ways to set up the environment (Python 3.10, PyTorch 2.5.0, and CUDA are required):

#### Using pip

```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Install dependencies
pip install -r requirements.txt
```

#### Using Conda

```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Create and activate the conda environment
conda env create -f env.yml
conda activate calligrapher
```

### Download Models & Testing Data

```python
from huggingface_hub import snapshot_download

# Download the Calligrapher model and test data
snapshot_download("Calligrapher2025/Calligrapher")

# Download the required base models (gated access must be granted for FLUX.1-Fill-dev)
snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token")
snapshot_download("google/siglip-so400m-patch14-384")
```
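
After downloading, the benchmark archive still needs to be unpacked. A minimal sketch using only the standard library (the archive path below is illustrative; `snapshot_download` returns the actual local snapshot directory):

```python
import zipfile
from pathlib import Path

def extract_bench(archive: str, dest: str) -> list[str]:
    """Extract the benchmark zip and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Illustrative path -- point this at wherever the snapshot was downloaded.
archive_path = Path("Calligrapher_bench_testing.zip")
if archive_path.exists():
    print(extract_bench(str(archive_path), "Calligrapher_bench_testing"))
```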

### Configuration

Before running the models, configure the paths in `path_dict.json`:

```json
{
    "data_dir": "path/to/Calligrapher_bench_testing",
    "cli_save_dir": "path/to/cli_results",
    "gradio_save_dir": "path/to/gradio_results",
    "gradio_temp_dir": "path/to/gradio_tmp",
    "base_model_path": "path/to/FLUX.1-Fill-dev",
    "image_encoder_path": "path/to/siglip-so400m-patch14-384",
    "calligrapher_path": "path/to/calligrapher.bin"
}
```

Configuration parameters:

- `data_dir`: Path to the test dataset
- `cli_save_dir`: Path for saving results from command-line experiments
- `gradio_save_dir`: Path for saving results from Gradio interface experiments
- `gradio_temp_dir`: Path for Gradio temporary files
- `base_model_path`: Path to the FLUX.1-Fill-dev base model
- `image_encoder_path`: Path to the SigLIP image encoder model
- `calligrapher_path`: Path to the Calligrapher model weights
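
A small helper (not part of the repository) can sanity-check this file before launching the demos; it simply warns about entries that do not point to an existing path yet:

```python
import json
from pathlib import Path

def load_paths(config_file: str = "path_dict.json") -> dict:
    """Load path_dict.json and warn about entries that do not exist yet."""
    with open(config_file) as f:
        paths = json.load(f)
    for key, value in paths.items():
        if not Path(value).exists():
            print(f"[warn] {key} -> {value} does not exist")
    return paths
```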

### Run Gradio Demo

```bash
# Basic Gradio demo
python gradio_demo.py

# Demo with custom mask upload (recommended for first-time users)
# This version includes pre-configured examples that illustrate how to use the model --
# please consider trying those examples first
python gradio_demo_upload_mask.py
```

Below is a preview of the Gradio demo interfaces:

<div align="center">
<img src="./assets/gradio_preview.png" width="900px" alt="Gradio Demo Preview">
</div>

We also provide a Gradio demo for multilingual freestyle text customization (e.g., Chinese), supported by [TextFLUX](https://github.com/yyyyyxie/textflux). To use it, first download the [TextFLUX weights](https://huggingface.co/yyyyyxie/textflux-lora/blob/main/pytorch_lora_weights.safetensors) and configure the `textflux_path` entry in `path_dict.json`. Then download [the font resource](https://github.com/yyyyyxie/textflux/blob/main/resource/font/Arial-Unicode-Regular.ttf) to `./resources/` and run:

```bash
python gradio_demo_multilingual.py
```

**✨ User Tips:**

1. **Speed vs. Quality Trade-off.** Use fewer steps for faster generation (e.g., 10 steps takes ~4 s/image on a single A6000 GPU), at some cost in quality.

2. **Inpainting Position Freedom.** Inpainting positions are flexible; they do not need to match the original text locations in the input image.

3. **Iterative Editing.** Drag outputs from the gallery to the Image Editing Panel (clear the panel first) for quick refinements.

4. **Mask Optimization.** Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the mask and harmonizes the generation with the background in terms of color and lighting.

5. **Reference Image Tip.** White-background references improve style consistency, since the encoder also considers the background context of the reference image.

6. **Resolution Balance.** Very high-resolution generation sometimes triggers spelling errors; 512/768 px is recommended, as the model was trained at a resolution of 512.

## 🎨 Command Line Usage Examples

### Self-reference Customization

```bash
python infer_calligrapher_self_custom.py
```

### Cross-reference Customization

```bash
python infer_calligrapher_cross_custom.py
```

**Note:** Image files whose names start with `result` are the customization outputs, while files starting with `vis_result` are concatenated results showing the source image, reference image, and model output together.
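
This naming convention makes it easy to separate the two kinds of outputs programmatically. A minimal sketch (the `.png` extension is an assumption about the saved format):

```python
from pathlib import Path

def collect_outputs(save_dir: str) -> tuple[list[str], list[str]]:
    """Split saved images into customization outputs and concatenated visualizations."""
    files = sorted(p.name for p in Path(save_dir).glob("*.png"))  # extension assumed
    results = [f for f in files if f.startswith("result")]
    vis_results = [f for f in files if f.startswith("vis_result")]
    return results, vis_results
```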

## 📐 Framework

<div align="center">
<img src="./assets/framework.jpg" width="900px" alt="Calligrapher Framework">
</div>

Our framework integrates localized style injection and diffusion-based learning, featuring:

- **Self-distillation mechanism** for automatic typography benchmark construction.
- **Localized style injection** via a trainable style encoder.
- **In-context generation** for enhanced style alignment.

## 🖼️ Results Gallery

<div align="center">
<img src="./assets/application.jpg" width="900px" alt="Calligrapher Applications">
</div>