---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Calligrapher: Freestyle Text Image Customization

<div align="center">
    <img src="./assets/teaser.jpg" width="850px" alt="Calligrapher Teaser">
</div>

<div align="center">
    <h3>🌐 <a href="https://calligrapher2025.github.io/Calligrapher/">Project Page</a> | 📦 <a href="https://github.com/Calligrapher2025/Calligrapher">Code</a> | 🎥 <a href="https://youtu.be/FLSPphkylQE">Video</a></h3>
</div>

## 🎯 Overview

**Calligrapher** is a diffusion-based framework that integrates advanced text customization with artistic typography for digital calligraphy and design applications. The framework supports text customization in a range of settings, including self-reference, cross-reference, and non-text reference customization.

## ✨ Key Features

- **🎨 Freestyle Text Customization**: Generate text in diverse styles from reference images and text prompts
- **🔄 Various Reference Modes**: Support for self-reference, cross-reference, and non-text reference customization
- **🏆 High-Quality Results**: Photorealistic text image customization with consistent typography

## 📦 Repository Contents

This Hugging Face repository contains:

- **`calligrapher.bin`**: Pre-trained Calligrapher model weights.
- **`Calligrapher_bench_testing.zip`**: A test dataset covering both self-reference and cross-reference customization scenarios, with additional reference images for testing. A small portion of samples is omitted due to IP concerns.

## 🛠️ Quick Start

### Installation

We provide two ways to set up the environment (Python 3.10, PyTorch 2.5.0, and CUDA are required):

#### Using pip
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Install dependencies
pip install -r requirements.txt
```

#### Using Conda
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Create and activate the conda environment
conda env create -f env.yml
conda activate calligrapher
```
|
| | ### Download Models & Testing Data |
| |
|
| | ```python |
| | from huggingface_hub import snapshot_download |
| | |
| | # Download Calligrapher model and test data |
| | snapshot_download("Calligrapher2025/Calligrapher") |
| | # Download required base models (granted access needed for FLUX.1-Fill) |
| | snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token") |
| | snapshot_download("google/siglip-so400m-patch14-384") |
| | ``` |
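
By default, `snapshot_download` places files in the Hugging Face cache, while `path_dict.json` expects explicit paths. A minimal sketch of pinning each repo to a fixed local folder instead (the `weights/...` directory names are illustrative, not part of the repository):

```python
from pathlib import Path

# Hypothetical local layout; the subdirectories are illustrative and should
# match whatever paths you later write into path_dict.json.
MODELS = {
    "Calligrapher2025/Calligrapher": "weights/calligrapher",
    "black-forest-labs/FLUX.1-Fill-dev": "weights/FLUX.1-Fill-dev",
    "google/siglip-so400m-patch14-384": "weights/siglip-so400m-patch14-384",
}


def download_all(root=".", token=None):
    """Download each repo into a fixed local directory instead of the HF cache."""
    # Deferred import so the MODELS mapping stays importable offline.
    from huggingface_hub import snapshot_download

    for repo_id, subdir in MODELS.items():
        snapshot_download(repo_id, local_dir=str(Path(root) / subdir), token=token)
```

Passing `local_dir` keeps the three models in predictable locations, which simplifies the configuration step below.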

### Configuration

Before running the models, you need to configure the paths in `path_dict.json`:

```json
{
    "data_dir": "path/to/Calligrapher_bench_testing",
    "cli_save_dir": "path/to/cli_results",
    "gradio_save_dir": "path/to/gradio_results",
    "gradio_temp_dir": "path/to/gradio_tmp",
    "base_model_path": "path/to/FLUX.1-Fill-dev",
    "image_encoder_path": "path/to/siglip-so400m-patch14-384",
    "calligrapher_path": "path/to/calligrapher.bin"
}
```

Configuration parameters:
- `data_dir`: Path to the test dataset
- `cli_save_dir`: Path for saving command-line interface results
- `gradio_save_dir`: Path for saving Gradio interface results
- `gradio_temp_dir`: Path for Gradio temporary files
- `base_model_path`: Path to the FLUX.1-Fill-dev base model
- `image_encoder_path`: Path to the SigLIP image encoder model
- `calligrapher_path`: Path to the Calligrapher model weights
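
A quick way to sanity-check the configuration before launching the demos. This helper is not part of the repository; it is a minimal sketch that reports missing keys and paths that do not exist yet:

```python
import json
from pathlib import Path

# The seven keys documented above for path_dict.json.
REQUIRED_KEYS = {
    "data_dir", "cli_save_dir", "gradio_save_dir", "gradio_temp_dir",
    "base_model_path", "image_encoder_path", "calligrapher_path",
}


def check_path_dict(config_file="path_dict.json"):
    """Return (missing_keys, nonexistent_paths) for a path_dict.json file."""
    paths = json.loads(Path(config_file).read_text())
    missing = sorted(REQUIRED_KEYS - paths.keys())
    absent = sorted(p for p in paths.values() if not Path(p).exists())
    return missing, absent
```

Running it right after editing the file catches typos in paths before a long model-loading step fails.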

### Run Gradio Demo

```bash
# Basic Gradio demo
python gradio_demo.py

# Demo with custom mask upload: includes pre-configured examples and is
# recommended for first-time users to learn how the model works.
python gradio_demo_upload_mask.py
```

Below is a preview of the Gradio demo interfaces:

<div align="center">
    <img src="./assets/gradio_preview.png" width="900px" alt="Gradio Demo Preview">
</div>

**✨ User Tips:**

1. **Speed vs. Quality Trade-off.** Fewer steps mean faster generation (e.g., 10 steps take ~4 s/image on a single A6000 GPU), but quality may be lower.

2. **Inpaint Position Freedom.** Inpainting positions are flexible; they need not match the original text locations in the input image.

3. **Iterative Editing.** Drag outputs from the gallery into the Image Editing Panel (clear the panel first) for quick refinements.

4. **Mask Optimization.** Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the mask and harmonizes the generation with the background in terms of color and lighting.

5. **Reference Image Tip.** White-background references improve style consistency, since the encoder also considers the background context of the reference image.

6. **Resolution Balance.** Very high-resolution generation sometimes triggers spelling errors; 512 or 768 px is recommended, as the model was trained at a resolution of 512.

## 🎨 Command Line Usage Examples

### Self-reference Customization
```bash
python infer_calligrapher_self_custom.py
```

### Cross-reference Customization
```bash
python infer_calligrapher_cross_custom.py
```

**Note:** Image files whose names start with `result` are the customization outputs, while files starting with `vis_result` are concatenated visualizations showing the source image, reference image, and model output together.
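
If you want to post-process a results directory programmatically, the two groups can be separated by their name prefixes. A sketch (the glob patterns assume only the naming convention described above):

```python
from pathlib import Path


def collect_outputs(save_dir):
    """Split a results directory into customization outputs ('result*')
    and concatenated visualizations ('vis_result*')."""
    save_dir = Path(save_dir)
    # 'result*' does not match 'vis_result*' because glob patterns are
    # anchored at the start of the file name.
    results = sorted(save_dir.glob("result*"))
    vis = sorted(save_dir.glob("vis_result*"))
    return results, vis
```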

## 🏗️ Framework

<div align="center">
    <img src="./assets/framework.jpg" width="900px" alt="Calligrapher Framework">
</div>

Our framework integrates localized style injection and diffusion-based learning, featuring:
- **Self-distillation mechanism** for automatic typography benchmark construction.
- **Localized style injection** via a trainable style encoder.
- **In-context generation** for enhanced style alignment.

## 🖼️ Results Gallery

<div align="center">
    <img src="./assets/application.jpg" width="900px" alt="Calligrapher Applications">
</div>