---
license: apache-2.0
datasets:
  - dzungpham/font-diffusion-generated-data
language:
  - en
library_name: diffusers
tags:
  - font-diffusion
  - image-to-image
  - contrastive-learning
  - diffusers
  - font-generation
  - character-synthesis
  - style-transfer
  - dpm-solver
---

Model Card for FontDiffuser

Model Details

Model Type

  • Architecture: Diffusion-based Font Generation Model
  • Framework: PyTorch + Hugging Face Diffusers
  • Scheduler: DPM-Solver++ (configurable: dpmsolver++ / dpmsolver)
  • Guidance: Classifier-free guidance
  • Base Model: FontDiffuser with Content and Style Encoders
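Classifier-free guidance blends a conditional and an unconditional prediction at each denoising step. A toy scalar sketch (hypothetical helper, not repo code) shows the role of the default guidance scale of 7.5:

```python
# Toy sketch of classifier-free guidance (hypothetical helper, not repo code):
# the denoiser is run with and without the content/style conditioning, and the
# unconditional prediction is pushed toward the conditional one.
def apply_cfg(uncond_pred: float, cond_pred: float, guidance_scale: float = 7.5) -> float:
    """Blend predictions: a scale > 1 amplifies the conditioning signal."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

print(apply_cfg(0.0, 1.0))  # with the default scale of 7.5 -> 7.5
```

A scale of 1.0 reduces to the purely conditional prediction; larger values trade diversity for stronger adherence to the content character and style image.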

Model Components

  1. UNet: Main diffusion model for image generation
  2. Content Encoder: Extracts character structure information
  3. Style Encoder: Extracts font style features
  4. DDPM/DPM Scheduler: Noise scheduling for diffusion process

Training Configuration

  • Resolution: 96×96 pixels
  • Batch Size: configurable
  • Inference Steps: 20 (default, configurable)
  • Guidance Scale: 7.5 (default, configurable)
  • Precision: FP32/FP16 (optional)
  • Device: CUDA/GPU recommended

Installation

Installation uses the uv package manager, which is fast thanks to its Rust implementation:

uv pip install diffusers torch torchvision safetensors
uv pip install lpips scikit-image pytorch-fid  # Optional: for evaluation

Model Usage

  • Load pipeline:
from argparse import Namespace
from inference.sample_optimized import load_fontdiffuser_pipeline

args = Namespace(
    ckpt_dir="ckpt",
    device="cuda:0",
    guidance_scale=7.5,
    num_inference_steps=20,
    fp16=False,
    enable_xformers=False,
)
pipe = load_fontdiffuser_pipeline(args=args)
  • Single-image inference (recommended)
accelerate launch run_inference.py \
  --ckpt_dir ckpt \
  --content_character "A" \
  --style_image_path style_images/foo.png \
  --save_image \
  --save_image_dir results/
  • Large-scale batch with checkpoint/resume
accelerate launch run_inference.py \
  --ckpt_dir ckpt \
  --characters chars.txt \
  --style_images "style_images/*.png" \
  --ttf_path fonts/myfont.ttf \
  --output_dir my_dataset/train_original \
  --batch_size 8 \
  --num_inference_steps 15 \
  --guidance_scale 7.5 \
  --save_interval 10
  • Multi-GPU inference via Accelerate
accelerate launch run_inference.py \
  --ckpt_dir ckpt \
  --characters chars.txt \
  --style_images "style_images/*.png" \
  --output_dir results/

Outputs & Metadata

The repo uses hash-based filenames (see tools/filename_utils.py) and a central metadata file:

  • ContentImage/char.png – character content images
  • TargetImage/style+char.png – generated images per style
  • results_checkpoint.json – canonical metadata used by dataset tools and HF exporters
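The hashing scheme can be pictured with a minimal sketch; the helper name and hash choice below are assumptions, see tools/filename_utils.py for the actual implementation:

```python
import hashlib

# Hypothetical sketch of hash-based naming: arbitrary text (including
# non-ASCII characters) is mapped to a short, stable, filesystem-safe stem.
def safe_filename(text: str, length: int = 12) -> str:
    """Derive a deterministic, filesystem-safe stem from arbitrary text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]

print(safe_filename("style0+字"))  # same input always yields the same stem
```

Deterministic names let interrupted runs detect already-generated files without consulting any extra state.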

Example metadata generation:

python tools/generate_metadata.py --data_root my_dataset/handwritten_original --output my_dataset/handwritten_original/results_checkpoint.json

Model Performance

Supported Tasks

  • Single-character font generation
  • Multi-character batch generation
  • Multi-font support
  • Multi-style transfer
  • Index-based tracking for large-scale generation
  • Checkpoint and resume support

Output Format

output_dir/
├── ContentImage/              # Single set of content (character) images
│   ├── char0.png
│   ├── char1.png
│   └── ...
├── TargetImage/               # Generated font images organized by style
│   ├── style0/
│   │   ├── style0+char0.png
│   │   ├── style0+char1.png
│   │   └── ...
│   ├── style1/
│   │   └── ...
│   └── ...
└── results_checkpoint.json    # Checkpoint file that also serves as generation metadata
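The layout above can be indexed back into (style, character) pairs by splitting filenames on the "+" separator; a minimal sketch (hypothetical helper, not repo code):

```python
from pathlib import Path

# Walk TargetImage/<style>/<style>+<char>.png and recover (style, char) pairs.
def index_outputs(output_dir: str) -> list[tuple[str, str]]:
    pairs = []
    for png in sorted(Path(output_dir, "TargetImage").glob("*/*.png")):
        style, _, char = png.stem.partition("+")  # split "style0+char0"
        pairs.append((style, char))
    return pairs
```

This is useful for sanity-checking a finished run against the pair list in results_checkpoint.json.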

Results Metadata Structure

{
  "generations": [
    {
      "character": "A",
      "char_index": 0,
      "style": "style0",
      "style_index": 0,
      "font": "Arial",
      "style_path": "path/to/style0.png",
      "output_path": "TargetImage/style0/style0+char0.png"
    }
  ],
  "metrics": {
    "lpips": {"mean": 0.25, "std": 0.08, "min": 0.1, "max": 0.5},
    "ssim": {"mean": 0.82, "std": 0.05, "min": 0.7, "max": 0.95},
    "fid": {"mean": 15.3, "std": 2.1},
    "inference_times": [
      {
        "style": "style0",
        "style_index": 0,
        "font": "Arial",
        "total_time": 2.45,
        "num_images": 100,
        "time_per_image": 0.0245
      }
    ]
  },
  "fonts": ["Arial", "Times New Roman"],
  "characters": ["A", "B", "C"],
  "styles": ["style0", "style1"],
  "total_chars": 3,
  "total_styles": 2,
  "total_possible_pairs": 6
}
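Because results_checkpoint.json is plain JSON, it can be inspected directly, for example to check how much of the character × style grid has been generated; a sketch assuming the field names shown above:

```python
import json

# Fraction of (style, character) pairs already generated, based on the
# "generations" and "total_possible_pairs" fields shown above.
def coverage(checkpoint_path: str) -> float:
    with open(checkpoint_path, encoding="utf-8") as f:
        meta = json.load(f)
    done = {(g["style"], g["character"]) for g in meta["generations"]}
    return len(done) / meta["total_possible_pairs"]
```

The same set of completed pairs is the natural hook for resume logic: skip any (style, character) combination already listed under "generations".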

Dataset

Dataset Source

The generated data is available on the Hugging Face Hub as dzungpham/font-diffusion-generated-data.
Dataset Structure

FontDiffusion Dataset/
├── total/
│   ├── ContentImage/          # Character structure images
│   ├── TargetImage/           # Style-specific font renderings
│   └── results_checkpoint.json
├── val/
└── test/

Technical Features

Optimizations

  • Batch Processing: Process multiple characters per style
  • Memory Efficiency: Attention slicing (optional)
  • FP16 Support: Reduced precision for faster inference
  • Torch Compile: Optional model compilation
  • Channels Last Format: Memory-optimized tensor layout
  • XFormers Support: Fast attention implementation
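The precision and memory-layout knobs can be sketched on a stand-in module; a real run applies them to the loaded pipeline's UNet, and torch.compile / xformers are enabled analogously:

```python
import torch
from torch import nn

# Stand-in for the UNet (illustrative only; the real pipeline exposes its
# UNet after loading the checkpoint).
unet = nn.Conv2d(3, 3, kernel_size=3, padding=1)

unet.to(memory_format=torch.channels_last)  # channels-last tensor layout
unet.half()                                 # FP16 weights for faster inference
print(unet.weight.dtype)  # torch.float16
```

FP16 roughly halves activation memory; channels-last mainly benefits convolution-heavy models on recent CUDA hardware.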

Robustness

  • Checkpoint & Resume: Resume from interruptions
  • Index-based Tracking: Handle large character sets (100K+)
  • Multi-font Support: Process characters across multiple fonts
  • Error Recovery: Graceful handling of missing fonts
  • Automatic Indexing: Consistent char_index and style_index

Monitoring

  • Weights & Biases Integration: Real-time tracking
  • Progress Bars: Detailed generation progress
  • Checkpoint Saving: Periodic intermediate saves
  • Quality Metrics: LPIPS, SSIM, FID computation
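The mean/std/min/max summaries stored under "metrics" in results_checkpoint.json can be reproduced with the standard library; a sketch with a hypothetical helper (whether the repo uses population or sample standard deviation is an assumption here):

```python
import statistics

# Collapse per-pair metric scores (e.g. LPIPS values) into the summary
# statistics shape used in results_checkpoint.json.
def summarize(scores: list[float]) -> dict[str, float]:
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population std; an assumption
        "min": min(scores),
        "max": max(scores),
    }
```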

Citation

@article{fontdiffuser2023,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  year={2023}
}

License

This model is licensed under the Apache License 2.0. See LICENSE file for details.