How to use dzungpham/font-architect with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("dzungpham/font-architect", dtype=torch.bfloat16, device_map="cuda")
prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
image = pipe(image=input_image, prompt=prompt).images[0]
Installation uses the uv package manager, which is fast because it is implemented in Rust:
uv pip install diffusers torch torchvision safetensors
uv pip install lpips scikit-image pytorch-fid # Optional: for evaluation
from argparse import Namespace
from inference.sample_optimized import load_fontdiffuser_pipeline
args = Namespace(
    ckpt_dir="ckpt",
    device="cuda:0",
    guidance_scale=7.5,
    num_inference_steps=20,
    fp16=False,
    enable_xformers=False,
)
pipe = load_fontdiffuser_pipeline(args=args)
accelerate launch run_inference.py \
    --ckpt_dir ckpt \
    --content_character "A" \
    --style_image_path style_images/foo.png \
    --save_image \
    --save_image_dir results/

accelerate launch run_inference.py \
    --ckpt_dir ckpt \
    --characters chars.txt \
    --style_images "style_images/*.png" \
    --ttf_path fonts/myfont.ttf \
    --output_dir my_dataset/train_original \
    --batch_size 8 \
    --num_inference_steps 15 \
    --guidance_scale 7.5 \
    --save_interval 10

accelerate launch run_inference.py \
    --ckpt_dir ckpt \
    --characters chars.txt \
    --style_images "style_images/*.png" \
    --output_dir results/
The repo uses hash-based filenames (tools/filename_utils.py) and a central metadata file, results_checkpoint.json.
Example metadata generation:
python tools/generate_metadata.py --data_root my_dataset/handwritten_original --output my_dataset/handwritten_original/results_checkpoint.json
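The actual naming scheme lives in tools/filename_utils.py and is not reproduced here. As a rough illustration of what a hash-based filename can look like, the sketch below derives a short, deterministic name from a character and a style name. The function name `hashed_filename`, the choice of SHA-256, and the 12-character truncation are all assumptions made for this example, not the repo's implementation.

```python
import hashlib


def hashed_filename(character: str, style: str, length: int = 12) -> str:
    """Return a deterministic, ASCII-only PNG name for a (character, style) pair.

    Hypothetical helper: the real scheme in tools/filename_utils.py may differ.
    """
    # Hash the UTF-8 bytes so that any Unicode character (for example a CJK
    # glyph) maps to a plain hexadecimal name that is safe on any filesystem.
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = f"{style}\x00{character}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return f"{digest[:length]}.png"


if __name__ == "__main__":
    # The same inputs always produce the same name.
    print(hashed_filename("A", "style0"))
```

A scheme like this avoids problems with characters that are awkward in file paths, at the cost of needing the metadata file to map names back to characters and styles.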
output_dir/
├── ContentImage/            # Single set of content (character) images
│   ├── char0.png
│   ├── char1.png
│   └── ...
├── TargetImage/             # Generated font images organized by style
│   ├── style0/
│   │   ├── style0+char0.png
│   │   ├── style0+char1.png
│   │   └── ...
│   ├── style1/
│   │   └── ...
│   └── ...
└── results_checkpoint.json  # Checkpoint file that also serves as generation metadata
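To make the layout concrete, here is a minimal sketch that builds the paths for one content image and one generated image. It assumes the index-based names shown in the tree (`char0.png`, `style0+char0.png`); the helper names `content_image_path` and `target_image_path` are hypothetical and do not come from the repo.

```python
from pathlib import Path


def content_image_path(output_dir: str, char_index: int) -> Path:
    """Path of the content (character) image for one character index."""
    return Path(output_dir) / "ContentImage" / f"char{char_index}.png"


def target_image_path(output_dir: str, style: str, char_index: int) -> Path:
    """Path of the generated image for one (style, character index) pair."""
    # Each style gets its own subdirectory, and the style name is repeated
    # in the filename as "<style>+char<index>.png".
    return Path(output_dir) / "TargetImage" / style / f"{style}+char{char_index}.png"


if __name__ == "__main__":
    print(content_image_path("output_dir", 0).as_posix())
    print(target_image_path("output_dir", "style0", 1).as_posix())
```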
{
"generations": [
{
"character": "A",
"char_index": 0,
"style": "style0",
"style_index": 0,
"font": "Arial",
"style_path": "path/to/style0.png",
"output_path": "TargetImage/style0/style0+char0.png"
}
],
"metrics": {
"lpips": {"mean": 0.25, "std": 0.08, "min": 0.1, "max": 0.5},
"ssim": {"mean": 0.82, "std": 0.05, "min": 0.7, "max": 0.95},
"fid": {"mean": 15.3, "std": 2.1},
"inference_times": [
{
"style": "style0",
"style_index": 0,
"font": "Arial",
"total_time": 2.45,
"num_images": 100,
"time_per_image": 0.0245
}
]
},
"fonts": ["Arial", "Times New Roman"],
"characters": ["A", "B", "C"],
"styles": ["style0", "style1"],
"total_chars": 3,
"total_styles": 2,
"total_possible_pairs": 6
}
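Because the metadata lists every character, every style, and every completed generation, it can be used to check which pairs are still missing. The sketch below works on a trimmed copy of the example above. The helper names `check_counts` and `missing_pairs` are hypothetical, and whether the repo's own tools resume from the checkpoint in this way is an assumption.

```python
import json
from itertools import product

# Trimmed copy of the example metadata, kept inline so the sketch is runnable.
EXAMPLE = json.loads("""
{
  "generations": [
    {"character": "A", "style": "style0",
     "output_path": "TargetImage/style0/style0+char0.png"}
  ],
  "characters": ["A", "B", "C"],
  "styles": ["style0", "style1"],
  "total_chars": 3,
  "total_styles": 2,
  "total_possible_pairs": 6
}
""")


def check_counts(checkpoint: dict) -> bool:
    """Verify that the summary counters agree with the lists they summarize."""
    return (
        checkpoint["total_chars"] == len(checkpoint["characters"])
        and checkpoint["total_styles"] == len(checkpoint["styles"])
        and checkpoint["total_possible_pairs"]
        == checkpoint["total_chars"] * checkpoint["total_styles"]
    )


def missing_pairs(checkpoint: dict) -> list[tuple[str, str]]:
    """Return (character, style) pairs with no entry in "generations"."""
    done = {(g["character"], g["style"]) for g in checkpoint["generations"]}
    expected = product(checkpoint["characters"], checkpoint["styles"])
    return [pair for pair in expected if pair not in done]


if __name__ == "__main__":
    print(check_counts(EXAMPLE))        # True: 3 characters x 2 styles = 6 pairs
    print(len(missing_pairs(EXAMPLE)))  # 5: only ("A", "style0") is recorded
```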
FontDiffusion Dataset/
├── total/
│   ├── ContentImage/            # Character structure images
│   ├── TargetImage/             # Style-specific font renderings
│   └── results_checkpoint.json
├── val/
└── test/
@article{fontdiffuser2023,
title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
author={Zhenhua Yang and Dezhi Peng and Yuxin Kong and Yuyi Zhang and Cong Yao and Lianwen Jin},
year={2023}
}
This model is licensed under the Apache License 2.0. See LICENSE file for details.