	---
	license: apache-2.0
	datasets:
	- dzungpham/font-diffusion-generated-data
	language:
	- en
	library_name: diffusers
	tags:
	- font-diffusion
	- image-to-image
	- contrastive-learning
	- diffusers
	- font-generation
	- character-synthesis
	- style-transfer
	- dpm-solver
	---
	# Model Card for FontDiffuser

	## Model Details

	### Model Type
	- Architecture: Diffusion-based Font Generation Model
	- Framework: PyTorch + Hugging Face Diffusers
	- Scheduler: DPM-Solver++ (configurable: dpmsolver++ / dpmsolver)
	- Guidance: Classifier-free guidance
	- Base Model: FontDiffuser with Content and Style Encoders

	### Model Components
	1. UNet: Main diffusion model for image generation
	2. Content Encoder: Extracts character structure information
	3. Style Encoder: Extracts font style features
	4. DDPM/DPM Scheduler: Noise scheduling for diffusion process

	### Training and Inference Configuration
	- Resolution: 96×96 pixels
	- Batch Size: 4-8 (configurable)
	- Inference Steps: 15 (default, configurable)
	- Guidance Scale: 7.5 (default, configurable)
	- Precision: FP32/FP16 (optional)
	- Device: CUDA/GPU recommended
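
The guidance scale listed above controls classifier-free guidance. The following is a minimal sketch of the standard combination rule, using plain Python lists in place of tensors. The function name `apply_cfg` is hypothetical, and the FontDiffuser pipeline may implement guidance differently.

```python
from typing import List, Sequence


def apply_cfg(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float = 7.5,
) -> List[float]:
    """Combine unconditional and conditional noise predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    A scale of 1.0 returns the conditional prediction unchanged.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

Larger scales push the result further from the unconditional prediction, which usually strengthens adherence to the conditioning at the cost of diversity.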

	## Model Usage

	### Installation
	```bash
	pip install diffusers torch torchvision safetensors
	pip install lpips scikit-image pytorch-fid # Optional: for evaluation
	```

	### Basic Generation
	```python
	from argparse import Namespace

	from sample_batch import (
	    FontManager,
	    batch_generate_images,
	    load_fontdiffuser_pipeline,
	)

	# Initialize the font manager with the TTF used to render content images
	font_manager = FontManager("path/to/font.ttf")

	# Load the pipeline
	args = Namespace(
	    ckpt_dir="path/to/checkpoints",
	    device="cuda",
	    num_inference_steps=15,
	    guidance_scale=7.5,
	    batch_size=4,
	    # ... other args
	)
	pipe = load_fontdiffuser_pipeline(args)

	# Characters to generate and style reference images
	characters = ["A", "B", "C", "中", "国"]
	style_paths = ["style1.png", "style2.png"]

	# Assumption: no evaluator is used in this example.
	# Replace None with the project's evaluator object to compute metrics.
	evaluator = None

	results = batch_generate_images(
	    pipe, characters, style_paths,
	    output_dir="output",
	    args=args,
	    evaluator=evaluator,
	    font_manager=font_manager,
	)
	```

	### Batch Generation with Checkpointing
	```bash
	python sample_batch.py \
	    --characters "characters.txt" \
	    --start_line 1 \
	    --end_line 100 \
	    --style_images "styles/" \
	    --ttf_path "fonts/myfont.ttf" \
	    --ckpt_dir "checkpoints/" \
	    --output_dir "my_dataset/train_original" \
	    --batch_size 4 \
	    --num_inference_steps 15 \
	    --guidance_scale 7.5 \
	    --save_interval 10 \
	    --device cuda
	```
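
The `--start_line` and `--end_line` flags select a slice of the characters file. As a rough illustration, the sketch below assumes the range is 1-indexed and inclusive, with one character per line and blank lines skipped. These semantics and the helper name `select_characters` are assumptions, not confirmed behavior of `sample_batch.py`.

```python
from typing import List, Optional


def select_characters(
    lines: List[str],
    start_line: int = 1,
    end_line: Optional[int] = None,
) -> List[str]:
    """Return non-empty entries from lines[start_line..end_line].

    Assumed semantics: 1-indexed, inclusive on both ends.
    An end_line of None (or past the end) means "through the last line".
    """
    if start_line < 1:
        raise ValueError("start_line is 1-indexed and must be >= 1")
    end = len(lines) if end_line is None else min(end_line, len(lines))
    selected = [line.strip() for line in lines[start_line - 1:end]]
    return [ch for ch in selected if ch]
```

Splitting a large character list into line ranges like this makes it possible to run several generation jobs over disjoint slices.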

	### Resume from Checkpoint
	```bash
	python sample_batch.py \
	    --characters "characters.txt" \
	    --style_images "styles/" \
	    --ttf_path "fonts/myfont.ttf" \
	    --ckpt_dir "checkpoints/" \
	    --output_dir "my_dataset/train_original" \
	    --resume_from "my_dataset/train_original/results_checkpoint.json"
	```
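
To illustrate how resuming can skip finished work, here is a minimal sketch that reads a checkpoint and lists the (character, style) pairs still to be generated. It assumes the checkpoint file uses the same `generations` schema as `results.json`, with `char_index` and `style_index` on each entry. The helper names are hypothetical, and the real resume logic in `sample_batch.py` may differ.

```python
import json
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (char_index, style_index)


def completed_pairs(checkpoint_text: str) -> Set[Pair]:
    """Collect the index pairs already recorded in a checkpoint JSON string."""
    data = json.loads(checkpoint_text)
    return {
        (g["char_index"], g["style_index"])
        for g in data.get("generations", [])
    }


def remaining_pairs(num_chars: int, num_styles: int, done: Set[Pair]) -> List[Pair]:
    """List pairs not yet generated, grouped by style and then by character."""
    return [
        (c, s)
        for s in range(num_styles)
        for c in range(num_chars)
        if (c, s) not in done
    ]
```

Keying on the indices rather than on file names keeps the comparison cheap even for large character sets.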

	## Model Performance

	### Supported Tasks
	- ✅ Single-character font generation
	- ✅ Multi-character batch generation
	- ✅ Multi-font support
	- ✅ Multi-style transfer
	- ✅ Index-based tracking for large-scale generation
	- ✅ Checkpoint and resume support

	### Output Format
	```
	output_dir/
	├── ContentImage/                # Single set of content (character) images
	│   ├── char0.png
	│   ├── char1.png
	│   └── ...
	├── TargetImage/                 # Generated font images organized by style
	│   ├── style0/
	│   │   ├── style0+char0.png
	│   │   ├── style0+char1.png
	│   │   └── ...
	│   ├── style1/
	│   │   └── ...
	│   └── ...
	├── results.json                 # Comprehensive generation metadata
	├── results_checkpoint.json      # Intermediate checkpoint (if save_interval > 0)
	└── results_interrupted.json     # Emergency checkpoint (if interrupted)
	```
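
The naming scheme in the tree above can be reproduced from the indices alone. A small sketch, with hypothetical helper names, that builds the relative paths shown in the layout:

```python
from pathlib import PurePosixPath


def content_image_path(char_index: int) -> PurePosixPath:
    """Relative path of a content image, e.g. ContentImage/char0.png."""
    return PurePosixPath("ContentImage") / f"char{char_index}.png"


def target_image_path(style_index: int, char_index: int) -> PurePosixPath:
    """Relative path of a generated image, e.g. TargetImage/style0/style0+char1.png."""
    style = f"style{style_index}"
    return PurePosixPath("TargetImage") / style / f"{style}+char{char_index}.png"
```

For example, `target_image_path(0, 1)` yields `TargetImage/style0/style0+char1.png`, which matches the `output_path` format used in the results metadata.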

	### Results Metadata Structure
	```json
	{
	  "generations": [
	    {
	      "character": "A",
	      "char_index": 0,
	      "style": "style0",
	      "style_index": 0,
	      "font": "Arial",
	      "style_path": "path/to/style0.png",
	      "output_path": "TargetImage/style0/style0+char0.png"
	    }
	  ],
	  "metrics": {
	    "lpips": {"mean": 0.25, "std": 0.08, "min": 0.1, "max": 0.5},
	    "ssim": {"mean": 0.82, "std": 0.05, "min": 0.7, "max": 0.95},
	    "fid": {"mean": 15.3, "std": 2.1},
	    "inference_times": [
	      {
	        "style": "style0",
	        "style_index": 0,
	        "font": "Arial",
	        "total_time": 2.45,
	        "num_images": 100,
	        "time_per_image": 0.0245
	      }
	    ]
	  },
	  "fonts": ["Arial", "Times New Roman"],
	  "characters": ["A", "B", "C"],
	  "styles": ["style0", "style1"],
	  "total_chars": 3,
	  "total_styles": 2,
	  "total_possible_pairs": 6
	}
	```
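
The summary fields in this structure can be cross-checked against each other. The sketch below assumes, as the example values suggest, that `total_possible_pairs` equals `total_chars × total_styles` and that `time_per_image` equals `total_time / num_images`. The function name `check_results` is hypothetical.

```python
import math
from typing import Any, Dict, List


def check_results(results: Dict[str, Any]) -> List[str]:
    """Return a list of consistency problems found in a results dict."""
    problems: List[str] = []
    n_chars = len(results["characters"])
    n_styles = len(results["styles"])
    if results["total_chars"] != n_chars:
        problems.append("total_chars does not match len(characters)")
    if results["total_styles"] != n_styles:
        problems.append("total_styles does not match len(styles)")
    if results["total_possible_pairs"] != n_chars * n_styles:
        problems.append("total_possible_pairs is not chars x styles")
    for entry in results.get("metrics", {}).get("inference_times", []):
        if entry["num_images"] == 0:
            continue  # nothing generated for this style; no ratio to check
        expected = entry["total_time"] / entry["num_images"]
        if not math.isclose(expected, entry["time_per_image"], rel_tol=1e-6):
            problems.append(f"time_per_image mismatch for {entry['style']}")
    return problems
```

An empty list means the summary fields agree with the detailed entries. The dict can be obtained with `json.load` on `results.json`.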

	## Evaluation Metrics

	### Supported Metrics
	- LPIPS: Learned perceptual image patch similarity (lower is better)
	- SSIM: Structural similarity index (higher is better)
	- FID: Fréchet Inception Distance (lower is better)
	- Inference Time: Per-image generation time
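
Per-image scores for a metric such as LPIPS or SSIM are reported as summary statistics. A minimal sketch of that aggregation follows, assuming the `mean`, `std`, `min`, and `max` keys seen in the results metadata. Whether the project uses the population or the sample standard deviation is not stated; this sketch uses the population form, and `summarize` is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate per-image metric scores into mean/std/min/max."""
    if not scores:
        raise ValueError("at least one score is required")
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population standard deviation (assumed)
        "min": min(scores),
        "max": max(scores),
    }
```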

	### Generate with Evaluation
	```bash
	python sample_batch.py \
	    --characters "characters.txt" \
	    --style_images "styles/" \
	    --ttf_path "fonts/myfont.ttf" \
	    --ckpt_dir "checkpoints/" \
	    --output_dir "my_dataset/train_original" \
	    --evaluate \
	    --ground_truth_dir "ground_truth/" \
	    --compute_fid
	```

	## Dataset

	### Dataset Source
	- Name: font-diffusion-generated-data
	- Link: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data
	- Format: ContentImage + TargetImage per style
	- Supports: Multi-font, multi-character, multi-style generation

	### Dataset Structure
	```
	FontDiffusion Dataset/
	├── train_original/
	│   ├── ContentImage/    # Character structure images
	│   ├── TargetImage/     # Style-specific font renderings
	│   └── results.json
	├── val_original/
	└── test_original/
	```

	## Training & Fine-tuning

	### Fine-tuning from Checkpoint
	```bash
	python my_train.py \
	    --ckpt_dir "checkpoints/" \
	    --data_dir "my_dataset/train_original" \
	    --output_dir "finetuned_ckpt/" \
	    --num_epochs 5 \
	    --learning_rate 1e-4 \
	    --batch_size 4
	```

	### Convert & Upload Fine-tuned Models
	```bash
	python finetune_and_upload.py \
	    --ckpt_dir "finetuned_ckpt/" \
	    --hf_token "hf_xxxxx" \
	    --hf_repo_id "username/font-diffusion-finetuned" \
	    --num_epochs 5
	```

	## Technical Features

	### Optimizations
	- ✅ Batch Processing: Process multiple characters per style
	- ✅ Memory Efficiency: Attention slicing (optional)
	- ✅ FP16 Support: Reduced precision for faster inference
	- ✅ Torch Compile: Optional model compilation
	- ✅ Channels Last Format: Memory-optimized tensor layout
	- ✅ XFormers Support: Fast attention implementation

	### Robustness
	- ✅ Checkpoint & Resume: Resume from interruptions
	- ✅ Index-based Tracking: Handle large character sets (100K+)
	- ✅ Multi-font Support: Process characters across multiple fonts
	- ✅ Error Recovery: Graceful handling of missing fonts
	- ✅ Automatic Indexing: Consistent char_index and style_index

	### Monitoring
	- ✅ Weights & Biases Integration: Real-time tracking
	- ✅ Progress Bars: Detailed generation progress
	- ✅ Checkpoint Saving: Periodic intermediate saves
	- ✅ Quality Metrics: LPIPS, SSIM, FID computation

	## Known Limitations

	- A CUDA-capable GPU is required for practical generation speeds
	- Each character must exist in at least one loaded font
	- Style images should be 96×96 pixels or resizable to that size
	- Very large character sets (>100K) may require memory optimization
	- FID computation requires a representative ground-truth dataset
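
The font-coverage limitation above can be checked before generation starts. The sketch below is a hypothetical illustration: it assumes each loaded font is described by a set of supported Unicode code points, which in practice would come from the font's character map. It picks the first font that covers each character and reports the ones no font supports. It does not reflect the actual `FontManager` API.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def find_font_for(char: str, coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font whose code-point set contains char, else None."""
    code_point = ord(char)
    for name, code_points in coverage.items():
        if code_point in code_points:
            return name
    return None


def partition_characters(
    characters: Iterable[str],
    coverage: Dict[str, Set[int]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split characters into {char: font} and a list of unsupported characters."""
    renderable: Dict[str, str] = {}
    missing: List[str] = []
    for ch in characters:
        font = find_font_for(ch, coverage)
        if font is None:
            missing.append(ch)
        else:
            renderable[ch] = font
    return renderable, missing
```

Reporting the `missing` list up front avoids discovering unsupported characters partway through a long batch run.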

	## Citation

	```bibtex
	@article{fontdiffuser2023,
	  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
	  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
	  year={2023}
	}
	```

	## License

	This model is licensed under the Apache License 2.0. See the LICENSE file for details.

	## Contact & Support

	For issues, questions, or contributions:
	- GitHub: [FontDiffusion Repository]
	- Hugging Face: [Model Card]
	- Dataset: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data

	---