dzungpham/font-diffusion-weights

 - image-to-image
 - contrastive-learning
 - diffusers
+- font-generation
+- character-synthesis
+- style-transfer
+- dpm-solver
+---
+# Model Card for FontDiffuser
+## Model Details
+### Model Type
+- **Architecture**: Diffusion-based Font Generation Model
+- **Framework**: PyTorch + Hugging Face Diffusers
+- **Scheduler**: DPM-Solver++ (configurable: dpmsolver++ / dpmsolver)
+- **Guidance**: Classifier-free guidance
+- **Base Model**: FontDiffuser with Content and Style Encoders
+### Model Components
+1. **UNet**: Main diffusion model for image generation
+2. **Content Encoder**: Extracts character structure information
+3. **Style Encoder**: Extracts font style features
+4. **DDPM/DPM Scheduler**: Noise scheduling for the diffusion process
+### Training and Inference Configuration
+- **Resolution**: 96×96 pixels
+- **Batch Size**: 4-8 (configurable)
+- **Inference Steps**: 15 (default, configurable)
+- **Guidance Scale**: 7.5 (default, configurable)
+- **Precision**: FP32 or FP16 (FP16 is optional)
+- **Device**: CUDA-capable GPU recommended
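The guidance scale listed above controls classifier-free guidance. As a rough sketch of the standard formulation (not taken from this repository's code), the guided noise prediction starts from the unconditional prediction and moves toward the conditional one. The name `apply_cfg` and the use of plain Python lists are illustrative assumptions; the actual pipeline operates on tensors.

```python
from typing import Sequence


def apply_cfg(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float = 7.5,
) -> list[float]:
    """Standard classifier-free guidance: uncond + s * (cond - uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# A scale of 1.0 returns the conditional prediction unchanged;
# larger scales push further in the conditional direction.
guided = apply_cfg([0.0, 1.0], [1.0, 3.0], guidance_scale=7.5)
```

Higher scales typically follow the style and content conditions more closely at the cost of diversity, which is why the value is left configurable.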
+## Model Usage
+### Installation
+```bash
+pip install diffusers torch torchvision safetensors
+pip install lpips scikit-image pytorch-fid  # Optional: for evaluation
+```
+### Basic Generation
+```python
+from argparse import Namespace
+from sample_batch import (
+    FontManager,
+    batch_generate_images,
+    load_fontdiffuser_pipeline,
+)
+# Initialize font manager
+font_manager = FontManager("path/to/font.ttf")
+# Load pipeline
+args = Namespace(
+    ckpt_dir="path/to/checkpoints",
+    device="cuda",
+    num_inference_steps=15,
+    guidance_scale=7.5,
+    batch_size=4,
+    # ... other args
+)
+pipe = load_fontdiffuser_pipeline(args)
+# No evaluation in this example. Passing None is an assumption;
+# supply an evaluator object if you need LPIPS/SSIM/FID metrics.
+evaluator = None
+# Generate images
+characters = ['A', 'B', 'C', '中', '国']
+style_paths = ['style1.png', 'style2.png']
+results = batch_generate_images(
+    pipe, characters, style_paths,
+    output_dir="output",
+    args=args,
+    evaluator=evaluator,
+    font_manager=font_manager,
+)
+```
+### Batch Generation with Checkpointing
+```bash
+python sample_batch.py \
+  --characters "characters.txt" \
+  --start_line 1 \
+  --end_line 100 \
+  --style_images "styles/" \
+  --ttf_path "fonts/myfont.ttf" \
+  --ckpt_dir "checkpoints/" \
+  --output_dir "my_dataset/train_original" \
+  --batch_size 4 \
+  --num_inference_steps 15 \
+  --guidance_scale 7.5 \
+  --save_interval 10 \
+  --device cuda
+```
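The `--start_line` and `--end_line` flags select a slice of the character file. A minimal sketch of that selection follows, assuming one character per line and 1-indexed, inclusive bounds. Both are assumptions, so check `sample_batch.py` for the actual behavior. `select_characters` is a hypothetical helper, not part of the script.

```python
from typing import Iterable, Optional


def select_characters(
    lines: Iterable[str], start_line: int = 1, end_line: Optional[int] = None
) -> list[tuple[int, str]]:
    """Return (char_index, character) pairs for lines start_line..end_line.

    Assumes 1-indexed, inclusive bounds. char_index is the 0-based position
    in the full file, so indices stay stable across partial runs.
    """
    selected = []
    for line_no, raw in enumerate(lines, start=1):
        if line_no < start_line:
            continue
        if end_line is not None and line_no > end_line:
            break
        char = raw.strip()
        if char:  # skip blank lines
            selected.append((line_no - 1, char))
    return selected


chunk = select_characters(["A\n", "B\n", "中\n", "国\n"], start_line=2, end_line=3)
```

Deriving `char_index` from the position in the full file, rather than from the position in the slice, keeps file names consistent when a large character set is generated in several runs.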
+### Resume from Checkpoint
+```bash
+python sample_batch.py \
+  --characters "characters.txt" \
+  --style_images "styles/" \
+  --ttf_path "fonts/myfont.ttf" \
+  --ckpt_dir "checkpoints/" \
+  --output_dir "my_dataset/train_original" \
+  --resume_from "my_dataset/train_original/results_checkpoint.json"
+```
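Conceptually, resuming means skipping the (character, style) pairs that the checkpoint already records. The sketch below assumes the checkpoint file uses the same `generations` schema as `results.json`, with `char_index` and `style_index` fields. `pending_pairs` is a hypothetical helper and may not match how `sample_batch.py` implements `--resume_from`.

```python
import json
from typing import Iterator


def pending_pairs(
    checkpoint_json: str, num_chars: int, num_styles: int
) -> Iterator[tuple[int, int]]:
    """Yield (char_index, style_index) pairs not yet present in the checkpoint."""
    data = json.loads(checkpoint_json)
    done = {
        (g["char_index"], g["style_index"]) for g in data.get("generations", [])
    }
    for style_index in range(num_styles):
        for char_index in range(num_chars):
            if (char_index, style_index) not in done:
                yield (char_index, style_index)


# Two of four pairs are already recorded for style 0.
checkpoint = json.dumps(
    {"generations": [{"char_index": 0, "style_index": 0},
                     {"char_index": 1, "style_index": 0}]}
)
remaining = list(pending_pairs(checkpoint, num_chars=2, num_styles=2))
```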
+## Model Performance
+### Supported Tasks
+- ✅ Single-character font generation
+- ✅ Multi-character batch generation
+- ✅ Multi-font support
+- ✅ Multi-style transfer
+- ✅ Index-based tracking for large-scale generation
+- ✅ Checkpoint and resume support
+### Output Format
+```
+output_dir/
+├── ContentImage/              # Single set of content (character) images
+│   ├── char0.png
+│   ├── char1.png
+│   └── ...
+├── TargetImage/               # Generated font images organized by style
+│   ├── style0/
+│   │   ├── style0+char0.png
+│   │   ├── style0+char1.png
+│   │   └── ...
+│   ├── style1/
+│   │   └── ...
+│   └── ...
+├── results.json               # Comprehensive generation metadata
+├── results_checkpoint.json    # Intermediate checkpoint (if save_interval > 0)
+└── results_interrupted.json   # Emergency checkpoint (if interrupted)
+```
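The tree above implies a naming scheme for generated files. The following sketch builds the expected relative paths from the indices, assuming the `style{j}/style{j}+char{i}.png` pattern shown in the tree holds for every pair. `content_image_path` and `target_image_path` are hypothetical helpers.

```python
from pathlib import PurePosixPath


def content_image_path(char_index: int) -> PurePosixPath:
    """Relative path of the content image for one character."""
    return PurePosixPath("ContentImage") / f"char{char_index}.png"


def target_image_path(style_index: int, char_index: int) -> PurePosixPath:
    """Relative path of the generated image for one (style, character) pair."""
    style = f"style{style_index}"
    return PurePosixPath("TargetImage") / style / f"{style}+char{char_index}.png"


example = target_image_path(style_index=0, char_index=1)
```

Join these relative paths onto `output_dir` to locate a specific image without scanning the directory.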
+### Results Metadata Structure
+```json
+{
+  "generations": [
+    {
+      "character": "A",
+      "char_index": 0,
+      "style": "style0",
+      "style_index": 0,
+      "font": "Arial",
+      "style_path": "path/to/style0.png",
+      "output_path": "TargetImage/style0/style0+char0.png"
+    }
+  ],
+  "metrics": {
+    "lpips": {"mean": 0.25, "std": 0.08, "min": 0.1, "max": 0.5},
+    "ssim": {"mean": 0.82, "std": 0.05, "min": 0.7, "max": 0.95},
+    "fid": {"mean": 15.3, "std": 2.1},
+    "inference_times": [
+      {
+        "style": "style0",
+        "style_index": 0,
+        "font": "Arial",
+        "total_time": 2.45,
+        "num_images": 100,
+        "time_per_image": 0.0245
+      }
+    ]
+  },
+  "fonts": ["Arial", "Times New Roman"],
+  "characters": ["A", "B", "C"],
+  "styles": ["style0", "style1"],
+  "total_chars": 3,
+  "total_styles": 2,
+  "total_possible_pairs": 6
+}
+```
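The summary fields in this structure can be cross-checked against each other. The sketch below is a hypothetical validator, not part of the repository. It assumes that `total_possible_pairs` equals `total_chars * total_styles` and that `time_per_image` equals `total_time / num_images`, both inferred from the example values above.

```python
import math


def check_results(results: dict) -> list[str]:
    """Return human-readable inconsistencies found in a results dict."""
    problems = []
    if results["total_chars"] != len(results["characters"]):
        problems.append("total_chars != len(characters)")
    if results["total_styles"] != len(results["styles"]):
        problems.append("total_styles != len(styles)")
    # Assumed relation: every character is paired with every style.
    expected_pairs = results["total_chars"] * results["total_styles"]
    if results["total_possible_pairs"] != expected_pairs:
        problems.append("total_possible_pairs != total_chars * total_styles")
    for entry in results.get("metrics", {}).get("inference_times", []):
        # Assumed relation: time_per_image = total_time / num_images.
        expected = entry["total_time"] / entry["num_images"]
        if not math.isclose(expected, entry["time_per_image"], rel_tol=1e-6):
            problems.append(f"time_per_image mismatch for {entry['style']}")
    return problems


sample = {
    "characters": ["A", "B", "C"],
    "styles": ["style0", "style1"],
    "total_chars": 3,
    "total_styles": 2,
    "total_possible_pairs": 6,
    "metrics": {
        "inference_times": [
            {"style": "style0", "total_time": 2.45, "num_images": 100,
             "time_per_image": 0.0245}
        ]
    },
}
problems = check_results(sample)
```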
+## Evaluation Metrics
+### Supported Metrics
+- **LPIPS**: Learned perceptual image patch similarity (lower is better)
+- **SSIM**: Structural similarity index (higher is better)
+- **FID**: Fréchet Inception Distance (lower is better)
+- **Inference Time**: Per-image generation time
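Per-image scores for these metrics are reported as summary statistics in `results.json`. A minimal sketch of that aggregation, using only the standard library, is shown below. `summarize` is a hypothetical helper, and the choice of the population standard deviation is an assumption, since the script may use the sample form instead.

```python
import statistics


def summarize(scores: list[float]) -> dict[str, float]:
    """Summarize per-image scores as mean/std/min/max."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population form (assumption)
        "min": min(scores),
        "max": max(scores),
    }


summary = summarize([0.5, 1.0, 1.5])
```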
+### Generate with Evaluation
+```bash
+python sample_batch.py \
+  --characters "characters.txt" \
+  --style_images "styles/" \
+  --ttf_path "fonts/myfont.ttf" \
+  --ckpt_dir "checkpoints/" \
+  --output_dir "my_dataset/train_original" \
+  --evaluate \
+  --ground_truth_dir "ground_truth/" \
+  --compute_fid
+```
+## Dataset
+### Dataset Source
+- **Name**: font-diffusion-generated-data
+- **Link**: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data
+- **Format**: ContentImage + TargetImage per style
+- **Supports**: Multi-font, multi-character, multi-style generation
+### Dataset Structure
+```
+FontDiffusion Dataset/
+├── train_original/
+│   ├── ContentImage/          # Character structure images
+│   ├── TargetImage/           # Style-specific font renderings
+│   └── results.json
+├── val_original/
+└── test_original/
+```
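Each target image in a split should have a matching content image. The sketch below checks that pairing from file names alone, assuming the `style{j}+char{i}.png` naming shown in the Output Format section also applies to the dataset splits. `parse_target_name` and `missing_content` are hypothetical helpers.

```python
import re
from typing import Optional

_TARGET_NAME = re.compile(r"^(?P<style>style\d+)\+(?P<char>char\d+)\.png$")


def parse_target_name(filename: str) -> Optional[tuple[str, str]]:
    """Split 'style0+char1.png' into ('style0', 'char1'); None if no match."""
    match = _TARGET_NAME.match(filename)
    if match is None:
        return None
    return match.group("style"), match.group("char")


def missing_content(target_names: list[str], content_names: set[str]) -> list[str]:
    """List target files whose content image (e.g. 'char1.png') is absent."""
    missing = []
    for name in target_names:
        parsed = parse_target_name(name)
        if parsed is None or f"{parsed[1]}.png" not in content_names:
            missing.append(name)
    return missing


orphans = missing_content(["style0+char0.png", "style0+char1.png"], {"char0.png"})
```

Running a check like this before training can surface incomplete splits early.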
+## Training & Fine-tuning
+### Fine-tuning from Checkpoint
+```bash
+python my_train.py \
+  --ckpt_dir "checkpoints/" \
+  --data_dir "my_dataset/train_original" \
+  --output_dir "finetuned_ckpt/" \
+  --num_epochs 5 \
+  --learning_rate 1e-4 \
+  --batch_size 4
+```
+### Convert & Upload Fine-tuned Models
+```bash
+python finetune_and_upload.py \
+  --ckpt_dir "finetuned_ckpt/" \
+  --hf_token "hf_xxxxx" \
+  --hf_repo_id "username/font-diffusion-finetuned" \
+  --num_epochs 5
+```
+## Technical Features
+### Optimizations
+- ✅ **Batch Processing**: Process multiple characters per style
+- ✅ **Memory Efficiency**: Attention slicing (optional)
+- ✅ **FP16 Support**: Reduced precision for faster inference
+- ✅ **Torch Compile**: Optional model compilation
+- ✅ **Channels Last Format**: Memory-optimized tensor layout
+- ✅ **XFormers Support**: Fast attention implementation
+### Robustness
+- ✅ **Checkpoint & Resume**: Resume from interruptions
+- ✅ **Index-based Tracking**: Handle large character sets (100K+)
+- ✅ **Multi-font Support**: Process characters across multiple fonts
+- ✅ **Error Recovery**: Graceful handling of missing fonts
+- ✅ **Automatic Indexing**: Consistent char_index and style_index
+### Monitoring
+- ✅ **Weights & Biases Integration**: Real-time tracking
+- ✅ **Progress Bars**: Detailed generation progress
+- ✅ **Checkpoint Saving**: Periodic intermediate saves
+- ✅ **Quality Metrics**: LPIPS, SSIM, FID computation
+## Known Limitations
+- Requires CUDA-capable GPU for practical generation speeds
+- Characters must exist in at least one loaded font
+- Style images should be 96×96 pixels or resizable to that resolution
+- Very large character sets (more than 100K characters) may require memory optimization
+- FID computation requires a representative ground-truth dataset
+## Citation
+```bibtex
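+@inproceedings{yang2024fontdiffuser,
+  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
+  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  year={2024}
+}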
+```
+## License
+This model is licensed under the Apache License 2.0. See the LICENSE file for details.
+## Contact & Support
+For issues, questions, or contributions:
+- GitHub: [FontDiffusion Repository]
+- Hugging Face: [Model Card]
+- Dataset: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data
 ---