dzungpham committed 672795c (verified) · Parent: f4d00e0

Update README.md

Files changed (1): README.md (+284 -0)
tags:
- image-to-image
- contrastive-learning
- diffusers
- font-generation
- character-synthesis
- style-transfer
- dpm-solver
---

# Model Card for FontDiffuser

## Model Details

### Model Type

- **Architecture**: Diffusion-based font generation model
- **Framework**: PyTorch + Hugging Face Diffusers
- **Scheduler**: DPM-Solver++ (configurable: `dpmsolver++` / `dpmsolver`)
- **Guidance**: Classifier-free guidance
- **Base Model**: FontDiffuser with content and style encoders

### Model Components

1. **UNet**: Main diffusion backbone for image generation
2. **Content Encoder**: Extracts character structure information
3. **Style Encoder**: Extracts font style features
4. **DDPM/DPM Scheduler**: Noise scheduling for the diffusion process
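
At sampling time these components are tied together by classifier-free guidance: at each denoising step the UNet predicts noise once with the content/style conditioning and once without, and the two predictions are blended by the guidance scale. A minimal sketch of that blending step (plain Python for illustration; in the actual pipeline the inputs are image-shaped tensors):

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale=7.5):
    """Blend unconditional and conditional noise predictions.

    eps_uncond: UNet output without content/style conditioning
    eps_cond:   UNet output with content/style conditioning
    guidance_scale: conditioning strength (7.5 is this card's default)
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# A scale of 1.0 recovers the purely conditional prediction;
# larger scales push the sample further toward the conditioning.
print(apply_cfg(0.2, 0.4, 1.0))  # 0.4
print(apply_cfg(0.2, 0.4, 7.5))  # 1.7
```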

### Training Configuration

- **Resolution**: 96×96 pixels
- **Batch Size**: 4–8 (configurable)
- **Inference Steps**: 15 (default, configurable)
- **Guidance Scale**: 7.5 (default, configurable)
- **Precision**: FP32/FP16 (optional)
- **Device**: CUDA GPU recommended

## Model Usage

### Installation

```bash
pip install diffusers torch torchvision safetensors
pip install lpips scikit-image pytorch-fid  # Optional: for evaluation
```

### Basic Generation

```python
from argparse import Namespace

from sample_batch import (
    FontManager,
    batch_generate_images,
    load_fontdiffuser_pipeline,
)

# Initialize font manager
font_manager = FontManager("path/to/font.ttf")

# Load pipeline
args = Namespace(
    ckpt_dir="path/to/checkpoints",
    device="cuda",
    num_inference_steps=15,
    guidance_scale=7.5,
    batch_size=4,
    # ... other args
)
pipe = load_fontdiffuser_pipeline(args)

# Generate images
characters = ['A', 'B', 'C', '中', '国']
style_paths = ['style1.png', 'style2.png']

results = batch_generate_images(
    pipe, characters, style_paths,
    output_dir="output",
    args=args,
    evaluator=None,  # optional metrics evaluator
    font_manager=font_manager,
)
```

### Batch Generation with Checkpointing

```bash
python sample_batch.py \
    --characters "characters.txt" \
    --start_line 1 \
    --end_line 100 \
    --style_images "styles/" \
    --ttf_path "fonts/myfont.ttf" \
    --ckpt_dir "checkpoints/" \
    --output_dir "my_dataset/train_original" \
    --batch_size 4 \
    --num_inference_steps 15 \
    --guidance_scale 7.5 \
    --save_interval 10 \
    --device cuda
```
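
The `--start_line`/`--end_line` flags select a slice of `characters.txt`, which is what makes sharded generation over very large character lists practical. A rough sketch of that slicing, assuming 1-indexed, inclusive bounds (a hypothetical helper; the authoritative logic lives in `sample_batch.py`):

```python
def read_character_slice(path, start_line=1, end_line=None):
    """Return one character per line from `path`, keeping only lines
    start_line..end_line (assumed 1-indexed and inclusive, mirroring
    the CLI flags). Hypothetical helper for illustration only."""
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    end = len(lines) if end_line is None else end_line
    # Drop blank lines so stray trailing newlines don't become "characters"
    return [c for c in lines[start_line - 1:end] if c]
```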

### Resume from Checkpoint

```bash
python sample_batch.py \
    --characters "characters.txt" \
    --style_images "styles/" \
    --ttf_path "fonts/myfont.ttf" \
    --ckpt_dir "checkpoints/" \
    --output_dir "my_dataset/train_original" \
    --resume_from "my_dataset/train_original/results_checkpoint.json"
```
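
Resuming works because every finished pair is recorded in the checkpoint JSON. A sketch of the skip logic, assuming the checkpoint carries a `generations` list with `char_index`/`style_index` fields like `results.json` (the exact behavior is defined by `sample_batch.py`):

```python
import json

def completed_pairs(checkpoint_path):
    """Set of (char_index, style_index) pairs already generated,
    read from a results_checkpoint.json-style file."""
    with open(checkpoint_path, encoding="utf-8") as f:
        data = json.load(f)
    return {(g["char_index"], g["style_index"])
            for g in data.get("generations", [])}

def pending_pairs(num_chars, num_styles, done):
    """Every (char, style) index pair not yet in the checkpoint."""
    return [(c, s)
            for c in range(num_chars)
            for s in range(num_styles)
            if (c, s) not in done]
```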

## Model Performance

### Supported Tasks

- ✅ Single-character font generation
- ✅ Multi-character batch generation
- ✅ Multi-font support
- ✅ Multi-style transfer
- ✅ Index-based tracking for large-scale generation
- ✅ Checkpoint and resume support

### Output Format

```
output_dir/
├── ContentImage/               # Single set of content (character) images
│   ├── char0.png
│   ├── char1.png
│   └── ...
├── TargetImage/                # Generated font images organized by style
│   ├── style0/
│   │   ├── style0+char0.png
│   │   ├── style0+char1.png
│   │   └── ...
│   ├── style1/
│   │   └── ...
│   └── ...
├── results.json                # Comprehensive generation metadata
├── results_checkpoint.json     # Intermediate checkpoint (if save_interval > 0)
└── results_interrupted.json    # Emergency checkpoint (if interrupted)
```
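
The `style<i>+char<j>.png` naming convention makes each output self-describing, so index pairs can be recovered from paths alone. A small sketch of parsing it (illustrative; not part of the shipped scripts):

```python
import os

def parse_target_name(path):
    """Recover (style_index, char_index) from a TargetImage filename
    such as 'TargetImage/style1/style1+char42.png'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    style_part, char_part = stem.split("+", 1)
    return (int(style_part.removeprefix("style")),
            int(char_part.removeprefix("char")))

print(parse_target_name("TargetImage/style1/style1+char42.png"))  # (1, 42)
```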

### Results Metadata Structure

```json
{
  "generations": [
    {
      "character": "A",
      "char_index": 0,
      "style": "style0",
      "style_index": 0,
      "font": "Arial",
      "style_path": "path/to/style0.png",
      "output_path": "TargetImage/style0/style0+char0.png"
    }
  ],
  "metrics": {
    "lpips": {"mean": 0.25, "std": 0.08, "min": 0.1, "max": 0.5},
    "ssim": {"mean": 0.82, "std": 0.05, "min": 0.7, "max": 0.95},
    "fid": {"mean": 15.3, "std": 2.1},
    "inference_times": [
      {
        "style": "style0",
        "style_index": 0,
        "font": "Arial",
        "total_time": 2.45,
        "num_images": 100,
        "time_per_image": 0.0245
      }
    ]
  },
  "fonts": ["Arial", "Times New Roman"],
  "characters": ["A", "B", "C"],
  "styles": ["style0", "style1"],
  "total_chars": 3,
  "total_styles": 2,
  "total_possible_pairs": 6
}
```
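
Because `results.json` records both the generation list and `total_possible_pairs`, checking how complete a run is takes one line. A sketch using the field names from the structure above:

```python
def coverage(results):
    """Fraction of possible (character, style) pairs actually generated."""
    total = results.get("total_possible_pairs", 0)
    return len(results["generations"]) / total if total else 0.0

# Toy metadata dict shaped like results.json
results = {"generations": [{"char_index": 0, "style_index": 0}],
           "total_possible_pairs": 6}
print(f"{coverage(results):.0%}")  # 17%
```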

## Evaluation Metrics

### Supported Metrics

- **LPIPS**: Learned perceptual image patch similarity (lower is better)
- **SSIM**: Structural similarity index (higher is better)
- **FID**: Fréchet Inception Distance (lower is better)
- **Inference Time**: Per-image generation time
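
Per-image scores are reported with the mean/std/min/max aggregation that appears in the `metrics` block of `results.json`. A sketch of that aggregation over hypothetical LPIPS values (this uses the population standard deviation; the project's scripts may use the sample form instead):

```python
from statistics import mean, pstdev

def summarize(values):
    """Aggregate per-image scores into the mean/std/min/max
    shape used in results.json."""
    return {"mean": mean(values), "std": pstdev(values),
            "min": min(values), "max": max(values)}

lpips = [0.1, 0.2, 0.3, 0.4]  # hypothetical per-image LPIPS scores
stats = summarize(lpips)
print(round(stats["mean"], 2), round(stats["std"], 3))
```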

### Generate with Evaluation

```bash
python sample_batch.py \
    --characters "characters.txt" \
    --style_images "styles/" \
    --ttf_path "fonts/myfont.ttf" \
    --ckpt_dir "checkpoints/" \
    --output_dir "my_dataset/train_original" \
    --evaluate \
    --ground_truth_dir "ground_truth/" \
    --compute_fid
```

## Dataset

### Dataset Source

- **Name**: font-diffusion-generated-data
- **Link**: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data
- **Format**: ContentImage + TargetImage per style
- **Supports**: Multi-font, multi-character, multi-style generation

### Dataset Structure

```
FontDiffusion Dataset/
├── train_original/
│   ├── ContentImage/   # Character structure images
│   ├── TargetImage/    # Style-specific font renderings
│   └── results.json
├── val_original/
└── test_original/
```

## Training & Fine-tuning

### Fine-tuning from Checkpoint

```bash
python my_train.py \
    --ckpt_dir "checkpoints/" \
    --data_dir "my_dataset/train_original" \
    --output_dir "finetuned_ckpt/" \
    --num_epochs 5 \
    --learning_rate 1e-4 \
    --batch_size 4
```

### Convert & Upload Fine-tuned Models

```bash
python finetune_and_upload.py \
    --ckpt_dir "finetuned_ckpt/" \
    --hf_token "hf_xxxxx" \
    --hf_repo_id "username/font-diffusion-finetuned" \
    --num_epochs 5
```

## Technical Features

### Optimizations

- ✅ **Batch Processing**: Process multiple characters per style
- ✅ **Memory Efficiency**: Attention slicing (optional)
- ✅ **FP16 Support**: Reduced precision for faster inference
- ✅ **Torch Compile**: Optional model compilation
- ✅ **Channels-Last Format**: Memory-optimized tensor layout
- ✅ **xFormers Support**: Fast attention implementation

### Robustness

- ✅ **Checkpoint & Resume**: Resume from interruptions
- ✅ **Index-based Tracking**: Handles large character sets (100K+)
- ✅ **Multi-font Support**: Processes characters across multiple fonts
- ✅ **Error Recovery**: Graceful handling of missing fonts
- ✅ **Automatic Indexing**: Consistent char_index and style_index

### Monitoring

- ✅ **Weights & Biases Integration**: Real-time tracking
- ✅ **Progress Bars**: Detailed generation progress
- ✅ **Checkpoint Saving**: Periodic intermediate saves
- ✅ **Quality Metrics**: LPIPS, SSIM, FID computation

## Known Limitations

- Requires a CUDA-capable GPU for practical generation speeds
- Characters must exist in at least one loaded font
- Style images should be normalized (96×96 or resizable)
- Very large character sets (>100K) may require memory optimization
- FID computation requires a representative ground-truth dataset

## Citation

```bibtex
@article{fontdiffuser2023,
  title={FontDiffuser: One-Shot Font Generation via Diffusion},
  author={Pham, Dzung and others},
  year={2023}
}
```

## License

This model is licensed under the Apache License 2.0. See the LICENSE file for details.

## Contact & Support

For issues, questions, or contributions:
- GitHub: [FontDiffusion Repository]
- Hugging Face: [Model Card]
- Dataset: https://huggingface.co/datasets/dzungpham/font-diffusion-generated-data

---