Instructions to use dchen0/font-classifier with libraries, inference providers, notebooks, and local apps.

- Libraries
- Transformers

How to use dchen0/font-classifier with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-classification", model="dchen0/font-classifier")
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
```

```python
# Load model directly
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
model = AutoModelForImageClassification.from_pretrained("dchen0/font-classifier")
```

- Notebooks
- Google Colab
- Kaggle
Replace auto-generated model card with detailed description
README.md (changed)

@@ -1,115 +1,101 @@

---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- dinov2
- image-classification
- fonts
---

#

## Training

| 0.4032 | 0.8358 | 2300 | 0.2990 | 0.0016 | 0.9019 |
| 0.3825 | 0.8539 | 2350 | 0.2938 | 0.0016 | 0.9062 |
| 0.345 | 0.8721 | 2400 | 0.2871 | 0.0016 | 0.9059 |
| 0.3528 | 0.8903 | 2450 | 0.2777 | 0.0016 | 0.9093 |
| 0.3207 | 0.9084 | 2500 | 0.2764 | 0.0016 | 0.9111 |
| 0.2664 | 0.9266 | 2550 | 0.2741 | 0.0016 | 0.9099 |
| 0.3496 | 0.9448 | 2600 | 0.2720 | 0.0016 | 0.9151 |
| 0.3274 | 0.9629 | 2650 | 0.2724 | 0.0016 | 0.9136 |
| 0.3014 | 0.9811 | 2700 | 0.2659 | 0.0016 | 0.9136 |
| 0.3235 | 0.9993 | 2750 | 0.2637 | 0.0016 | 0.9163 |

### Framework versions

- PEFT 0.15.2
- Transformers 4.52.4
- Pytorch 2.7.1
- Datasets 3.6.0
- Tokenizers 0.21.1

---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- dinov2
- image-classification
- fonts
- lora
- vision-transformer
datasets:
- dchen0/font_crops_v5
base_model: facebook/dinov2-base-imagenet1k-1-layer
---

# Font Classifier

A DINOv2 Vision Transformer fine-tuned with LoRA for font classification across 394 font variants from 32 Google Fonts families.

## How it was made

1. **Base model**: [facebook/dinov2-base-imagenet1k-1-layer](https://huggingface.co/facebook/dinov2-base-imagenet1k-1-layer) (87.2M parameters, frozen).
2. **Fine-tuning**: [LoRA](https://arxiv.org/abs/2106.09685) (rank 8, alpha 16) applied to the query and value projections in each ViT attention block, plus a trainable classification head; ~900K trainable parameters (about 1% of the total).
3. **Merge**: After training, the LoRA adapter weights were merged into the base model (`merge_and_unload()`), producing this standalone checkpoint. No adapter or PEFT library is needed at inference time.

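The merge in step 3 works because a LoRA update is just a low-rank matrix added to the frozen weight. Below is a minimal pure-Python sketch of that arithmetic, assuming PEFT's default scaling of `lora_alpha / r` (16 / 8 = 2 for this model). The toy 3×3 weight, the rank-2 factors, and the helpers `matmul`, `merge_lora`, and `apply` are hypothetical stand-ins for the real 768×768 query/value projections, not code from the training repo.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, i.e. the weight after merging the adapter."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def apply(w, x):
    """Compute W @ x for a column vector x given as a flat list."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in w]


x = [1, 2, 3]
w = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # frozen base weight (toy)
lora_a = [[1, 0, 1], [0, 1, 0]]        # A: r x d_in
lora_b = [[1, 0], [0, 1], [1, 1]]      # B: d_out x r
alpha, r = 4, 2                        # scale alpha / r = 2, same as 16 / 8

# Unmerged forward pass: base output plus the scaled low-rank update.
base = apply(w, x)
update = apply(lora_b, apply(lora_a, x))
unmerged = [b + (alpha / r) * u for b, u in zip(base, update)]

# Merged forward pass: a single matrix-vector product with the folded weight.
merged = apply(merge_lora(w, lora_a, lora_b, alpha, r), x)
print(unmerged, merged)  # both [9.0, 6.0, 15.0]
```

Because the adapter path and the merged weight give the same output, folding the update in adds no extra computation at inference, which is what lets the published checkpoint drop the adapter entirely.
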
## Performance

- **99.0% top-1 accuracy** on 394 font classes (held-out test set)
- **99.8% family-level accuracy** (weight variants collapsed into their parent families)
- Errors are overwhelmingly within-family weight confusions (e.g., Roboto-400 vs. Roboto-500), not cross-family misidentifications

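One way to compute the family-level number is to map both the true and the predicted label to a family before comparing. A minimal sketch, assuming labels follow the `Family-weight` pattern of the Roboto-400 example above; the real `id2label` strings may be formatted differently, and `family_of` and `accuracies` are hypothetical helpers.

```python
def family_of(label: str) -> str:
    """Map a variant label such as 'Roboto-400' to its family ('Roboto').

    Assumes labels are '<Family>-<weight>'; a label without a hyphen is
    treated as its own family.
    """
    return label.rsplit("-", 1)[0]


def accuracies(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (variant-level top-1 accuracy, family-level accuracy)."""
    n = len(y_true)
    variant = sum(t == p for t, p in zip(y_true, y_pred)) / n
    family = sum(family_of(t) == family_of(p) for t, p in zip(y_true, y_pred)) / n
    return variant, family


# Toy example: one within-family confusion and one cross-family error.
y_true = ["Roboto-400", "Roboto-500", "Inter-700", "Lora-400"]
y_pred = ["Roboto-400", "Roboto-400", "Inter-700", "Merriweather-400"]
print(accuracies(y_true, y_pred))  # (0.5, 0.75)
```

The Roboto-500 image predicted as Roboto-400 counts as wrong at the variant level but right at the family level, which is why the family-level figure sits above top-1 accuracy.
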
| Method | Trainable Params | Top-1 Acc |
|---|---|---|
| **LoRA r=8 (this model)** | **900K** | **99.0%** |
| ResNet-50 | 25.6M | 98.8% |
| LoRA r=16 | 1.2M | 98.9% |
| LoRA r=4 | 753K | 97.9% |
| Full Fine-Tuning | 87.2M | 95.9% |

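The LoRA rows of the trainable-parameter column can be reproduced by hand. This sketch assumes DINOv2-base dimensions (12 blocks, hidden size 768), LoRA on the query and value projections only with no trainable biases, and a linear head over the concatenated CLS token and mean patch token (2 × 768 inputs), which is how `Dinov2ForImageClassification` defines its classifier. The constants and `lora_params` are illustrative names, not taken from the training code.

```python
HIDDEN = 768       # DINOv2-base hidden size
LAYERS = 12        # transformer blocks in DINOv2-base
NUM_CLASSES = 394  # font variants


def lora_params(rank: int, targets_per_layer: int = 2) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each targeted projection."""
    per_projection = rank * HIDDEN + HIDDEN * rank
    return per_projection * targets_per_layer * LAYERS


# Classifier: Linear(2 * hidden -> num_classes) with bias, on [CLS ; mean(patch tokens)].
HEAD = 2 * HIDDEN * NUM_CLASSES + NUM_CLASSES

for rank in (4, 8, 16):
    total = lora_params(rank) + HEAD
    print(f"r={rank}: {total:,} trainable parameters")
# r=4: 753,034 / r=8: 900,490 / r=16: 1,195,402
```

Under those assumptions the totals land on 753K, 900K, and 1.2M, which suggests the table counts include the roughly 606K-parameter classification head; the head alone is about two thirds of the r=8 budget.
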
## Training data

[dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5): ~225K synthetic images generated by rendering random text in each font variant, with ~575 training images and 40 test images per class. Images include color augmentation, layout variation (left/center/right alignment, multi-line), and Gaussian noise.

### Font families (32)

BigShouldersText, BricolageGrotesque, CrimsonPro, DMSans, Geist, HedvigLettersSerif, InstrumentSans, InstrumentSerif, Inter, JetBrainsMono, LexendDeca, Lora, Merriweather, Montserrat, Newsreader, NunitoSans, Onest, OpenSans, Petrona, PlayfairDisplay, PlusJakartaSans, Poppins, PT Serif Caption, RethinkSans, Roboto, RobotoSerif, ShipporiMincho, Sora, SpaceGrotesk, Ultra, Urbanist, WorkSans

## Training details

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 1e-4 |
| Batch size | 64 |
| Epochs | 100 |
| LR scheduler | Linear decay |
| Precision | FP16 |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.1 |
| LoRA targets | query, value |
| GPU | NVIDIA RTX 3090 (24 GB) |
| Training time | ~33 hours |

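A sketch of the linear-decay schedule in the table, assuming no warmup and no gradient accumulation (neither is listed) and an epoch of ~575 × 394 training images taken from the dataset description. `linear_decay_lr` is a hypothetical helper; its shape matches Transformers' `get_linear_schedule_with_warmup` with zero warmup steps.

```python
import math

BASE_LR = 1e-4
BATCH_SIZE = 64
EPOCHS = 100
TRAIN_IMAGES = 575 * 394  # assumption: ~575 training images per class, 394 classes

steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)  # last partial batch still counts
total_steps = steps_per_epoch * EPOCHS


def linear_decay_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates: falls linearly from BASE_LR to 0."""
    return BASE_LR * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, total_steps)        # 3540 354000
print(linear_decay_lr(total_steps // 2))   # 5e-05
```

Under these assumptions the run takes roughly 354K optimizer steps, with the learning rate halved at the midpoint.
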
## Preprocessing

Preprocessing is built into `handler.py`; the same steps must be applied at inference time:

1. Convert to RGB
2. Pad to square (black fill, centered)
3. Resize to 224x224
4. Normalize with ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

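The padding and normalization steps can be illustrated without an imaging library. This is a sketch of the geometry and arithmetic those steps describe, not the contents of `handler.py`; `square_padding` and `normalize_pixel` are hypothetical helpers, and a real pipeline would do the pixel work with PIL or torchvision.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def square_padding(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) padding that centers the image in a square."""
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    # Any odd leftover pixel goes to the right/bottom edge.
    return left, top, side - width - left, side - height - top


def normalize_pixel(rgb: tuple[int, int, int]) -> tuple[float, ...]:
    """Scale 0-255 RGB to [0, 1], then standardize with the ImageNet mean/std."""
    return tuple((v / 255 - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# A wide 600x200 text crop gets 200 px of black above and below,
# becoming 600x600 before it is resized to 224x224.
print(square_padding(600, 200))    # (0, 200, 0, 200)
print(normalize_pixel((0, 0, 0)))  # black padding maps to negative values
```

After normalization the black padding is not zero but roughly -1.8 to -2.1 per channel, so the model sees padded borders as a distinct dark region; this is one likely reason the padding has to match between training and inference.
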
## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Dinov2ForImageClassification

# `handler.py` provides the preprocessing described above; it must be on the
# import path (e.g. downloaded next to this script) for this import to work.
from handler import get_inference_transform

model = Dinov2ForImageClassification.from_pretrained("dchen0/font-classifier")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
model.eval()  # disable dropout for inference

# Pad to square, resize, and normalize, as in handler.py.
transform = get_inference_transform(processor, processor.size["shortest_edge"])
image = Image.open("font_sample.png").convert("RGB")
pixel_values = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```

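Because most errors are within-family weight confusions, it can help to look at the top few candidates instead of only the argmax. A pure-Python sketch, assuming the logits for one image have been converted to a list (e.g. `logits[0].tolist()` in the usage example above); `top_k_labels` is a hypothetical helper.

```python
import math


def top_k_labels(logits: list[float], id2label: dict[int, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable labels with their softmax probabilities."""
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)
    return [(id2label[i], exps[i] / total) for i in ranked[:k]]


# Toy logits for three hypothetical classes.
toy_labels = {0: "Roboto-400", 1: "Roboto-500", 2: "Inter-400"}
for label, prob in top_k_labels([2.0, 1.5, -1.0], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage example, `top_k_labels(logits[0].tolist(), model.config.id2label)` would list the three most likely variants; when they share a family, the family-level answer is likely more reliable than the exact weight.
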
## Source

- Training code: [github.com/Create-Inc/font-model](https://github.com/Create-Inc/font-model)
- Results repo (checkpoints, logs): [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results)
- Dataset: [dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5)