@@ -1,115 +1,101 @@
 ---
 license: apache-2.0
 pipeline_tag: image-classification
-library_name: transformers           # ← change “peft” → “transformers”
 tags:
   - dinov2
   - image-classification
   - fonts
 ---
-# dchen0/font-classifier
-Merged DINOv2‑base checkpoint with LoRA weights for font classification.
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-This model is a fine-tuned version of [facebook/dinov2-base-imagenet1k-1-layer](https://huggingface.co/facebook/dinov2-base-imagenet1k-1-layer) on the imagefolder dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2637
-- Model Preparation Time: 0.0016
-- Accuracy: 0.9163
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0001
-- train_batch_size: 32
-- eval_batch_size: 32
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 1
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Model Preparation Time | Accuracy |
-|:-------------:|:------:|:----:|:---------------:|:----------------------:|:--------:|
-| 0.7099        | 0.0182 | 50   | 0.6595          | 0.0016                 | 0.7594   |
-| 0.7084        | 0.0363 | 100  | 0.6175          | 0.0016                 | 0.7806   |
-| 0.7638        | 0.0545 | 150  | 0.7014          | 0.0016                 | 0.7337   |
-| 0.6451        | 0.0727 | 200  | 0.6177          | 0.0016                 | 0.7757   |
-| 0.6852        | 0.0908 | 250  | 0.5691          | 0.0016                 | 0.7971   |
-| 0.5753        | 0.1090 | 300  | 0.5666          | 0.0016                 | 0.8048   |
-| 0.5925        | 0.1272 | 350  | 0.5235          | 0.0016                 | 0.8204   |
-| 0.6969        | 0.1453 | 400  | 0.5725          | 0.0016                 | 0.7922   |
-| 0.6096        | 0.1635 | 450  | 0.5103          | 0.0016                 | 0.8173   |
-| 0.5994        | 0.1817 | 500  | 0.5075          | 0.0016                 | 0.8183   |
-| 0.5272        | 0.1999 | 550  | 0.5116          | 0.0016                 | 0.8229   |
-| 0.5193        | 0.2180 | 600  | 0.4952          | 0.0016                 | 0.8244   |
-| 0.5689        | 0.2362 | 650  | 0.4662          | 0.0016                 | 0.8388   |
-| 0.5126        | 0.2544 | 700  | 0.4651          | 0.0016                 | 0.8327   |
-| 0.5301        | 0.2725 | 750  | 0.5080          | 0.0016                 | 0.8158   |
-| 0.5424        | 0.2907 | 800  | 0.4573          | 0.0016                 | 0.8357   |
-| 0.4357        | 0.3089 | 850  | 0.4412          | 0.0016                 | 0.8486   |
-| 0.5522        | 0.3270 | 900  | 0.4755          | 0.0016                 | 0.8256   |
-| 0.5639        | 0.3452 | 950  | 0.4463          | 0.0016                 | 0.8339   |
-| 0.4522        | 0.3634 | 1000 | 0.4347          | 0.0016                 | 0.8458   |
-| 0.5548        | 0.3815 | 1050 | 0.4112          | 0.0016                 | 0.8560   |
-| 0.4815        | 0.3997 | 1100 | 0.4300          | 0.0016                 | 0.8514   |
-| 0.5028        | 0.4179 | 1150 | 0.3840          | 0.0016                 | 0.8713   |
-| 0.4417        | 0.4360 | 1200 | 0.4364          | 0.0016                 | 0.8462   |
-| 0.4465        | 0.4542 | 1250 | 0.3731          | 0.0016                 | 0.8740   |
-| 0.3935        | 0.4724 | 1300 | 0.3672          | 0.0016                 | 0.8753   |
-| 0.5306        | 0.4906 | 1350 | 0.4480          | 0.0016                 | 0.8388   |
-| 0.3991        | 0.5087 | 1400 | 0.3718          | 0.0016                 | 0.8698   |
-| 0.483         | 0.5269 | 1450 | 0.3916          | 0.0016                 | 0.8652   |
-| 0.4323        | 0.5451 | 1500 | 0.3948          | 0.0016                 | 0.8648   |
-| 0.3664        | 0.5632 | 1550 | 0.3400          | 0.0016                 | 0.8796   |
-| 0.4941        | 0.5814 | 1600 | 0.3531          | 0.0016                 | 0.8765   |
-| 0.4185        | 0.5996 | 1650 | 0.3481          | 0.0016                 | 0.8820   |
-| 0.4506        | 0.6177 | 1700 | 0.3332          | 0.0016                 | 0.8866   |
-| 0.4015        | 0.6359 | 1750 | 0.3468          | 0.0016                 | 0.8768   |
-| 0.3919        | 0.6541 | 1800 | 0.3421          | 0.0016                 | 0.8897   |
-| 0.4281        | 0.6722 | 1850 | 0.3141          | 0.0016                 | 0.8937   |
-| 0.3659        | 0.6904 | 1900 | 0.3424          | 0.0016                 | 0.8823   |
-| 0.345         | 0.7086 | 1950 | 0.3172          | 0.0016                 | 0.8912   |
-| 0.3157        | 0.7267 | 2000 | 0.3226          | 0.0016                 | 0.8903   |
-| 0.3456        | 0.7449 | 2050 | 0.3178          | 0.0016                 | 0.8909   |
-| 0.3643        | 0.7631 | 2100 | 0.2988          | 0.0016                 | 0.8983   |
-| 0.4043        | 0.7812 | 2150 | 0.3036          | 0.0016                 | 0.8992   |
-| 0.3486        | 0.7994 | 2200 | 0.2974          | 0.0016                 | 0.9053   |
-| 0.3735        | 0.8176 | 2250 | 0.3026          | 0.0016                 | 0.8964   |
-| 0.4032        | 0.8358 | 2300 | 0.2990          | 0.0016                 | 0.9019   |
-| 0.3825        | 0.8539 | 2350 | 0.2938          | 0.0016                 | 0.9062   |
-| 0.345         | 0.8721 | 2400 | 0.2871          | 0.0016                 | 0.9059   |
-| 0.3528        | 0.8903 | 2450 | 0.2777          | 0.0016                 | 0.9093   |
-| 0.3207        | 0.9084 | 2500 | 0.2764          | 0.0016                 | 0.9111   |
-| 0.2664        | 0.9266 | 2550 | 0.2741          | 0.0016                 | 0.9099   |
-| 0.3496        | 0.9448 | 2600 | 0.2720          | 0.0016                 | 0.9151   |
-| 0.3274        | 0.9629 | 2650 | 0.2724          | 0.0016                 | 0.9136   |
-| 0.3014        | 0.9811 | 2700 | 0.2659          | 0.0016                 | 0.9136   |
-| 0.3235        | 0.9993 | 2750 | 0.2637          | 0.0016                 | 0.9163   |
-### Framework versions
-- PEFT 0.15.2
-- Transformers 4.52.4
-- Pytorch 2.7.1
-- Datasets 3.6.0
-- Tokenizers 0.21.1

 ---
 license: apache-2.0
 pipeline_tag: image-classification
+library_name: transformers
 tags:
   - dinov2
   - image-classification
   - fonts
+  - lora
+  - vision-transformer
+datasets:
+  - dchen0/font_crops_v5
+base_model: facebook/dinov2-base-imagenet1k-1-layer
 ---
+# Font Classifier
+A DINOv2 Vision Transformer fine-tuned with LoRA for font classification across 394 font variants from 32 Google Fonts families.
+## How it was made
+1. **Base model**: [facebook/dinov2-base-imagenet1k-1-layer](https://huggingface.co/facebook/dinov2-base-imagenet1k-1-layer) (87.2M parameters, frozen).
+2. **Fine-tuning**: [LoRA](https://arxiv.org/abs/2106.09685) (rank 8, alpha 16) applied to the query and value projections in each ViT attention block, plus a trainable classification head. This gives ~900K trainable parameters (about 1% of the total).
+3. **Merge**: After training, the LoRA adapter weights were merged into the base model (`merge_and_unload()`), producing this standalone checkpoint. Neither the adapter nor the PEFT library is needed at inference time.
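Conceptually, merging folds the low-rank update into each adapted weight matrix, so the merged layer computes the same output as the base layer plus the adapter. The sketch below illustrates this with NumPy on random matrices. The width of 768 and the helper names are assumptions for illustration; this is not the PEFT implementation.

```python
import numpy as np

RANK, ALPHA = 8, 16          # LoRA settings from the model card
SCALE = ALPHA / RANK         # standard LoRA scaling factor alpha / r


def adapter_forward(x, W, A, B):
    """Base projection plus the scaled low-rank update, as used during training."""
    return x @ W.T + SCALE * (x @ A.T) @ B.T


def merge(W, A, B):
    """Fold the low-rank update into the base weight: W' = W + (alpha / r) * B @ A."""
    return W + SCALE * (B @ A)


rng = np.random.default_rng(0)
d = 768                                    # assumed projection width (illustrative)
W = rng.standard_normal((d, d))            # frozen base weight
A = rng.standard_normal((RANK, d)) * 0.01  # LoRA "down" matrix
B = rng.standard_normal((d, RANK)) * 0.01  # LoRA "up" matrix
x = rng.standard_normal((4, d))            # a small batch of token embeddings

W_merged = merge(W, A, B)
same = np.allclose(adapter_forward(x, W, A, B), x @ W_merged.T)
print("merged layer matches adapter output:", same)
```

Because the merged matrix has the same shape as the original, the checkpoint loads as a plain `transformers` model with no extra parameters.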
+## Performance
+- **99.0% top-1 accuracy** on 394 font classes (held-out test set)
+- **99.8% family-level accuracy** (collapsing weight variants into parent families)
+- Errors are overwhelmingly within-family weight confusions (e.g. Roboto-400 vs Roboto-500), not cross-family misidentifications
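The family-level figure can be reproduced from per-image predictions by collapsing each variant label to its family before comparing. This is a minimal sketch that assumes labels follow a `Family-weight` pattern such as `Roboto-400`, as in the example above; the actual label format and the helper names here are assumptions.

```python
from typing import Sequence, Tuple


def family_of(label: str) -> str:
    """Drop a trailing '-<weight>' suffix; labels without one are returned unchanged."""
    return label.rsplit("-", 1)[0]


def top1_and_family_accuracy(
    true_labels: Sequence[str], predicted_labels: Sequence[str]
) -> Tuple[float, float]:
    """Return (exact-variant accuracy, family-level accuracy)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    exact = sum(t == p for t, p in pairs)
    family = sum(family_of(t) == family_of(p) for t, p in pairs)
    return exact / len(pairs), family / len(pairs)


# Toy data: one within-family weight confusion and one cross-family error.
truth = ["Roboto-400", "Roboto-500", "Inter-700", "Lora-400"]
preds = ["Roboto-400", "Roboto-400", "Inter-700", "Inter-400"]
top1, family = top1_and_family_accuracy(truth, preds)
print(f"top-1: {top1:.2f}, family-level: {family:.2f}")  # top-1: 0.50, family-level: 0.75
```

A within-family confusion counts as an error for top-1 accuracy but as correct at the family level, which is why the second number is always at least as high as the first.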
+| Method | Trainable Params | Top-1 Acc |
+|---|---|---|
+| **LoRA r=8 (this model)** | **900K** | **99.0%** |
+| ResNet-50 | 25.6M | 98.8% |
+| LoRA r=16 | 1.2M | 98.9% |
+| LoRA r=4 | 753K | 97.9% |
+| Full Fine-Tuning | 87.2M | 95.9% |
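The LoRA rows in the Trainable Params column are consistent with a back-of-the-envelope count. The sketch below assumes a DINOv2-base backbone with 12 transformer blocks and hidden size 768, and a classification head that is a single linear layer over the concatenated CLS and mean patch embeddings (1536 inputs). These architectural figures are assumptions not stated in the card.

```python
HIDDEN = 768        # assumed hidden size of DINOv2-base
LAYERS = 12         # assumed number of transformer blocks
TARGETS = 2         # query and value projections per block
NUM_CLASSES = 394   # font variants, from the model card


def lora_params(rank: int) -> int:
    """Each adapted HIDDEN x HIDDEN projection gains A (rank x HIDDEN) and B (HIDDEN x rank)."""
    return LAYERS * TARGETS * rank * (HIDDEN + HIDDEN)


def head_params() -> int:
    """Linear head over [CLS ; mean patch] features: weights plus bias (assumed layout)."""
    return 2 * HIDDEN * NUM_CLASSES + NUM_CLASSES


def trainable_params(rank: int) -> int:
    return lora_params(rank) + head_params()


for rank in (4, 8, 16):
    print(f"r={rank:<2} -> {trainable_params(rank):,} trainable parameters")
# r=4  -> 753,034 trainable parameters
# r=8  -> 900,490 trainable parameters
# r=16 -> 1,195,402 trainable parameters
```

Under these assumptions the classification head accounts for roughly 606K of the trainable parameters at every rank, so doubling the rank adds far less than double the total.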
+## Training data
+[dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5): ~225K synthetic images generated by rendering random text in each font variant, with ~575 training images and 40 test images per class. Augmentations include color changes, layout variation (left/center/right alignment, multi-line), and Gaussian noise.
+### Font families (32)
+BigShouldersText, BricolageGrotesque, CrimsonPro, DMSans, Geist, HedvigLettersSerif, InstrumentSans, InstrumentSerif, Inter, JetBrainsMono, LexendDeca, Lora, Merriweather, Montserrat, Newsreader, NunitoSans, Onest, OpenSans, Petrona, PlayfairDisplay, PlusJakartaSans, Poppins, PT Serif Caption, RethinkSans, Roboto, RobotoSerif, ShipporiMincho, Sora, SpaceGrotesk, Ultra, Urbanist, WorkSans
+## Training details
+| Hyperparameter | Value |
+|---|---|
+| Optimizer | AdamW |
+| Learning rate | 1e-4 |
+| Batch size | 64 |
+| Epochs | 100 |
+| LR scheduler | Linear decay |
+| Precision | FP16 |
+| LoRA rank | 8 |
+| LoRA alpha | 16 |
+| LoRA dropout | 0.1 |
+| LoRA targets | query, value |
+| GPU | NVIDIA RTX 3090 (24 GB) |
+| Training time | ~33 hours |
+## Preprocessing
+Preprocessing is implemented in `handler.py`. Images passed to the model at inference time must go through the same steps:
+1. Convert to RGB
+2. Pad to square (black fill, centered)
+3. Resize to 224x224
+4. Normalize with ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
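The four steps above can be sketched with Pillow and NumPy. This is an illustrative reimplementation, not the code in `handler.py`; the function name and the bilinear resampling filter are assumptions.

```python
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image: Image.Image, size: int = 224) -> np.ndarray:
    """Return a float32 array of shape (3, size, size), ready to batch."""
    # 1. Convert to RGB (drops alpha, expands grayscale).
    image = image.convert("RGB")

    # 2. Pad to a square with black fill, keeping the image centered.
    side = max(image.size)
    canvas = Image.new("RGB", (side, side), (0, 0, 0))
    offset = ((side - image.width) // 2, (side - image.height) // 2)
    canvas.paste(image, offset)

    # 3. Resize to size x size (bilinear filter is an assumption).
    resized = canvas.resize((size, size), Image.BILINEAR)

    # 4. Scale to [0, 1], normalize per channel, and move channels first.
    pixels = np.asarray(resized, dtype=np.float32) / 255.0
    normalized = (pixels - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)


# Example: a wide white strip is padded with black above and below.
sample = Image.new("RGB", (100, 50), "white")
tensor = preprocess(sample)
print(tensor.shape)  # (3, 224, 224)
```

Padding before resizing preserves the aspect ratio of the glyphs, which would otherwise be stretched when a wide text crop is forced into a square.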
+## Usage
+```python
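+from transformers import Dinov2ForImageClassification, AutoImageProcessor
+# get_inference_transform is defined in handler.py (see Preprocessing),
+# which must be importable from the working directory or PYTHONPATH.
+from handler import get_inference_transform
+from PIL import Image
+import torch
+
+# Load the merged checkpoint and its image processor; PEFT is not required.
+model = Dinov2ForImageClassification.from_pretrained("dchen0/font-classifier")
+processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
+model.eval()
+
+# Build the preprocessing pipeline described under Preprocessing.
+transform = get_inference_transform(processor, processor.size["shortest_edge"])
+image = Image.open("font_sample.png").convert("RGB")
+pixel_values = transform(image).unsqueeze(0)  # add a batch dimension
+
+# Run inference without tracking gradients, then map the top logit to its label.
+with torch.no_grad():
+    logits = model(pixel_values=pixel_values).logits
+predicted_class = logits.argmax(-1).item()
+print(model.config.id2label[predicted_class])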
+```
+## Source
+- Training code: [github.com/Create-Inc/font-model](https://github.com/Create-Inc/font-model)
+- Results repo (checkpoints, logs): [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results)
+- Dataset: [dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5)