Font Classifier

A DINOv2 Vision Transformer fine-tuned with LoRA for font classification across 394 font variants from 32 Google Fonts families.

How it was made

Base model: facebook/dinov2-base-imagenet1k-1-layer (87.2M parameters, frozen).
Fine-tuning: LoRA (rank 8, alpha 16) applied to the query and value projections in each ViT attention block, plus a trainable classification head. About 900K parameters are trainable (roughly 1% of the total).
Promotion: This model was promoted from the lora_r8/result_model adapter in dchen0/font-model-results using promote_model.py. The script loads the base DINOv2 model, merges the LoRA adapter weights into it with merge_and_unload(), and uploads the result as a standalone checkpoint, so neither the adapter nor the PEFT library is needed at inference time.
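
Merging is possible because a LoRA-adapted linear layer computes W x + (alpha / r) B A x, which equals (W + (alpha / r) B A) x, so the low-rank update can be folded into the base weight once. A minimal pure-Python sketch of that identity follows. The tiny matrices and helper names are illustrative assumptions, not the contents of promote_model.py; rank 1 is used only to keep the arithmetic readable, with the same scaling factor (alpha / r = 2) that rank 8 and alpha 16 give.

```python
# Illustrative only: toy 2x2 weights, not the real DINOv2 tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]

W = [[1, 2], [3, 4]]   # frozen base weight (out x in)
A = [[1, -1]]          # LoRA down-projection (r x in), r = 1 here
B = [[2], [0]]         # LoRA up-projection (out x r)
SCALING = 2            # alpha / r; rank 8 with alpha 16 gives the same factor
x = [[5], [7]]         # input column vector

# Adapter path: base output plus the scaled low-rank correction.
adapter_out = add(matmul(W, x), scale(matmul(B, matmul(A, x)), SCALING))

# Merged path: fold the update into the weight once, then do a single matmul.
W_merged = add(W, scale(matmul(B, A), SCALING))
merged_out = matmul(W_merged, x)
```

Both paths produce the same output, which is why the merged checkpoint can be loaded like any ordinary model.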

Performance

99.0% top-1 accuracy on 394 font classes (held-out test set)
99.8% family-level accuracy (collapsing weight variants into parent families)
Errors are overwhelmingly within-family weight confusions (e.g. Roboto-400 vs Roboto-500), not cross-family misidentifications
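
The two metrics differ only in how labels are compared. As a rough sketch, assuming class labels follow the Family-Weight pattern seen in the example above (e.g. Roboto-400), family-level accuracy can be computed by stripping the weight suffix before comparing. The helper names and the sample predictions are hypothetical.

```python
def family_of(label):
    """Drop the trailing weight suffix: 'Roboto-400' -> 'Roboto' (assumed label format)."""
    return label.rsplit("-", 1)[0]

def accuracy(y_true, y_pred, key=lambda label: label):
    """Fraction of predictions whose key matches the ground truth's key."""
    hits = sum(key(t) == key(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical predictions: one within-family weight confusion, one cross-family error.
y_true = ["Roboto-400", "Inter-700", "Lora-400", "Sora-300"]
y_pred = ["Roboto-500", "Inter-700", "Lora-400", "Onest-300"]

top1 = accuracy(y_true, y_pred)                    # exact variant match
family = accuracy(y_true, y_pred, key=family_of)   # weight variants collapsed
```

In this toy sample the Roboto-400 vs Roboto-500 confusion counts as an error for top-1 accuracy but as a hit at the family level.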

Method	Trainable Params	Top-1 Acc
LoRA r=8 (this model)	900K	99.0%
ResNet-50	25.6M	98.8%
LoRA r=16	1.2M	98.9%
LoRA r=4	753K	97.9%
Full Fine-Tuning	87.2M	95.9%
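
The trainable-parameter counts for the LoRA rows can be reproduced with a little arithmetic. The sketch below assumes the standard DINOv2-base dimensions (hidden size 768, 12 transformer layers) and a classification head that takes the concatenated CLS token and mean patch token (2 x 768 inputs), as in the Hugging Face Dinov2ForImageClassification head. These architectural details are not stated in this card.

```python
HIDDEN = 768            # assumed DINOv2-base hidden size
LAYERS = 12             # assumed number of transformer blocks
NUM_CLASSES = 394       # font variants, from this card
ADAPTED_PER_LAYER = 2   # query and value projections

def lora_params(rank):
    # Each adapted projection gains A (rank x HIDDEN) and B (HIDDEN x rank).
    return LAYERS * ADAPTED_PER_LAYER * rank * (HIDDEN + HIDDEN)

def head_params():
    # Linear layer over [CLS ; mean(patch tokens)]: weights plus bias.
    return 2 * HIDDEN * NUM_CLASSES + NUM_CLASSES

def trainable_params(rank):
    return lora_params(rank) + head_params()

for rank in (4, 8, 16):
    print(rank, trainable_params(rank))
```

Under these assumptions the counts come to 753,034, 900,490, and 1,195,402, in line with the 753K, 900K, and 1.2M figures in the table. Most of the budget at rank 8 is the classification head (605,578 parameters), not the adapters.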

Training data

dchen0/font_crops_v5: ~225K synthetic images generated by rendering random text in each font variant, with ~575 training images and 40 test images per class. Augmentations include color variation, layout variation (left/center/right alignment, multi-line text), and Gaussian noise.

Font families (32)

BigShouldersText, BricolageGrotesque, CrimsonPro, DMSans, Geist, HedvigLettersSerif, InstrumentSans, InstrumentSerif, Inter, JetBrainsMono, LexendDeca, Lora, Merriweather, Montserrat, Newsreader, NunitoSans, Onest, OpenSans, Petrona, PlayfairDisplay, PlusJakartaSans, Poppins, PT Serif Caption, RethinkSans, Roboto, RobotoSerif, ShipporiMincho, Sora, SpaceGrotesk, Ultra, Urbanist, WorkSans

Training details

Hyperparameter	Value
Optimizer	AdamW
Learning rate	1e-4
Batch size	64
Epochs	100
LR scheduler	Linear decay
Precision	FP16
LoRA rank	8
LoRA alpha	16
LoRA dropout	0.1
LoRA targets	query, value
GPU	NVIDIA RTX 3090 (24 GB)
Training time	~33 hours
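
For reference, linear decay lowers the learning rate in a straight line from its initial value to zero over the run. The sketch below is a generic illustration that assumes no warmup and uses a hypothetical total step count; the card states neither.

```python
BASE_LR = 1e-4  # learning rate from the table above

def linear_decay_lr(step, total_steps, base_lr=BASE_LR):
    """Learning rate at `step`, falling linearly from base_lr to 0 at total_steps."""
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)

# Hypothetical step count; the real value depends on dataset size, batch size, and epochs.
TOTAL_STEPS = 1000
start = linear_decay_lr(0, TOTAL_STEPS)
halfway = linear_decay_lr(500, TOTAL_STEPS)
end = linear_decay_lr(1000, TOTAL_STEPS)
```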

Preprocessing

The preprocessing pipeline is implemented in handler.py, and inference inputs must go through the same steps:

1. Convert to RGB
2. Pad to square (black fill, centered)
3. Resize to 224x224
4. Normalize with ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
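
The padding and normalization steps reduce to simple arithmetic. The following is a dependency-free sketch that treats an image as a list of rows of (R, G, B) tuples. It is not the code in handler.py, the function names are hypothetical, and the resize step is left out because it needs an imaging library such as Pillow. It also assumes channels are scaled to [0, 1] before normalization, which is the usual convention for these ImageNet statistics.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def pad_to_square(pixels, fill=(0, 0, 0)):
    """Center the image on a black square canvas whose side is the longer edge."""
    height, width = len(pixels), len(pixels[0])
    side = max(height, width)
    top, left = (side - height) // 2, (side - width) // 2
    canvas = [[fill] * side for _ in range(side)]
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            canvas[top + y][left + x] = rgb
    return canvas

def normalize(rgb):
    """Scale 0-255 channels to [0, 1], then apply the ImageNet mean/std per channel."""
    return tuple((c / 255 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A 1x3 white strip becomes a 3x3 square with a black row above and below.
strip = [[(255, 255, 255)] * 3]
square = pad_to_square(strip)
```

Padding before resizing preserves the aspect ratio of the glyphs, which a direct resize to 224x224 would distort.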

Usage

from transformers import Dinov2ForImageClassification, AutoImageProcessor
from handler import get_inference_transform  # handler.py must be importable, e.g. placed next to this script
from PIL import Image
import torch

# Load the merged checkpoint and its image processor config
model = Dinov2ForImageClassification.from_pretrained("dchen0/font-classifier")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
model.eval()

# Build the preprocessing pipeline described above and apply it to one image
transform = get_inference_transform(processor, processor.size["shortest_edge"])
image = Image.open("font_sample.png").convert("RGB")
pixel_values = transform(image).unsqueeze(0)  # add a batch dimension

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

# Map the highest-scoring class index to its font label
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
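
To report confidence scores or runner-up candidates rather than a single label, the logits can be passed through a softmax and ranked. The sketch below uses plain Python lists so it runs without the model; the logits and the three-entry id2label mapping are made up for illustration. With the tensors from the snippet above, logits.softmax(-1).topk(k) should give the same ranking.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and labels, not real model output.
id2label = {0: "Inter-400", 1: "Roboto-400", 2: "Roboto-500"}
predictions = top_k([0.5, 3.0, 2.0], id2label, k=2)
```

Since most errors are within-family weight confusions, inspecting the top few candidates can show when the model is torn between neighboring weights of the same family.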

Source

Training code: github.com/Create-Inc/font-model
Results repo (checkpoints, logs): dchen0/font-model-results
Dataset: dchen0/font_crops_v5

Checkpoint format: Safetensors, 87.2M params, F32 tensors
