dchen0/font_classifier_v4


	---
	license: apache-2.0
	pipeline_tag: image-classification
	library_name: transformers
	datasets:
	- dchen0/font_crops_v4
	---

	# Font Classifier DINOv2 (Server-Side Preprocessing)

	A fine-tuned DINOv2 model for font classification with built-in preprocessing.

	🎯 Key Feature: No client-side preprocessing required!

	## Performance
	- Accuracy: ~86% on test set
	- Preprocessing: Automatic server-side pad-to-square + normalization

	## Usage

	### Simple API Usage (Recommended)

	Clients can send raw images directly to inference endpoints:

	```python
	import requests
	import base64

	# Load your image and base64-encode it for the JSON payload
	with open("test_image.png", "rb") as f:
	    image_data = base64.b64encode(f.read()).decode()

	# Send to inference endpoint
	response = requests.post(
	    "https://your-endpoint.com",
	    headers={"Authorization": "Bearer YOUR_TOKEN"},
	    json={"inputs": image_data},
	)
	response.raise_for_status()  # surface HTTP errors before parsing the body

	results = response.json()
	print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
	```
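The snippet above reads only the first entry of the response. A minimal sketch of a helper that ranks every returned prediction is shown here. It assumes the endpoint returns a list of objects with `label` and `score` keys, as the `results[0]['label']` access implies. The name `top_predictions` and the sample font names are hypothetical, introduced only for illustration.

```python
def top_predictions(results, k=3, min_score=0.0):
    """Return up to k (label, score) pairs, highest score first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    kept = [(r["label"], r["score"]) for r in ranked if r["score"] >= min_score]
    return kept[:k]


# Hypothetical payload shaped like the assumed endpoint response
sample = [
    {"label": "Roboto", "score": 0.12},
    {"label": "Lato", "score": 0.81},
    {"label": "Oswald", "score": 0.05},
]

for label, score in top_predictions(sample, k=2):
    print(f"{label}: {score:.2%}")
```

Sorting on the client side avoids relying on the order in which the endpoint happens to return its predictions.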

	### Standard HuggingFace Usage

	```python
	from transformers import pipeline

	# The model automatically handles preprocessing
	classifier = pipeline("image-classification", model="dchen0/font_classifier_v4")
	results = classifier("your_image.png")
	print(f"Predicted font: {results[0]['label']}")
	```

	### Direct Model Usage

	```python
	from PIL import Image
	import torch
	from transformers import AutoImageProcessor
	from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

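	# Load model and processor
	model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font_classifier_v4")
	processor = AutoImageProcessor.from_pretrained("dchen0/font_classifier_v4")
	model.eval()

	# Process image (model handles pad_to_square automatically)
	image = Image.open("test.png").convert("RGB")  # ensure three channels
	inputs = processor(images=image, return_tensors="pt")
	with torch.no_grad():  # inference only, no gradients needed
	    outputs = model(**inputs)

	# Map the highest-scoring logit to its font label
	predicted_id = outputs.logits.argmax(-1).item()
	print(f"Predicted font: {model.config.id2label[predicted_id]}")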
	```
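The forward pass above returns raw logits rather than probabilities. As a rough sketch, scores comparable to the `score` field in the API response can be derived with a softmax, which the `transformers` image-classification pipeline applies by default for multi-class models. The example assumes the logits for one image have been converted to a plain list (for example with `outputs.logits[0].tolist()`) and that `model.config.id2label` maps indices to names. The helper names and the three placeholder labels are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_label(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return id2label[idx], probs[idx]


# Hypothetical three-class example; the real model has 394 labels
label, prob = best_label([1.0, 3.0, 0.5], {0: "FontA", 1: "FontB", 2: "FontC"})
print(f"{label}: {prob:.2%}")
```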

	## Model Architecture

	- Base Model: facebook/dinov2-base-imagenet1k-1-layer
	- Fine-tuning: LoRA on Google Fonts dataset
	- Labels: 394 font families
	- Preprocessing: Built-in pad-to-square + ImageNet normalization

	## Server-Side Preprocessing

	This model automatically applies the following preprocessing in its forward pass:

	1. Pad to square preserving aspect ratio
	2. Resize to 224×224
	3. Normalize with ImageNet statistics

	No client-side preprocessing required - just send raw images!
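The geometry and arithmetic behind the three steps can be sketched in plain Python. This is an illustration only, not the code in `font_classifier_with_preprocessing.py`. Centering the image on the square canvas is an assumption, the function names are hypothetical, and the mean and standard deviation are the standard ImageNet values.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def square_padding(width, height):
    """Return (left, top, right, bottom) padding that centers the image on a square canvas."""
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


def resize_scale(width, height, target=224):
    """Scale factor that maps the padded square onto target x target pixels."""
    return target / max(width, height)


def normalize_pixel(rgb):
    """Scale 0-255 RGB values to [0, 1], then standardize with ImageNet statistics."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# A 300x100 text crop gets 100 px of padding above and below, then shrinks to 224x224
print(square_padding(300, 100), resize_scale(300, 100))
```

Padding before resizing keeps glyph proportions intact, since a wide crop is never stretched to fit the square input. Under this sketch, an image that is already square receives no padding.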

	## Deployment

	### HuggingFace Inference Endpoints

	1. Deploy this model to an Inference Endpoint
	2. Send raw images directly - preprocessing happens automatically
	3. Achieve ~86% accuracy out of the box

	### Custom Deployment

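	The model includes preprocessing in the forward pass, so any deployment that runs this model class (for example TorchServe) applies the correct preprocessing automatically. Serving stacks that require converting the model to another format, such as TensorFlow Serving, keep this behavior only if the conversion preserves the custom forward pass.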

	## Files

	- `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
	- Standard HuggingFace model files

	## Technical Details

	The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:

	```python
	def forward(self, pixel_values=None, labels=None, **kwargs):
	    # Automatic preprocessing happens here
	    processed_pixel_values = self.preprocess_images(pixel_values)
	    return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
	```

	This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.