---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
datasets:
- dchen0/font_crops_v4
---
# Font Classifier DINOv2 (Server-Side Preprocessing)
A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.
🎯 **Key Feature: No client-side preprocessing required!**
## Performance
- **Accuracy**: ~86% on test set
- **Preprocessing**: Automatic server-side pad-to-square + normalization
## Usage
### Simple API Usage (Recommended)
Clients can send **raw images directly** to inference endpoints:
```python
import requests
import base64
# Load your image and base64-encode it
with open("test_image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Send to inference endpoint
response = requests.post(
    "https://your-endpoint.com",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"inputs": image_data},
)
response.raise_for_status()  # fail early on HTTP errors
results = response.json()
print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
```
### Standard HuggingFace Usage
```python
from transformers import pipeline
# The model automatically handles preprocessing
classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
results = classifier("your_image.png")
print(f"Predicted font: {results[0]['label']}")
```
### Direct Model Usage
```python
from PIL import Image
import torch
from transformers import AutoImageProcessor
from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

# Load model and processor
model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")

# Process image (model handles pad_to_square automatically)
image = Image.open("test.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Map the highest-scoring logit to its font label
predicted_id = outputs.logits.argmax(dim=-1).item()
print(f"Predicted font: {model.config.id2label[predicted_id]}")
```
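The `outputs` object holds raw logits, one per font family, rather than probabilities. As a minimal sketch of ranking them, the helper below applies a softmax and returns the top `k` labels. It works on a plain list of numbers; the function name `top_k_predictions`, the example logits, and the three font names are hypothetical and chosen only for illustration.

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label names, for illustration only.
example_logits = [2.0, 0.5, 1.0]
example_labels = {0: "Roboto", 1: "Lato", 2: "Open Sans"}
for label, prob in top_k_predictions(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.2%}")
```

With a real model, something like `outputs.logits[0].tolist()` and `model.config.id2label` could be passed in place of the example values.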
## Model Architecture
- **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- **Fine-tuning**: LoRA on Google Fonts dataset
- **Labels**: 394 font families
- **Preprocessing**: Built-in pad-to-square + ImageNet normalization
## Server-Side Preprocessing
This model automatically applies the following preprocessing in its forward pass:
1. **Pad to square** preserving aspect ratio
2. **Resize** to 224×224
3. **Normalize** with ImageNet statistics
**No client-side preprocessing required** - just send raw images!
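
The three steps above can be sketched as follows. This is an illustrative NumPy stand-in, not the code in `font_classifier_with_preprocessing.py`: the white fill value, the centered placement, and the nearest-neighbor resize are assumptions, and the helper names `pad_to_square`, `resize_nearest`, and `preprocess` are hypothetical. The mean and standard deviation are the standard ImageNet values.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def pad_to_square(image, fill=255):
    """Center an H x W x 3 uint8 image on a square canvas.

    The white fill value is an assumption, not taken from the model code.
    """
    height, width, channels = image.shape
    side = max(height, width)
    canvas = np.full((side, side, channels), fill, dtype=image.dtype)
    top = (side - height) // 2
    left = (side - width) // 2
    canvas[top:top + height, left:left + width] = image
    return canvas

def resize_nearest(image, size=224):
    """Nearest-neighbor resize of a square image to size x size."""
    side = image.shape[0]
    idx = np.arange(size) * side // size  # source index for each output pixel
    return image[idx][:, idx]

def preprocess(image):
    """Pad to square, resize to 224x224, and normalize with ImageNet stats."""
    squared = pad_to_square(image)
    resized = resize_nearest(squared)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first (C x H x W), the layout used for pixel_values.
    return normalized.transpose(2, 0, 1)

# A wide 40 x 100 dummy crop, for illustration only.
dummy = np.zeros((40, 100, 3), dtype=np.uint8)
print(preprocess(dummy).shape)
```

Padding before resizing is what preserves the aspect ratio: a wide text crop is not stretched vertically to fill the 224×224 input.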
## Deployment
### HuggingFace Inference Endpoints
1. Deploy this model to an Inference Endpoint
2. Send raw images directly - preprocessing happens automatically
3. Achieve ~86% accuracy out of the box
### Custom Deployment
The model includes preprocessing in the forward pass, so any deployment that calls this forward pass (TorchServe, a custom PyTorch server, etc.) applies the correct preprocessing automatically.
## Files
- `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
- Standard HuggingFace model files
## Technical Details
The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:
```python
def forward(self, pixel_values=None, labels=None, **kwargs):
    # Automatic preprocessing happens here
    processed_pixel_values = self.preprocess_images(pixel_values)
    return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
```
This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.