---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
datasets:
- dchen0/font_crops_v4
---

# Font Classifier DINOv2 (Server-Side Preprocessing)

A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.

🎯 **Key Feature: No client-side preprocessing required!**

## Performance

- **Accuracy**: ~86% on test set
- **Preprocessing**: Automatic server-side pad-to-square + normalization

## Usage

### Simple API Usage (Recommended)

Clients can send **raw images directly** to inference endpoints as base64-encoded bytes, with no resizing or normalization beforehand:

```python
import requests
import base64

# Load your image and base64-encode the raw bytes
with open("test_image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Send to inference endpoint (replace the URL and token with your own)
response = requests.post(
    "https://your-endpoint.com",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"inputs": image_data}
)
response.raise_for_status()  # Fail early on HTTP errors

results = response.json()
print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
```

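The request and response handling above can be factored into two small helpers. This is a minimal sketch, not part of the model repository: `build_payload` and `top_predictions` are hypothetical names, and the response is assumed to be a list of `{"label", "score"}` dictionaries, as implied by the `results[0]['label']` access in the example. The helper sorts explicitly rather than assuming the endpoint returns results in score order.

```python
import base64


def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the JSON body used in the request above."""
    return {"inputs": base64.b64encode(image_bytes).decode("ascii")}


def top_predictions(results: list, k: int = 3) -> list:
    """Return the k highest-scoring (label, score) pairs from a response."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked[:k]]


# Hypothetical response body; the labels and scores are made up for illustration.
sample = [
    {"label": "Roboto", "score": 0.12},
    {"label": "Lato", "score": 0.81},
    {"label": "Open Sans", "score": 0.07},
]
for label, score in top_predictions(sample, k=2):
    print(f"{label}: {score:.2%}")
```

Showing the top few candidates rather than only the first can be useful when several visually similar fonts receive close scores.
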
### Standard HuggingFace Usage

```python
from transformers import pipeline

# The model automatically handles preprocessing
classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
results = classifier("your_image.png")
print(f"Predicted font: {results[0]['label']}")
```

### Direct Model Usage

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor
from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

# Load model and processor
model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")
model.eval()

# Process image (model handles pad_to_square automatically)
image = Image.open("test.png")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to its font label
predicted_id = outputs.logits.argmax(-1).item()
print(f"Predicted font: {model.config.id2label[predicted_id]}")
```

## Model Architecture

- **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- **Fine-tuning**: LoRA on Google Fonts dataset
- **Labels**: 394 font families
- **Preprocessing**: Built-in pad-to-square + ImageNet normalization

## Server-Side Preprocessing

This model automatically applies the following preprocessing in its forward pass:

1. **Pad to square** preserving aspect ratio
2. **Resize** to 224×224
3. **Normalize** with ImageNet statistics

**No client-side preprocessing required** - just send raw images!

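The arithmetic behind these three steps can be sketched without any imaging library. This is an illustration only, not the model's actual `preprocess_images` code. The helper names are hypothetical, and the centered padding is an assumption, since the model card does not document where the padding is placed or what fill value it uses. The mean and standard deviation are the standard ImageNet statistics.

```python
# Illustrative sketch only; the real logic lives in the model's forward pass.

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
TARGET_SIZE = 224


def square_padding(width, height):
    """Return (left, top, right, bottom) padding that makes the image square.

    Assumes centered padding; any odd leftover pixel goes to the right/bottom.
    """
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


def resize_scale(width, height, target=TARGET_SIZE):
    """Uniform scale factor that maps the padded square to target x target."""
    return target / max(width, height)


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to ImageNet-normalized floats."""
    return tuple(
        (c / 255.0 - m) / s
        for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

For a wide 300×100 text crop, `square_padding(300, 100)` adds 100 pixels above and below. Padding first means the subsequent resize uses a single scale factor for both axes, so glyph proportions are not distorted.
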
## Deployment

### HuggingFace Inference Endpoints

1. Deploy this model to an Inference Endpoint
2. Send raw images directly; preprocessing happens automatically
3. Expect ~86% accuracy out of the box

### Custom Deployment

Because preprocessing runs inside the forward pass, any serving stack that invokes the model (TorchServe, TensorFlow Serving, etc.) applies the correct preprocessing automatically.

## Files

- `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
- Standard HuggingFace model files

## Technical Details

The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to run preprocessing first:

```python
from transformers import Dinov2ForImageClassification


class FontClassifierWithPreprocessing(Dinov2ForImageClassification):
    # preprocess_images is defined elsewhere in this class (not shown)

    def forward(self, pixel_values=None, labels=None, **kwargs):
        # Automatic preprocessing happens here
        processed_pixel_values = self.preprocess_images(pixel_values)
        return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
```

This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.