---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
datasets:
- dchen0/font_crops_v4
---

# Font Classifier DINOv2 (Server-Side Preprocessing)

A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.

🎯 **Key Feature: No client-side preprocessing required!**

## Performance
- **Accuracy**: ~86% on the test set
- **Preprocessing**: Automatic server-side pad-to-square + normalization

## Usage

### Simple API Usage (Recommended)

Clients can send **raw images directly** to inference endpoints:

```python
import requests
import base64

# Load your image
with open("test_image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Send to inference endpoint
response = requests.post(
    "https://your-endpoint.com",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"inputs": image_data}
)

results = response.json()
print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
```

### Standard HuggingFace Usage

```python
from transformers import pipeline

# The model automatically handles preprocessing
classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
results = classifier("your_image.png")
print(f"Predicted font: {results[0]['label']}")
```

### Direct Model Usage

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor
from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

# Load model and processor
model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")

# Process image (the model applies pad-to-square automatically in its forward pass)
image = Image.open("test.png")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to a font label
predicted_id = outputs.logits.argmax(-1).item()
print(f"Predicted font: {model.config.id2label[predicted_id]}")
```

## Model Architecture

- **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- **Fine-tuning**: LoRA on Google Fonts dataset  
- **Labels**: 394 font families
- **Preprocessing**: Built-in pad-to-square + ImageNet normalization

## Server-Side Preprocessing

This model automatically applies the following preprocessing in its forward pass:

1. **Pad to square** preserving aspect ratio
2. **Resize** to 224×224
3. **Normalize** with ImageNet statistics

**No client-side preprocessing required** - just send raw images!
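The three steps above are implemented inside the model itself, but for reference, a minimal sketch of equivalent preprocessing (assuming a float tensor in [0, 1] and white padding, which are assumptions, not the published implementation) could look like:

```python
import torch
import torch.nn.functional as F

# ImageNet normalization statistics
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def pad_resize_normalize(image: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Pad a (3, H, W) float image in [0, 1] to a square, resize, normalize."""
    _, h, w = image.shape
    side = max(h, w)
    pad_w, pad_h = side - w, side - h
    # F.pad takes (left, right, top, bottom) for the last two dims;
    # split padding evenly so the glyphs stay centered
    image = F.pad(image, (pad_w // 2, pad_w - pad_w // 2,
                          pad_h // 2, pad_h - pad_h // 2), value=1.0)
    # Resize the square image to the model's expected input size
    image = F.interpolate(image.unsqueeze(0), size=(size, size),
                          mode="bilinear", align_corners=False).squeeze(0)
    return (image - IMAGENET_MEAN) / IMAGENET_STD
```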

## Deployment

### HuggingFace Inference Endpoints

1. Deploy this model to an Inference Endpoint
2. Send raw images directly - preprocessing happens automatically
3. Achieve ~86% accuracy out of the box

### Custom Deployment

Because preprocessing lives in the model's forward pass, any serving stack that invokes it (TorchServe, a custom server, etc.) applies the correct preprocessing automatically.

## Files

- `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
- Standard HuggingFace model files

## Technical Details

The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:

```python
def forward(self, pixel_values=None, labels=None, **kwargs):
    # Automatic preprocessing happens here
    processed_pixel_values = self.preprocess_images(pixel_values)
    return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
```

This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.
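The card does not show how raw images and already-normalized tensors are told apart; one plausible heuristic (an assumption for illustration, not the actual `preprocess_images` implementation) is to check for negative values, since ImageNet-normalized tensors contain them while raw pixel data does not:

```python
import torch

def looks_preprocessed(pixel_values: torch.Tensor) -> bool:
    """Heuristic guess: ImageNet-normalized tensors contain negative
    values, while raw pixel data in [0, 1] or [0, 255] does not."""
    return bool((pixel_values < 0).any())
```

Under a scheme like this, the preprocessing step would become a no-op whenever the check indicates the input is already normalized.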