dchen0 committed
Commit ecb5b6d · verified · 1 Parent(s): 518728c

Add model with built-in server-side preprocessing
README.md CHANGED
@@ -4,112 +4,113 @@ license: apache-2.0
  pipeline_tag: image-classification
  ---

- # Font Classifier DINOv2

- A fine-tuned DINOv2 model for font classification, trained on Google Fonts.

- ⚠️ **Critical: This model requires custom preprocessing for optimal accuracy.**

  ## Performance
- - **With correct preprocessing**: ~86% accuracy
- - **Without preprocessing**: ~30% accuracy

- ## Required Preprocessing

- Images must be **padded to square** (preserving aspect ratio) before being resized to 224×224.

- ### Option 1: Use our client wrapper (Recommended)

  ```python
- from font_classifier_client import FontClassifierClient

- # For local usage
- client = FontClassifierClient.from_local_model("dchen0/font-classifier-v4")
- results = client.predict("your_image.png")

- # For Inference Endpoints (automatically handles preprocessing)
- client = FontClassifierClient.from_inference_endpoint("https://your-endpoint-url")
- results = client.predict("your_image.png")
- print(f"Predicted font: {results[0][0]} ({results[0][1]:.2%} confidence)")
  ```

- ### Option 2: Manual preprocessing

  ```python
- import torch
- import torchvision.transforms as T
- from PIL import Image
  from transformers import pipeline

- def pad_to_square(image):
-     w, h = image.size
-     max_size = max(w, h)
-     pad_w = (max_size - w) // 2
-     pad_h = (max_size - h) // 2
-     padding = (pad_w, pad_h, max_size - w - pad_w, max_size - h - pad_h)
-     return T.Pad(padding, fill=0)(image)

- # Preprocess the image
- image = Image.open("your_image.png").convert("RGB")
- image = pad_to_square(image)

- # Use with the pipeline
- classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
- results = classifier(image)
  ```

- ## Model Details

  - **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- - **Training**: LoRA fine-tuning on the Google Fonts dataset
  - **Labels**: 394 font families
- - **Architecture**: Vision Transformer (ViT-B/14)

- ## Training Details

- The model was trained on images that were:
- 1. **Padded to square**, preserving aspect ratio
- 2. Resized to 224×224
- 3. Normalized with ImageNet statistics
- 4. Augmented with various transformations

- ## Usage with Inference Endpoints

- When using HuggingFace Inference Endpoints:

- 1. **Deploy the model** to an Inference Endpoint
- 2. **Use the client wrapper**, which automatically handles preprocessing:

- ```python
- import requests
- from font_classifier_client import FontClassifierClient

- # The client handles all preprocessing automatically
- client = FontClassifierClient.from_inference_endpoint(
-     api_url="https://your-endpoint.com",
-     api_token="your-token",  # if required
- )

- results = client.predict("test_image.png")
- print(f"Top prediction: {results[0][0]} ({results[0][1]:.2%})")
- ```

- The client wrapper ensures that images are properly padded to square before being sent to the endpoint.

  ## Files

- - `font_classifier_client.py`: Client wrapper with preprocessing
  - Standard HuggingFace model files

- ## Citation

- If you use this model, please cite:

  ```
- @misc{font-classifier-dinov2,
-   title={Font Classifier DINOv2},
-   author={Your Name},
-   year={2024},
-   url={https://huggingface.co/dchen0/font-classifier-v4}
- }
- ```
 
  pipeline_tag: image-classification
  ---

+ # Font Classifier DINOv2 (Server-Side Preprocessing)

+ A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.

+ 🎯 **Key Feature: No client-side preprocessing required!**

  ## Performance
+ - **Accuracy**: ~86% on the test set
+ - **Preprocessing**: Automatic server-side pad-to-square + normalization

+ ## Usage

+ ### Simple API Usage (Recommended)

+ Clients can send **raw images directly** to inference endpoints:

  ```python
+ import requests
+ import base64
+
+ # Load your image
+ with open("test_image.png", "rb") as f:
+     image_data = base64.b64encode(f.read()).decode()

+ # Send it to the inference endpoint
+ response = requests.post(
+     "https://your-endpoint.com",
+     headers={"Authorization": "Bearer YOUR_TOKEN"},
+     json={"inputs": image_data},
+ )

+ results = response.json()
+ print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
  ```

+ ### Standard HuggingFace Usage

  ```python
  from transformers import pipeline

+ # The model automatically handles preprocessing
+ classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
+ results = classifier("your_image.png")
+ print(f"Predicted font: {results[0]['label']}")
+ ```

+ ### Direct Model Usage

+ ```python
+ from PIL import Image
+ import torch
+ from transformers import AutoImageProcessor
+ from font_classifier_with_preprocessing import FontClassifierWithPreprocessing
+
+ # Load the model and processor
+ model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
+ processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")
+
+ # Process an image (the model applies pad_to_square automatically)
+ image = Image.open("test.png")
+ inputs = processor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
  ```

+ ## Model Architecture

  - **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
+ - **Fine-tuning**: LoRA on the Google Fonts dataset
  - **Labels**: 394 font families
+ - **Preprocessing**: Built-in pad-to-square + ImageNet normalization

+ ## Server-Side Preprocessing

+ This model automatically applies the following preprocessing in its forward pass:

+ 1. **Pad to square**, preserving aspect ratio
+ 2. **Resize** to 224×224
+ 3. **Normalize** with ImageNet statistics

+ **No client-side preprocessing required** - just send raw images!

+ ## Deployment

+ ### HuggingFace Inference Endpoints

+ 1. Deploy this model to an Inference Endpoint
+ 2. Send raw images directly - preprocessing happens automatically
+ 3. Achieve ~86% accuracy out of the box

+ ### Custom Deployment

+ The model includes preprocessing in its forward pass, so any deployment (TorchServe, TensorFlow Serving, etc.) will automatically apply the correct preprocessing.

  ## Files

+ - `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
  - Standard HuggingFace model files

+ ## Technical Details

+ The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:

+ ```python
+ def forward(self, pixel_values=None, labels=None, **kwargs):
+     # Automatic preprocessing happens here
+     processed_pixel_values = self.preprocess_images(pixel_values)
+     return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
+ ```

+ This ensures that whether clients send raw images or preprocessed tensors, the model receives correctly formatted input.
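
The pad-resize-normalize pipeline described above can be sketched as a standalone function. This is an illustrative re-implementation with `torch.nn.functional`, not the repo's own code:

```python
import torch
import torch.nn.functional as F

def preprocess(images: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Pad to square (centered, zero fill), resize, then ImageNet-normalize."""
    _, _, h, w = images.shape
    m = max(h, w)
    pad_h, pad_w = m - h, m - w
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    images = F.pad(
        images,
        (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
        value=0.0,
    )
    images = F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    return (images - mean) / std

# A wide 50x200 "text strip" comes out as a square 224x224 tensor
out = preprocess(torch.rand(1, 3, 50, 200))
print(out.shape)  # torch.Size([1, 3, 224, 224])
```

Centered zero padding is what keeps the glyphs' aspect ratio intact; stretching a wide text strip directly to 224×224 is exactly the mismatch that costs accuracy.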
config.json CHANGED
@@ -1,7 +1,7 @@
  {
    "apply_layernorm": true,
    "architectures": [
-     "Dinov2ForImageClassification"
+     "FontClassifierWithPreprocessing"
    ],
    "attention_probs_dropout_prob": 0.0,
    "drop_path_rate": 0.0,
@@ -837,5 +837,8 @@
    "torch_dtype": "float32",
    "transformers_version": "4.52.4",
    "use_mask_token": true,
-   "use_swiglu_ffn": false
- }
+   "use_swiglu_ffn": false,
+   "auto_map": {
+     "AutoModelForImageClassification": "font_classifier_with_preprocessing.FontClassifierWithPreprocessing"
+   }
+ }
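
The `auto_map` entry added above is what lets `AutoModelForImageClassification.from_pretrained(..., trust_remote_code=True)` find the custom class: its value is a `"module_file.ClassName"` reference. A rough sketch of that lookup (the real resolution lives in `transformers`' dynamic module machinery; this is not the library's code):

```python
# Config fragment mirroring the auto_map added in this commit
config = {
    "auto_map": {
        "AutoModelForImageClassification":
            "font_classifier_with_preprocessing.FontClassifierWithPreprocessing"
    }
}

def resolve_auto_map(config: dict, auto_class: str) -> tuple:
    """Split an auto_map value into (module file, class name)."""
    ref = config["auto_map"][auto_class]
    module_name, _, class_name = ref.rpartition(".")
    return module_name, class_name

module, cls = resolve_auto_map(config, "AutoModelForImageClassification")
print(module, cls)  # font_classifier_with_preprocessing FontClassifierWithPreprocessing
```

The module half names the `.py` file shipped with the repo, which is why `font_classifier_with_preprocessing.py` must be uploaded alongside the weights.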
font_classifier_with_preprocessing.py ADDED
@@ -0,0 +1,147 @@
+ """
+ Custom DINOv2 model that includes pad_to_square preprocessing in the forward pass.
+ This allows inference endpoints to automatically apply the correct preprocessing.
+ """
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from transformers import Dinov2ForImageClassification
+
+
+ class FontClassifierWithPreprocessing(Dinov2ForImageClassification):
+     """
+     DINOv2 model that automatically applies pad_to_square preprocessing.
+
+     This model can be deployed to Inference Endpoints and will automatically
+     handle preprocessing in the forward pass, so clients can send raw images.
+     """
+
+     def __init__(self, config):
+         super().__init__(config)
+
+         # Store preprocessing parameters
+         self.register_buffer('image_mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
+         self.register_buffer('image_std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))
+         self.target_size = 224
+
+     def pad_to_square_tensor(self, images):
+         """
+         Pad a batch of images to square, preserving aspect ratio.
+
+         Args:
+             images: Tensor of shape (B, C, H, W)
+         Returns:
+             Tensor of shape (B, C, max_size, max_size)
+         """
+         B, C, H, W = images.shape
+         max_size = max(H, W)
+
+         if H == W == max_size:
+             return images  # Already square
+
+         # Calculate padding
+         pad_h = max_size - H
+         pad_w = max_size - W
+         pad_top = pad_h // 2
+         pad_bottom = pad_h - pad_top
+         pad_left = pad_w // 2
+         pad_right = pad_w - pad_left
+
+         # Apply padding (left, right, top, bottom)
+         padded = F.pad(images, (pad_left, pad_right, pad_top, pad_bottom), value=0)
+
+         return padded
+
+     def preprocess_images(self, pixel_values):
+         """
+         Apply the full preprocessing pipeline to raw or partially processed images.
+
+         Args:
+             pixel_values: Tensor of shape (B, C, H, W)
+         Returns:
+             Preprocessed tensor ready for DINOv2
+         """
+         # Ensure we have a batch dimension
+         if pixel_values.dim() == 3:
+             pixel_values = pixel_values.unsqueeze(0)
+
+         # Convert to float if needed
+         if pixel_values.dtype != torch.float32:
+             pixel_values = pixel_values.float()
+
+         # Normalize to [0, 1] if values are in [0, 255]
+         if pixel_values.max() > 1.0:
+             pixel_values = pixel_values / 255.0
+
+         # Apply pad_to_square
+         pixel_values = self.pad_to_square_tensor(pixel_values)
+
+         # Resize to the target size
+         if pixel_values.shape[-1] != self.target_size or pixel_values.shape[-2] != self.target_size:
+             pixel_values = F.interpolate(
+                 pixel_values,
+                 size=(self.target_size, self.target_size),
+                 mode='bilinear',
+                 align_corners=False,
+             )
+
+         # Apply ImageNet normalization
+         pixel_values = (pixel_values - self.image_mean) / self.image_std
+
+         return pixel_values
+
+     def forward(self, pixel_values=None, labels=None, **kwargs):
+         """
+         Forward pass with automatic preprocessing.
+
+         Args:
+             pixel_values: Raw or preprocessed images
+             labels: Optional labels for training
+         """
+         if pixel_values is None:
+             raise ValueError("pixel_values must be provided")
+
+         # Apply preprocessing automatically
+         processed_pixel_values = self.preprocess_images(pixel_values)
+
+         # Call the parent forward with preprocessed images
+         return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
+
+
+ # Function to convert an existing model
+ def convert_to_preprocessing_model(original_model_path, output_path):
+     """
+     Convert an existing DINOv2 model to include preprocessing.
+
+     Args:
+         original_model_path: Path to the original model
+         output_path: Path to save the converted model
+     """
+     print(f"Converting {original_model_path} to include preprocessing...")
+
+     # Load the original model
+     original_model = Dinov2ForImageClassification.from_pretrained(original_model_path)
+
+     # Create a new model with the same config
+     preprocessing_model = FontClassifierWithPreprocessing(original_model.config)
+
+     # Copy all weights
+     preprocessing_model.load_state_dict(original_model.state_dict())
+
+     # Save the new model
+     preprocessing_model.save_pretrained(output_path, safe_serialization=True)
+
+     # Copy the processor config (unchanged)
+     from transformers import AutoImageProcessor
+     processor = AutoImageProcessor.from_pretrained(original_model_path)
+     processor.save_pretrained(output_path)
+
+     print(f"✅ Converted model saved to {output_path}")
+
+     return preprocessing_model
+
+
+ if __name__ == "__main__":
+     # Example: convert an existing model
+     convert_to_preprocessing_model(
+         "dchen0/font-classifier-v4",
+         "./font-classifier-with-preprocessing",
+     )
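
The `convert_to_preprocessing_model` helper works because a subclass with an identical parameter set accepts the parent's `state_dict` unchanged. A minimal sketch of that pattern with throwaway modules (all names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

class Base(nn.Module):
    """Stand-in for Dinov2ForImageClassification."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

class WithPreprocessing(Base):
    """Same parameters as Base; only the forward pass gains a preprocessing step."""
    def forward(self, x):
        # Toy "preprocessing": rescale only if values fall outside [-1, 1]
        x = x / x.abs().max().clamp(min=1.0)
        return super().forward(x)

original = Base()
converted = WithPreprocessing()
# Identical architecture, so the state_dict transfers without key mismatches
converted.load_state_dict(original.state_dict())

x = torch.randn(1, 4).clamp(-1, 1)  # input the toy preprocessing leaves unchanged
assert torch.allclose(original(x), converted(x))
```

The subclass adds no new parameters (the real model's `image_mean`/`image_std` are buffers, not trained weights), which is what makes the straight `load_state_dict` copy safe.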
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0cd7bb6aa8492746ab58c79cc0a667d0654be31e1d50270140d1027e3523e0cb
- size 348769976
+ oid sha256:0eeabb74d1af47629e61d6d4dd48bbf3eb74121db29c8ba8b644b41b8c481a6d
+ size 348770168
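
`model.safetensors` is tracked with Git LFS, so the diff above only touches the pointer file. For illustration, such a pointer is plain `key value` text and parses trivially (values copied from the new pointer above):

```python
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0eeabb74d1af47629e61d6d4dd48bbf3eb74121db29c8ba8b644b41b8c481a6d
size 348770168
"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

info = parse_lfs_pointer(pointer)
print(info["size"])  # 348770168
```

The small size increase over the previous checkpoint is consistent with the two normalization buffers (`image_mean`, `image_std`) the new model class registers.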