dchen0 committed
Commit ecb5b6d · verified · 1 Parent(s): 518728c

Add model with built-in server-side preprocessing
README.md CHANGED
@@ -4,112 +4,113 @@ license: apache-2.0
  pipeline_tag: image-classification
  ---

- # Font Classifier DINOv2

- A fine-tuned DINOv2 model for font classification, trained on Google Fonts.

- ⚠️ **Critical: This model requires custom preprocessing for optimal accuracy.**

  ## Performance
- - **With correct preprocessing**: ~86% accuracy
- - **Without preprocessing**: ~30% accuracy

- ## Required Preprocessing

- Images must be **padded to square** (preserving aspect ratio) before being resized to 224×224.

- ### Option 1: Use our client wrapper (Recommended)

  ```python
- from font_classifier_client import FontClassifierClient

- # For local usage
- client = FontClassifierClient.from_local_model("dchen0/font-classifier-v4")
- results = client.predict("your_image.png")

- # For Inference Endpoints (automatically handles preprocessing)
- client = FontClassifierClient.from_inference_endpoint("https://your-endpoint-url")
- results = client.predict("your_image.png")
- print(f"Predicted font: {results[0][0]} ({results[0][1]:.2%} confidence)")
  ```

- ### Option 2: Manual preprocessing

  ```python
- import torch
- import torchvision.transforms as T
- from PIL import Image
  from transformers import pipeline

- def pad_to_square(image):
-     w, h = image.size
-     max_size = max(w, h)
-     pad_w = (max_size - w) // 2
-     pad_h = (max_size - h) // 2
-     padding = (pad_w, pad_h, max_size - w - pad_w, max_size - h - pad_h)
-     return T.Pad(padding, fill=0)(image)

- # Preprocess the image
- image = Image.open("your_image.png").convert("RGB")
- image = pad_to_square(image)

- # Use with the pipeline
- classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
- results = classifier(image)
  ```

- ## Model Details

  - **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- - **Training**: LoRA fine-tuning on the Google Fonts dataset
  - **Labels**: 394 font families
- - **Architecture**: Vision Transformer (ViT-B/14)

- ## Training Details

- The model was trained on images that were:
- 1. **Padded to square**, preserving aspect ratio
- 2. Resized to 224×224
- 3. Normalized with ImageNet statistics
- 4. Augmented with various transformations

- ## Usage with Inference Endpoints

- When using HuggingFace Inference Endpoints:

- 1. **Deploy the model** to an Inference Endpoint
- 2. **Use the client wrapper**, which automatically handles preprocessing:

- ```python
- import requests
- from font_classifier_client import FontClassifierClient

- # The client handles all preprocessing automatically
- client = FontClassifierClient.from_inference_endpoint(
-     api_url="https://your-endpoint.com",
-     api_token="your-token",  # if required
- )

- results = client.predict("test_image.png")
- print(f"Top prediction: {results[0][0]} ({results[0][1]:.2%})")
- ```

- The client wrapper ensures that images are properly padded to square before being sent to the endpoint.

  ## Files

- - `font_classifier_client.py`: Client wrapper with preprocessing
  - Standard HuggingFace model files

- ## Citation

- If you use this model, please cite:

  ```
- @misc{font-classifier-dinov2,
-   title={Font Classifier DINOv2},
-   author={Your Name},
-   year={2024},
-   url={https://huggingface.co/dchen0/font-classifier-v4}
- }
- ```
 
  pipeline_tag: image-classification
  ---

+ # Font Classifier DINOv2 (Server-Side Preprocessing)

+ A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.

+ 🎯 **Key Feature: No client-side preprocessing required!**

  ## Performance
+ - **Accuracy**: ~86% on the test set
+ - **Preprocessing**: Automatic server-side pad-to-square + normalization

+ ## Usage

+ ### Simple API Usage (Recommended)

+ Clients can send **raw images directly** to inference endpoints:

  ```python
+ import requests
+ import base64
+
+ # Load your image
+ with open("test_image.png", "rb") as f:
+     image_data = base64.b64encode(f.read()).decode()

+ # Send it to the inference endpoint
+ response = requests.post(
+     "https://your-endpoint.com",
+     headers={"Authorization": "Bearer YOUR_TOKEN"},
+     json={"inputs": image_data},
+ )

+ results = response.json()
+ print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
  ```

+ ### Standard HuggingFace Usage

  ```python
  from transformers import pipeline

+ # The model automatically handles preprocessing
+ classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
+ results = classifier("your_image.png")
+ print(f"Predicted font: {results[0]['label']}")
+ ```

+ ### Direct Model Usage

+ ```python
+ from PIL import Image
+ import torch
+ from transformers import AutoImageProcessor
+ from font_classifier_with_preprocessing import FontClassifierWithPreprocessing
+
+ # Load the model and processor
+ model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
+ processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")
+
+ # Process an image (the model applies pad_to_square automatically)
+ image = Image.open("test.png")
+ inputs = processor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
  ```

+ ## Model Architecture

  - **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
+ - **Fine-tuning**: LoRA on the Google Fonts dataset
  - **Labels**: 394 font families
+ - **Preprocessing**: Built-in pad-to-square + ImageNet normalization

+ ## Server-Side Preprocessing

+ This model automatically applies the following preprocessing in its forward pass:

+ 1. **Pad to square**, preserving aspect ratio
+ 2. **Resize** to 224×224
+ 3. **Normalize** with ImageNet statistics

+ **No client-side preprocessing required** - just send raw images!

+ ## Deployment

+ ### HuggingFace Inference Endpoints

+ 1. Deploy this model to an Inference Endpoint
+ 2. Send raw images directly - preprocessing happens automatically
+ 3. Achieve ~86% accuracy out of the box

+ ### Custom Deployment

+ The model includes preprocessing in its forward pass, so any deployment (TorchServe, TensorFlow Serving, etc.) will automatically apply the correct preprocessing.

  ## Files

+ - `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
  - Standard HuggingFace model files

+ ## Technical Details

+ The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:

+ ```python
+ def forward(self, pixel_values=None, labels=None, **kwargs):
+     # Automatic preprocessing happens here
+     processed_pixel_values = self.preprocess_images(pixel_values)
+     return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
+ ```

+ This ensures that whether clients send raw images or preprocessed tensors, the model receives correctly formatted input.
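
The pad-resize-normalize pipeline described above can be sketched as a standalone function. This is an illustrative re-implementation with `torch.nn.functional`, not the repo's own code:

```python
import torch
import torch.nn.functional as F

def preprocess(images: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Pad to square (centered, zero fill), resize, then ImageNet-normalize."""
    _, _, h, w = images.shape
    m = max(h, w)
    pad_h, pad_w = m - h, m - w
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    images = F.pad(
        images,
        (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
        value=0.0,
    )
    images = F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    return (images - mean) / std

# A wide 50x200 "text strip" comes out as a square 224x224 tensor
out = preprocess(torch.rand(1, 3, 50, 200))
print(out.shape)  # torch.Size([1, 3, 224, 224])
```

Centered zero padding is what keeps the glyphs' aspect ratio intact; stretching a wide text strip directly to 224×224 is exactly the mismatch that costs accuracy.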
config.json CHANGED
@@ -1,7 +1,7 @@
  {
    "apply_layernorm": true,
    "architectures": [
-     "Dinov2ForImageClassification"
+     "FontClassifierWithPreprocessing"
    ],
    "attention_probs_dropout_prob": 0.0,
    "drop_path_rate": 0.0,
@@ -837,5 +837,8 @@
    "torch_dtype": "float32",
    "transformers_version": "4.52.4",
    "use_mask_token": true,
-   "use_swiglu_ffn": false
- }
+   "use_swiglu_ffn": false,
+   "auto_map": {
+     "AutoModelForImageClassification": "font_classifier_with_preprocessing.FontClassifierWithPreprocessing"
+   }
+ }
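
The `auto_map` entry added above is what lets `AutoModelForImageClassification.from_pretrained(..., trust_remote_code=True)` find the custom class: its value is a `"module_file.ClassName"` reference. A rough sketch of that lookup (the real resolution lives in `transformers`' dynamic module machinery; this is not the library's code):

```python
# Config fragment mirroring the auto_map added in this commit
config = {
    "auto_map": {
        "AutoModelForImageClassification":
            "font_classifier_with_preprocessing.FontClassifierWithPreprocessing"
    }
}

def resolve_auto_map(config: dict, auto_class: str) -> tuple:
    """Split an auto_map value into (module file, class name)."""
    ref = config["auto_map"][auto_class]
    module_name, _, class_name = ref.rpartition(".")
    return module_name, class_name

module, cls = resolve_auto_map(config, "AutoModelForImageClassification")
print(module, cls)  # font_classifier_with_preprocessing FontClassifierWithPreprocessing
```

The module half names the `.py` file shipped with the repo, which is why `font_classifier_with_preprocessing.py` must be uploaded alongside the weights.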
font_classifier_with_preprocessing.py ADDED
@@ -0,0 +1,147 @@
+ """
+ Custom DINOv2 model that includes pad_to_square preprocessing in the forward pass.
+ This allows inference endpoints to automatically apply the correct preprocessing.
+ """
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from transformers import Dinov2ForImageClassification
+
+
+ class FontClassifierWithPreprocessing(Dinov2ForImageClassification):
+     """
+     DINOv2 model that automatically applies pad_to_square preprocessing.
+
+     This model can be deployed to Inference Endpoints and will automatically
+     handle preprocessing in the forward pass, so clients can send raw images.
+     """
+
+     def __init__(self, config):
+         super().__init__(config)
+
+         # Store preprocessing parameters
+         self.register_buffer('image_mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
+         self.register_buffer('image_std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))
+         self.target_size = 224
+
+     def pad_to_square_tensor(self, images):
+         """
+         Pad a batch of images to square, preserving aspect ratio.
+
+         Args:
+             images: Tensor of shape (B, C, H, W)
+         Returns:
+             Tensor of shape (B, C, max_size, max_size)
+         """
+         B, C, H, W = images.shape
+         max_size = max(H, W)
+
+         if H == W == max_size:
+             return images  # Already square
+
+         # Calculate padding
+         pad_h = max_size - H
+         pad_w = max_size - W
+         pad_top = pad_h // 2
+         pad_bottom = pad_h - pad_top
+         pad_left = pad_w // 2
+         pad_right = pad_w - pad_left
+
+         # Apply padding (left, right, top, bottom)
+         padded = F.pad(images, (pad_left, pad_right, pad_top, pad_bottom), value=0)
+
+         return padded
+
+     def preprocess_images(self, pixel_values):
+         """
+         Apply the full preprocessing pipeline to raw or partially processed images.
+
+         Args:
+             pixel_values: Tensor of shape (B, C, H, W)
+         Returns:
+             Preprocessed tensor ready for DINOv2
+         """
+         # Ensure we have a batch dimension
+         if pixel_values.dim() == 3:
+             pixel_values = pixel_values.unsqueeze(0)
+
+         # Convert to float if needed
+         if pixel_values.dtype != torch.float32:
+             pixel_values = pixel_values.float()
+
+         # Normalize to [0, 1] if values are in [0, 255]
+         if pixel_values.max() > 1.0:
+             pixel_values = pixel_values / 255.0
+
+         # Apply pad_to_square
+         pixel_values = self.pad_to_square_tensor(pixel_values)
+
+         # Resize to the target size
+         if pixel_values.shape[-1] != self.target_size or pixel_values.shape[-2] != self.target_size:
+             pixel_values = F.interpolate(
+                 pixel_values,
+                 size=(self.target_size, self.target_size),
+                 mode='bilinear',
+                 align_corners=False,
+             )
+
+         # Apply ImageNet normalization
+         pixel_values = (pixel_values - self.image_mean) / self.image_std
+
+         return pixel_values
+
+     def forward(self, pixel_values=None, labels=None, **kwargs):
+         """
+         Forward pass with automatic preprocessing.
+
+         Args:
+             pixel_values: Raw or preprocessed images
+             labels: Optional labels for training
+         """
+         if pixel_values is None:
+             raise ValueError("pixel_values must be provided")
+
+         # Apply preprocessing automatically
+         processed_pixel_values = self.preprocess_images(pixel_values)
+
+         # Call the parent forward with preprocessed images
+         return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
+
+
+ # Function to convert an existing model
+ def convert_to_preprocessing_model(original_model_path, output_path):
+     """
+     Convert an existing DINOv2 model to include preprocessing.
+
+     Args:
+         original_model_path: Path to the original model
+         output_path: Path to save the converted model
+     """
+     print(f"Converting {original_model_path} to include preprocessing...")
+
+     # Load the original model
+     original_model = Dinov2ForImageClassification.from_pretrained(original_model_path)
+
+     # Create a new model with the same config
+     preprocessing_model = FontClassifierWithPreprocessing(original_model.config)
+
+     # Copy all weights
+     preprocessing_model.load_state_dict(original_model.state_dict())
+
+     # Save the new model
+     preprocessing_model.save_pretrained(output_path, safe_serialization=True)
+
+     # Copy the processor config (unchanged)
+     from transformers import AutoImageProcessor
+     processor = AutoImageProcessor.from_pretrained(original_model_path)
+     processor.save_pretrained(output_path)
+
+     print(f"✅ Converted model saved to {output_path}")
+
+     return preprocessing_model
+
+
+ if __name__ == "__main__":
+     # Example: convert an existing model
+     convert_to_preprocessing_model(
+         "dchen0/font-classifier-v4",
+         "./font-classifier-with-preprocessing",
+     )
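
The `convert_to_preprocessing_model` helper works because a subclass with an identical parameter set accepts the parent's `state_dict` unchanged. A minimal sketch of that pattern with throwaway modules (all names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

class Base(nn.Module):
    """Stand-in for Dinov2ForImageClassification."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

class WithPreprocessing(Base):
    """Same parameters as Base; only the forward pass gains a preprocessing step."""
    def forward(self, x):
        # Toy "preprocessing": rescale only if values fall outside [-1, 1]
        x = x / x.abs().max().clamp(min=1.0)
        return super().forward(x)

original = Base()
converted = WithPreprocessing()
# Identical architecture, so the state_dict transfers without key mismatches
converted.load_state_dict(original.state_dict())

x = torch.randn(1, 4).clamp(-1, 1)  # input the toy preprocessing leaves unchanged
assert torch.allclose(original(x), converted(x))
```

The subclass adds no new parameters (the real model's `image_mean`/`image_std` are buffers, not trained weights), which is what makes the straight `load_state_dict` copy safe.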
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0cd7bb6aa8492746ab58c79cc0a667d0654be31e1d50270140d1027e3523e0cb
- size 348769976
+ oid sha256:0eeabb74d1af47629e61d6d4dd48bbf3eb74121db29c8ba8b644b41b8c481a6d
+ size 348770168
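
`model.safetensors` is tracked with Git LFS, so the diff above only touches the pointer file. For illustration, such a pointer is plain `key value` text and parses trivially (values copied from the new pointer above):

```python
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0eeabb74d1af47629e61d6d4dd48bbf3eb74121db29c8ba8b644b41b8c481a6d
size 348770168
"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

info = parse_lfs_pointer(pointer)
print(info["size"])  # 348770168
```

The small size increase over the previous checkpoint is consistent with the two normalization buffers (`image_mean`, `image_std`) the new model class registers.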