---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
datasets:
- dchen0/font_crops_v4
---

# Font Classifier DINOv2 (Server-Side Preprocessing)

A fine-tuned DINOv2 model for font classification with **built-in preprocessing**.

🎯 **Key Feature: No client-side preprocessing required!**

## Performance

- **Accuracy**: ~86% on test set
- **Preprocessing**: Automatic server-side pad-to-square + normalization

## Usage

### Simple API Usage (Recommended)

Clients can send **raw images directly** to inference endpoints:

```python
import requests
import base64

# Load your image
with open("test_image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Send to inference endpoint
response = requests.post(
    "https://your-endpoint.com",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"inputs": image_data}
)

results = response.json()
print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")
```

### Standard HuggingFace Usage

```python
from transformers import pipeline

# The model automatically handles preprocessing
classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
results = classifier("your_image.png")
print(f"Predicted font: {results[0]['label']}")
```

### Direct Model Usage

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor
from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

# Load model and processor
model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")

# Process image (model handles pad_to_square automatically)
image = Image.open("test.png")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
```

## Model Architecture

- **Base Model**: facebook/dinov2-base-imagenet1k-1-layer
- **Fine-tuning**: LoRA on Google Fonts dataset
- **Labels**: 394 font families
- **Preprocessing**: Built-in pad-to-square + ImageNet normalization

## Server-Side Preprocessing

This model automatically applies the following preprocessing in its forward pass:

1. **Pad to square** preserving aspect ratio
2. **Resize** to 224×224
3. **Normalize** with ImageNet statistics

**No client-side preprocessing required** - just send raw images!

## Deployment

### HuggingFace Inference Endpoints

1. Deploy this model to an Inference Endpoint
2. Send raw images directly - preprocessing happens automatically
3. Achieve ~86% accuracy out of the box

### Custom Deployment

The model includes preprocessing in the forward pass, so any deployment (TorchServe, TensorFlow Serving, etc.) will automatically apply the correct preprocessing.

## Files

- `font_classifier_with_preprocessing.py`: Custom model class with built-in preprocessing
- Standard HuggingFace model files

## Technical Details

The model inherits from `Dinov2ForImageClassification` but overrides the forward pass to include:

```python
def forward(self, pixel_values=None, labels=None, **kwargs):
    # Automatic preprocessing happens here
    processed_pixel_values = self.preprocess_images(pixel_values)
    return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)
```

This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.
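The `preprocess_images` helper referenced above is not included in this card. As an illustration only (a hypothetical sketch, not the model's actual implementation), a pad-to-square + resize + ImageNet-normalize step over a `(B, 3, H, W)` float batch in `[0, 1]` could look like:

```python
import torch
import torch.nn.functional as F

# ImageNet statistics, reshaped for broadcasting over (B, 3, H, W)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess_images(pixel_values: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Pad a (B, 3, H, W) batch to square, resize to `size`, and normalize.

    Hypothetical sketch: assumes float inputs in [0, 1] and white padding.
    """
    _, _, h, w = pixel_values.shape
    side = max(h, w)
    # Pad right/bottom so the image becomes square (aspect ratio preserved)
    pad_h, pad_w = side - h, side - w
    x = F.pad(pixel_values, (0, pad_w, 0, pad_h), value=1.0)
    # Resize to the model's expected resolution
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    # Normalize with ImageNet statistics
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```

Padding before resizing is what preserves the glyphs' aspect ratio: a plain resize of a wide text crop would stretch letterforms, which is exactly the distortion font classification is sensitive to.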