# SVHN-V1-ResNet
SVHN-V1-ResNet is a ~21M-parameter ResNet-34 model trained on the SVHN dataset. It achieves a train accuracy of 98.90% and a test accuracy of 97.81%, and shows strong performance across the additional metrics below.
## Evaluation
- F1 Score (Macro): 97.71%
- F1 Score (Micro): 97.81%
- Test Precision (Macro): 97.74%
- Test Recall (Macro): 97.69%
- Test Precision (Micro): 97.81%
- Test Recall (Micro): 97.81%
- ECE: 0.085
- MCE: 0.24
- Brier Score: 0.043
- Prediction Mean Entropy: 0.55
- Mean Confidence: 0.89
- AUROC (OVR): 99.77%
- AUPRC (Macro): 99.05%
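For readers unfamiliar with the calibration metrics above, Expected Calibration Error (ECE) bins predictions by confidence and averages the gap between accuracy and confidence per bin. The sketch below is an illustrative, plain-Python version (the bin count and toy predictions are assumptions for demonstration, not the actual evaluation code):

```python
# Illustrative ECE computation (not the model's actual evaluation script).
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; sum |accuracy - confidence| per bin,
    weighted by the fraction of samples falling in that bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

# Toy example: four predictions with their confidences and correctness flags
confs = [0.95, 0.85, 0.60, 0.99]
hits = [1, 1, 0, 1]
print(round(expected_calibration_error(confs, hits), 4))
```

A well-calibrated model (low ECE) is one whose confidence tracks its empirical accuracy; the 0.085 reported above indicates a mild overconfidence gap.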
The figure accompanying this card gives a comprehensive view of the model's performance over the course of training.
## Technical Specifics
- Epochs: 15
- Parameters: ~21M
- Architecture: ResNet-34
- Optimizer: AdamW
- Loss Function: Cross Entropy
- Learning Rate: 0.001
- Batch Size: 128
- Mean Grad Norm (Final): ~0.005
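The ~21M parameter figure can be sanity-checked with a back-of-envelope count of ResNet-34's weights. The sketch below is an illustrative approximation assuming the standard BasicBlock layout (two 3×3 convolutions per block, a 1×1 projection on stage transitions) and 10 output classes; it is not the model's actual code:

```python
# Back-of-envelope parameter count for a ResNet-34-style network.
def basic_block(c_in, c_out, downsample):
    p = 9 * c_in * c_out + 2 * c_out    # 3x3 conv + BatchNorm
    p += 9 * c_out * c_out + 2 * c_out  # 3x3 conv + BatchNorm
    if downsample:
        p += c_in * c_out + 2 * c_out   # 1x1 projection conv + BatchNorm
    return p

def resnet34_params(num_classes=10):
    total = 7 * 7 * 3 * 64 + 2 * 64     # stem: 7x7 conv + BatchNorm
    # (in_channels, out_channels, num_blocks) for the four stages
    for c_in, c_out, n in [(64, 64, 3), (64, 128, 4), (128, 256, 6), (256, 512, 3)]:
        total += basic_block(c_in, c_out, downsample=(c_in != c_out))
        total += (n - 1) * basic_block(c_out, c_out, downsample=False)
    total += 512 * num_classes + num_classes  # final fully connected layer
    return total

print(f"{resnet34_params():,}")  # roughly 21.3M with 10 classes
```

With 10 classes this lands just above 21M, consistent with the figure reported above (the canonical 1000-class ResNet-34 comes out to ~21.8M).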
Overall, this experimental model demonstrates the strong performance of the ResNet architecture (with its standard ReLU activations) on computer vision tasks.
## Usage
This model is a custom ResNet-34 architecture trained to recognize digits (0–9) from the SVHN (Street View House Numbers) dataset. Because it uses a custom implementation, you must pass `trust_remote_code=True` when loading it.
Here are the three primary ways to use this model.
### Prerequisites
Ensure you have the necessary libraries installed:
```shell
pip install transformers torch pillow
```
### Method 1: Quick Inference with `pipeline` (Recommended)
This is the recommended way if you just want to get predictions. The pipeline handles image resizing, normalization, and mapping the output numbers back to digit labels (e.g., "3") automatically.
```python
from transformers import pipeline

# Load the classification pipeline
pipe = pipeline(
    "image-classification",
    model="Dawntasy/SVHN-V1-ResNet",
    trust_remote_code=True
)

# Predict using a URL or a local path
results = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

# Print the top prediction
print(f"Predicted Digit: {results[0]['label']} (Confidence: {results[0]['score']:.4f})")
```
### Method 2: Manual Inference with `AutoModel`
Use this method if you need more control, such as running the model on a specific device (GPU/CPU) or processing batches of images. This separates the preprocessing from the inference.
```python
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor

# 1. Load the processor and model
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoImageProcessor.from_pretrained("Dawntasy/SVHN-V1-ResNet")
model = AutoModelForImageClassification.from_pretrained(
    "Dawntasy/SVHN-V1-ResNet",
    trust_remote_code=True
).to(device)

# 2. Prepare the image
image = Image.open("your_digit_image.png").convert("RGB")
inputs = processor(image, return_tensors="pt").to(device)

# 3. Inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()

# 4. Map index to label
print(f"Predicted Digit: {model.config.id2label[predicted_class_idx]}")
```
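The confidence score reported by Method 1's pipeline is simply a softmax over these raw logits. A minimal, pure-Python sketch of that conversion (the toy 10-class logits are illustrative only):

```python
import math

# Convert raw logits to class probabilities with a numerically stable softmax.
def softmax(logits):
    m = max(logits)  # subtract the max so exp() never overflows
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 10-class logits; the index of the largest probability is the digit
logits = [0.1, 2.0, 0.3, 5.5, 0.0, 0.2, 0.1, 0.4, 0.3, 0.2]
probs = softmax(logits)
pred = max(range(len(probs)), key=lambda i: probs[i])
print(pred, round(probs[pred], 3))
```

In PyTorch the equivalent is `logits.softmax(-1)`; the sketch just makes the arithmetic explicit.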
### Method 3: Feature Extraction
Use this method if you don't care about the final classification but want to use the model as a feature extractor (e.g., for image similarity, clustering, or as input to another model). This returns the 512-dimensional vector from the final global average pooling layer.
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load the base "backbone" without the classification head
processor = AutoImageProcessor.from_pretrained("Dawntasy/SVHN-V1-ResNet")
base_model = AutoModel.from_pretrained(
    "Dawntasy/SVHN-V1-ResNet",
    trust_remote_code=True
)

image = Image.open("your_digit_image.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    # The output is the flattened features before the final FC layer
    features = base_model(**inputs)

print(f"Feature vector shape: {features.logits.shape}")  # torch.Size([1, 512])
```
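For the image-similarity use case mentioned above, the extracted 512-dimensional vectors are typically compared with cosine similarity. A minimal pure-Python sketch with toy vectors standing in for real features (illustrative only, not tied to the model):

```python
import math

# Cosine similarity between two feature vectors: 1.0 means same direction.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-d "features" standing in for the model's 512-d vectors
v1 = [0.2, 0.9, 0.1, 0.4]
v2 = [0.25, 0.85, 0.05, 0.5]
print(round(cosine_similarity(v1, v2), 3))
```

With batched tensors, `torch.nn.functional.cosine_similarity` does the same thing in one call.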
## Input Requirements & Tips
- Image Size: The model was trained on $64 \times 64$ pixel images. The `AutoImageProcessor` handles this resizing for you automatically.
- Normalization: The model uses SVHN-specific mean and standard deviation:
  - Mean: `[0.4377, 0.4438, 0.4728]`
  - Std: `[0.198, 0.201, 0.197]`
- Model Card Note: This model is designed for digit recognition. When shown non-digit images (such as animals or landscapes), it will still output a digit label based on the visual patterns that most resemble numbers.
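The normalization step is a per-channel affine transform, (pixel − mean) / std. A minimal sketch of what the processor does internally, assuming pixel values have already been scaled to [0, 1]:

```python
# Per-channel normalization as applied by the image processor (sketch).
MEAN = [0.4377, 0.4438, 0.4728]
STD = [0.198, 0.201, 0.197]

def normalize_pixel(rgb):
    """Normalize one RGB pixel (channel values in [0, 1])."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# A mid-gray pixel lands near zero in every channel after normalization
print([round(v, 3) for v in normalize_pixel([0.5, 0.5, 0.5])])
```

If you bypass the `AutoImageProcessor` and feed tensors directly, apply these exact statistics yourself or predictions will degrade.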