GitHub · Kaggle · Hugging Face · License: MIT

SVHN-V1-ResNet

SVHN-V1-ResNet is a 21M-parameter ResNet-34 model trained on the SVHN dataset. It achieves a train accuracy of 98.90% and a test accuracy of 97.81%, and performs consistently well across the additional metrics below.

Evaluation

  • F1 Score (Macro): 97.71%
  • F1 Score (Micro): 97.81%
  • Test Precision (Macro): 97.74%
  • Test Recall (Macro): 97.69%
  • Test Precision (Micro): 97.81%
  • Test Recall (Micro): 97.81%
  • ECE: 0.085 (see the calibration sketch after this list)
  • MCE: 0.24
  • Brier Score: 0.043
  • Prediction Mean Entropy: 0.55
  • Mean Confidence: 0.89
  • AUROC (OVR): 99.77%
  • AUPRC (Macro): 99.05%
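For reference, calibration metrics such as ECE and the Brier score can be reproduced from the model's softmax outputs. Below is a minimal sketch; the exact binning and averaging conventions used for the numbers above are assumptions, and probs/labels are hypothetical placeholder arrays, not artifacts of this repository.

import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    # ECE: bin-size-weighted gap between accuracy and mean confidence per bin
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

def brier_score(probs, labels):
    # Multi-class Brier: mean squared error against one-hot targets
    one_hot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - one_hot) ** 2, axis=1))

# Placeholder data, just to show the call signature
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=256)
labels = rng.integers(0, 10, size=256)
print(expected_calibration_error(probs, labels), brier_score(probs, labels))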

The figure below shows a comprehensive analysis of the model's performance during training.

[Figure: training results]

Technical Specifics

  • Epochs: 15
  • Parameters: ~21M
  • Architecture: ResNet-34
  • Optimizer: AdamW
  • Loss Function: Cross Entropy
  • Learning Rate: 0.001
  • Batch Size: 128
  • Mean Grad Norm (Final): ~0.005

Overall, this experimental model demonstrates the strong performance of a standard ResNet with ReLU activations on computer vision tasks such as digit recognition.
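For illustration, here is a minimal training-loop sketch matching the configuration above. This is not the author's actual script: it uses torchvision's stock resnet34 as a stand-in for the custom architecture, and the transform mirrors the normalization stats listed later in this card.

import torch
from torch import nn
from torchvision import datasets, transforms, models

# Preprocessing assumed from this card: resize to 64x64, SVHN mean/std
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4377, 0.4438, 0.4728],
                         std=[0.198, 0.201, 0.197]),
])
train_set = datasets.SVHN(root="./data", split="train", download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet34(num_classes=10).to(device)  # stand-in for the custom ResNet-34
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(15):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()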

Usage

This model is a custom ResNet-34 architecture trained to recognize digits ($0-9$) from the SVHN (Street View House Numbers) dataset. Because it uses a custom implementation, you must set trust_remote_code=True when loading it.

Here are the three primary ways to use this model.


Prerequisites

Ensure you have the necessary libraries installed:

pip install transformers torch pillow

Method 1: Quick Predictions with pipeline (Recommended)

This is the recommended way if you just want to get predictions. The pipeline handles image resizing, normalization, and mapping the output numbers back to digit labels (e.g., "3") automatically.

from transformers import pipeline

# Load the classification pipeline
pipe = pipeline(
    "image-classification", 
    model="Dawntasy/SVHN-V1-ResNet", 
    trust_remote_code=True
)

# Predict using a URL or a local path
results = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

# Print the top prediction
print(f"Predicted Digit: {results[0]['label']} (Confidence: {results[0]['score']:.4f})")

Method 2: Manual Inference (AutoModelForImageClassification)

Use this method if you need more control, such as running the model on a specific device (GPU/CPU) or processing batches of images. This separates the preprocessing from the inference.

import torch
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor

# 1. Load the processor and model
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoImageProcessor.from_pretrained("Dawntasy/SVHN-V1-ResNet")
model = AutoModelForImageClassification.from_pretrained(
    "Dawntasy/SVHN-V1-ResNet", 
    trust_remote_code=True
).to(device)

# 2. Prepare the image
image = Image.open("your_digit_image.png").convert("RGB")
inputs = processor(image, return_tensors="pt").to(device)

# 3. Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

# 4. Map index to label
print(f"Predicted Digit: {model.config.id2label[predicted_class_idx]}")

Method 3: Feature Extraction (AutoModel)

Use this method if you don't care about the final classification but want to use the model as a feature extractor (e.g., for image similarity, clustering, or as input to another model). This returns the 512-dimensional vector from the final global average pooling layer.

from transformers import AutoModel, AutoImageProcessor
from PIL import Image
import torch

# Load the base "backbone" without the classification head
processor = AutoImageProcessor.from_pretrained("Dawntasy/SVHN-V1-ResNet")
base_model = AutoModel.from_pretrained(
    "Dawntasy/SVHN-V1-ResNet", 
    trust_remote_code=True
)

image = Image.open("your_digit_image.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    # The custom backbone returns the pooled 512-d features (pre-FC) in `logits`
    features = base_model(**inputs)

print(f"Feature vector shape: {features.logits.shape}") # torch.Size([1, 512])

Input Requirements & Tips

  • Image Size: The model was trained on $64 \times 64$ pixel images. The AutoImageProcessor handles this resizing for you automatically.
  • Normalization: The model uses SVHN-specific mean and standard deviation (see the manual preprocessing sketch after this list):
    • Mean: [0.4377, 0.4438, 0.4728]
    • Std: [0.198, 0.201, 0.197]
  • Model Card Note: This model is designed for digit recognition. When shown non-digit images (like animals or landscapes), it will still output a digit label based on the visual patterns that most resemble numbers.
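If you preprocess manually instead of relying on the AutoImageProcessor, an equivalent torchvision transform would look like the sketch below. This is an assumption built from the stats above; the processor's exact resize and interpolation settings may differ.

from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4377, 0.4438, 0.4728],
                         std=[0.198, 0.201, 0.197]),
])

tensor = preprocess(Image.open("your_digit_image.png").convert("RGB")).unsqueeze(0)
print(tensor.shape)  # torch.Size([1, 3, 64, 64])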
