---
license: apache-2.0
tags:
- image-classification
- index-cards
- library-archives
- timm
datasets:
- custom
metrics:
- accuracy
- f1
library_name: timm
pipeline_tag: image-classification
---

# Index Card Classifier

A fine-tuned image classifier that distinguishes library index cards from other
image types (card versos, covers, blank pages).

## Model Details

- **Base Model**: efficientnet_b0
- **Task**: Binary classification (index_card vs other)
- **Training Data**: 70 images
- **Validation Data**: 30 images
- **Framework**: PyTorch + timm

## Training Data Distribution

Label counts for the annotated images. The counts sum to 100, which matches the
70 training and 30 validation images combined.

```
{
  "cover": 1,
  "blank": 1,
  "needs_review": 1,
  "verso": 49,
  "index_card": 46,
  "other": 2
}
```

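Because the task is binary, these fine-grained labels presumably get folded into two classes before training. The sketch below shows one way that could work and the class balance it would produce. The `to_binary` helper and its mapping (everything except `index_card` becomes `other`, including `needs_review`) are assumptions for illustration, not taken from the training script.

```python
# Label counts copied from the distribution above.
label_counts = {
    "cover": 1,
    "blank": 1,
    "needs_review": 1,
    "verso": 49,
    "index_card": 46,
    "other": 2,
}


def to_binary(label: str) -> str:
    """Assumed mapping: keep index_card, fold every other label into 'other'."""
    return "index_card" if label == "index_card" else "other"


# Accumulate the fine-grained counts into the two binary classes.
binary_counts = {"other": 0, "index_card": 0}
for label, count in label_counts.items():
    binary_counts[to_binary(label)] += count

total = sum(binary_counts.values())
for name, count in binary_counts.items():
    print(f"{name}: {count} ({count / total:.0%})")
```

Under this assumed mapping the two classes are close to balanced (54 vs 46 images), which is in line with the 16 vs 14 split reported for the validation set.
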
## Performance

| Metric | Value |
|--------|-------|
| Validation Accuracy | 96.7% |
| Validation F1 (index_card) | 0.966 |
| Validation F1 (other) | 0.968 |

### Classification Report

```
              precision    recall  f1-score   support

       other      1.000     0.938     0.968        16
  index_card      0.933     1.000     0.966        14

    accuracy                          0.967        30
   macro avg      0.967     0.969     0.967        30
weighted avg      0.969     0.967     0.967        30
```

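### Confusion Matrix

Rows are true labels and columns are predicted labels, both in the order
`other`, `index_card`. One `other` image was misclassified as `index_card`;
no index cards were missed.

```
[[15  1]
 [ 0 14]]
```
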
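The reported precision, recall, F1, and accuracy can be recomputed directly from the confusion matrix. The following is a small plain-Python check; the `per_class_metrics` helper is a name introduced here for illustration, and the matrix is assumed to use rows for true labels and columns for predictions.

```python
# Confusion matrix from above: rows = true label, columns = predicted label.
classes = ["other", "index_card"]
matrix = [
    [15, 1],
    [0, 14],
]


def per_class_metrics(matrix, index):
    """Return (precision, recall, f1) for the class at position `index`."""
    tp = matrix[index][index]
    fp = sum(row[index] for row in matrix) - tp  # predicted as this class, but wrong
    fn = sum(matrix[index]) - tp                 # this class, but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / total
report = {name: per_class_metrics(matrix, i) for i, name in enumerate(classes)}

for name, (precision, recall, f1) in report.items():
    print(f"{name:>10}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
print(f"accuracy={accuracy:.3f}")
```

With only 30 validation images, a single misclassification moves accuracy by about 3.3 percentage points, so these figures carry wide uncertainty.
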
## Usage

```python
import timm
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from safetensors.torch import load_file
from torchvision import transforms

# Download and load model from Hub
weights_path = hf_hub_download(
    repo_id="davanstrien/nls-index-card-classifier",
    filename="classifier.safetensors"
)
model = timm.create_model('efficientnet_b0', pretrained=False, num_classes=2)
model.load_state_dict(load_file(weights_path))
model.eval()

# Preprocess: resize to 224x224 and normalize with ImageNet statistics
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Inference on a single image (add a batch dimension with unsqueeze)
image = Image.open('card.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0)

with torch.no_grad():
    output = model(input_tensor)
    probs = torch.softmax(output, dim=1)
    pred = output.argmax(1).item()
    confidence = probs[0, pred].item()

# Index 0 = other, index 1 = index_card
classes = ['other', 'index_card']
print(f"Prediction: {classes[pred]} ({confidence:.1%})")
```

## Training

Trained with the backbone frozen; only the classifier head was fine-tuned.

```bash
python train_classifier.py --model efficientnet_b0 --epochs 20 --val-split 0.3
```

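The contents of `train_classifier.py` are not shown here, so the following is only a minimal sketch of what freezing a backbone and training just the head can look like in PyTorch. The `freeze_backbone` helper, the tiny stand-in network, and the learning rate are all assumptions for illustration. With timm, the model would more likely come from `timm.create_model("efficientnet_b0", pretrained=True, num_classes=2)` and the head from `model.get_classifier()`.

```python
import torch
from torch import nn


def freeze_backbone(model: nn.Module, head: nn.Module) -> list:
    """Freeze every parameter in `model` except those belonging to `head`.

    Returns the parameters that remain trainable.
    """
    for param in model.parameters():
        param.requires_grad = False
    for param in head.parameters():
        param.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]


# Tiny stand-in network so the sketch runs without downloading any weights.
# This is NOT EfficientNet; it only mimics the backbone + head structure.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)

trainable = freeze_backbone(model, head)

# Only the head's weight and bias are handed to the optimizer.
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # placeholder learning rate

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_trainable} of {n_total}")
```

Training only the head keeps the number of learned parameters small, which is a common choice when the dataset is as small as 70 images. Frozen networks with BatchNorm layers are often also kept in eval mode during training; whether the original script does this is not stated.
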
## Context

This model was developed for the National Library of Scotland to help process
digitized manuscript index cards from the Advocate's Library collection.

## License

Apache 2.0