Temporal Jersey Number Recognition
A lightweight temporal model for recognizing jersey numbers (00-99) from image sequences using EfficientNet + LSTM architecture.
Model Description
This model recognizes jersey numbers on sports players by analyzing temporal sequences of images. Instead of predicting jersey numbers directly, it uses a two-digit decomposition strategy - predicting tens and units digits independently. This allows the model to generalize to any 2-digit number (00-99) even when trained on a limited set of jersey numbers.
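The decomposition itself is plain arithmetic. A minimal sketch, assuming single-digit numbers are encoded with a tens digit of 0; the function names are illustrative and are not taken from the repository:

```python
def split_jersey_number(number: int) -> tuple[int, int]:
    """Split a jersey number in 0..99 into (tens, units) class labels."""
    if not 0 <= number <= 99:
        raise ValueError("jersey number must be between 0 and 99")
    return divmod(number, 10)


def join_digits(tens: int, units: int) -> int:
    """Recombine two predicted digit classes into one jersey number."""
    return 10 * tens + units


print(split_jersey_number(48))  # (4, 8)
print(join_digits(6, 9))        # 69
```

Because each head only ever sees a 10-way classification problem, the label space stays small no matter how many distinct jersey numbers appear in the data.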
Key Features
- Temporal Modeling: Uses LSTM to aggregate information across 8 frames for robust predictions
- Generalization: Two-digit prediction strategy enables recognition of unseen jersey numbers
- Lightweight: Only ~5M parameters (21MB), suitable for production deployment
- High Accuracy: Achieves 95.94% accuracy on the test set
Model Architecture
Input: Sequence of 8 images (128x128 RGB)
↓
EfficientNet-B0 Backbone (frame-by-frame feature extraction)
↓
Feature Projection (1280 → 256 dims)
↓
Bidirectional LSTM (2 layers, 128 hidden units)
↓
Two Independent Classifiers:
├─ Tens Digit Head (10 classes: 0-9)
└─ Units Digit Head (10 classes: 0-9)
↓
Output: Jersey Number (00-99)
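To make the data flow concrete, the following sketch lists the tensor shape after each stage. It is bookkeeping only, not the repository's implementation. Two points are assumptions: that frames are folded into the batch dimension before the backbone, and that the LSTM outputs are reduced to one vector per sequence before the heads (the card does not say how).

```python
def trace_shapes(batch_size: int = 1, frames: int = 8, image_size: int = 128,
                 backbone_dim: int = 1280, feature_dim: int = 256,
                 lstm_hidden: int = 128) -> list[tuple[str, tuple[int, ...]]]:
    """Return (stage name, tensor shape) pairs for the pipeline above."""
    b, t = batch_size, frames
    return [
        ("input sequence", (b, t, 3, image_size, image_size)),
        # Assumption: frames are folded into the batch so the CNN sees images.
        ("frames folded into batch", (b * t, 3, image_size, image_size)),
        ("backbone features", (b * t, backbone_dim)),
        ("projected features", (b * t, feature_dim)),
        ("unfolded sequence", (b, t, feature_dim)),
        # A bidirectional LSTM concatenates forward and backward states.
        ("BiLSTM outputs", (b, t, 2 * lstm_hidden)),
        # Assumption: some pooling over time happens before the two heads.
        ("tens logits", (b, 10)),
        ("units logits", (b, 10)),
    ]


for stage, shape in trace_shapes(batch_size=1):
    print(f"{stage:26s} {shape}")
```

Note that the bidirectional LSTM with 128 hidden units yields 256 features per time step, the same width as the projected input.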
Model Statistics
- Total Parameters: 5,194,384
- Trainable Parameters: 5,194,384
- Model Size: 20.78 MB (FP32)
- Backbone: efficientnet_b0
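The reported size follows directly from the parameter count at 4 bytes per FP32 weight, using decimal megabytes. A quick check is sketched below; the FP16 figure is only an illustration of the same arithmetic and is not a number reported for this model:

```python
TOTAL_PARAMS = 5_194_384  # from the statistics above


def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Raw weight size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * bytes_per_param / 1_000_000


print(f"FP32: {model_size_mb(TOTAL_PARAMS, 4):.2f} MB")  # FP32: 20.78 MB
print(f"FP16: {model_size_mb(TOTAL_PARAMS, 2):.2f} MB")  # FP16: 10.39 MB
```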
Training Details
Dataset
- Total Sequences: 4547
- Train/Val/Test Split: 3180/677/690
- Jersey Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89
- Frames per Sequence: Variable (sampled to 8 frames uniformly)
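The exact sampling code is not shown in this card. One common way to pick 8 evenly spaced frames from a variable-length sequence is sketched below; the function name and the rounding rule are assumptions:

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Pick num_samples evenly spaced frame indices from 0..num_frames-1.

    Sequences shorter than num_samples are covered by repeating frames.
    """
    if num_frames < 1:
        raise ValueError("sequence must contain at least one frame")
    if num_samples == 1:
        return [0]
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


print(uniform_frame_indices(15))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The first and last frames are always included, so the sampled clip spans the whole original sequence.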
Hyperparameters
- Epochs: 50 (best at epoch 35)
- Batch Size: 16
- Optimizer: AdamW (lr=1e-3, weight_decay=1e-4)
- Scheduler: Cosine Annealing
- Loss: Cross-Entropy (sum of tens + units losses)
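The schedule and the loss can be written out in a few lines of plain Python. This is a sketch of the standard formulas, not the training script: it assumes the cosine period equals the 50 training epochs and the minimum learning rate is 0, neither of which is stated above.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int = 50,
                        base_lr: float = 1e-3, min_lr: float = 0.0) -> float:
    """Closed-form cosine annealing from base_lr down to min_lr."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


def cross_entropy(logits: list[float], target: int) -> float:
    """Cross-entropy of one sample from raw logits (stable log-sum-exp)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def digit_loss(tens_logits: list[float], units_logits: list[float],
               tens_target: int, units_target: int) -> float:
    """Total loss: tens-digit loss plus units-digit loss, equally weighted."""
    return (cross_entropy(tens_logits, tens_target)
            + cross_entropy(units_logits, units_target))


# An untrained model with uniform logits scores 2 * ln(10) on any sample.
print(digit_loss([0.0] * 10, [0.0] * 10, tens_target=4, units_target=8))
```

Summing the two losses treats both digits as equally important; a weighted sum would be the natural variation if one head lagged behind the other.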
Data Augmentation
- Random horizontal flip (p=0.5)
- Random rotation (±10°)
- Color jitter (brightness, contrast, saturation ±0.2)
- ImageNet normalization
Performance
Test Set Results
| Metric | Accuracy |
|---|---|
| Jersey Number | 95.94% |
| Tens Digit | 99.42% |
| Units Digit | 96.23% |
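The three accuracies are related: a jersey number is correct only when both digits are correct. The sketch below converts the percentages back to sample counts, assuming all three metrics were computed on the full 690-sequence test split. The counts are inferred from rounded percentages and are not reported figures.

```python
TEST_SIZE = 690  # test split size from the dataset section


def correct_count(accuracy_percent: float, total: int = TEST_SIZE) -> int:
    """Number of correct samples implied by a rounded accuracy."""
    return round(accuracy_percent / 100 * total)


jersey_errors = TEST_SIZE - correct_count(95.94)
tens_errors = TEST_SIZE - correct_count(99.42)
units_errors = TEST_SIZE - correct_count(96.23)

# A sequence is misclassified if either digit is wrong, so by
# inclusion-exclusion: jersey_errors = tens_errors + units_errors - both_wrong.
both_wrong = tens_errors + units_errors - jersey_errors

print(jersey_errors, tens_errors, units_errors, both_wrong)  # 28 4 26 2
```

Under these assumptions, nearly all errors come from the units head, which suggests that is where additional data or capacity would help most.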
Per-Class Performance
The model generalizes well across different jersey numbers, including rare classes with limited training data.
Usage
Installation
pip install torch torchvision timm huggingface_hub pillow
Inference
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
import torchvision.transforms as transforms

# Download model
model_path = hf_hub_download(
    repo_id="Akashpaul123/jersey-number-recognition-temporal",
    filename="best_model.pth"
)

# Load model
checkpoint = torch.load(model_path, map_location='cpu')

# Note: TemporalJerseyRecognizer must be defined or imported first;
# this repository ships the architecture code in model_architecture.py
model = TemporalJerseyRecognizer(
    backbone='efficientnet_b0',
    feature_dim=256,
    lstm_hidden=128,
    lstm_layers=2,
    dropout=0.3
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare transform (no augmentation at inference time)
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load a sequence of 8 images (placeholder file names)
image_paths = [f'frame{i}.jpg' for i in range(1, 9)]  # frame1.jpg ... frame8.jpg
images = []
for path in image_paths:
    img = Image.open(path).convert('RGB')
    img_tensor = transform(img)
    images.append(img_tensor)

# Stack into sequence: (1, 8, 3, 128, 128)
sequence = torch.stack(images).unsqueeze(0)

# Predict
with torch.no_grad():
    jersey_number, tens_digit, units_digit = model.predict_jersey_number(sequence)

print(f"Predicted Jersey Number: {jersey_number.item()}")
print(f"Tens Digit: {tens_digit.item()}, Units Digit: {units_digit.item()}")
Model Files
This repository contains:
- best_model.pth - Model checkpoint with weights and configuration
- model_architecture.py - Complete model architecture code
- results_summary.json - Detailed training results and metrics
- training_history.csv - Epoch-by-epoch training metrics
- confusion_matrices.png - Confusion matrices visualization
- training_curves.png - Training and validation curves
- sample_predictions.png - Sample predictions on test set
Generalization Capability
The model's two-digit decomposition strategy enables it to recognize jersey numbers it has never seen during training:
Training Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89
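Can Generalize To: In principle, any combination of tens (0-9) and units (0-9) digits = 100 possible jersey numbers (00-99)
For example, even though the model never saw jersey #46 or #69 during training, it can predict them by combining a tens digit and a units digit that each appeared in training. A number such as #94 is a harder case: the digit 9 appears only as a units digit in the training classes, so the tens head has no direct examples of it.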
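How far this generalization can be expected to reach depends on which digits were seen in which position. The sketch below derives that from the training classes listed above, assuming single-digit numbers are encoded with a tens digit of 0 (the repository may encode them differently):

```python
TRAINING_CLASSES = [4, 6, 8, 9, 48, 49, 64, 66, 88, 89]

# Digits that occur in each position across the training classes.
seen_tens = {n // 10 for n in TRAINING_CLASSES}
seen_units = {n % 10 for n in TRAINING_CLASSES}


def digits_seen_in_position(number: int) -> bool:
    """True if both digits of number occurred in the same position in training."""
    tens, units = divmod(number, 10)
    return tens in seen_tens and units in seen_units


print(sorted(seen_tens))   # [0, 4, 6, 8]
print(sorted(seen_units))  # [4, 6, 8, 9]
for candidate in (46, 69, 94):
    print(candidate, digits_seen_in_position(candidate))
```

Under this assumption, 16 of the 100 possible numbers are built entirely from digit and position pairs present in training. Predictions for the remaining numbers rely on a head producing a class it has never been trained on.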
Limitations
- Requires sequences of 8 frames (can handle variable lengths with padding)
- Optimized for jersey numbers 00-99 (won't work for triple-digit numbers)
- Performance depends on image quality and viewing angle
- Limited training data for some digit combinations (especially digits 1, 2, 3, 5, 7 in tens place)
Future Improvements
- More training data for rare digit combinations
- Attention mechanisms instead of LSTM for better temporal modeling
- Multi-scale feature extraction for handling different jersey sizes
- Temporal smoothing for tracking applications
- Knowledge distillation for even smaller models
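As an illustration of the temporal smoothing idea for tracking, one simple option is a sliding-window majority vote over consecutive per-sequence predictions for the same tracked player. This is a hypothetical sketch of a possible post-processing step, not part of the released model; the function name and window size are assumptions.

```python
from collections import Counter, deque


def smooth_predictions(predictions: list[int], window: int = 5) -> list[int]:
    """Replace each prediction with the most frequent value in a trailing window."""
    recent: deque[int] = deque(maxlen=window)
    smoothed = []
    for prediction in predictions:
        recent.append(prediction)
        # most_common(1) returns the value with the highest count so far.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# A single spurious "49" is voted out by its neighbours.
print(smooth_predictions([48, 48, 49, 48, 48], window=3))  # [48, 48, 48, 48, 48]
```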
Citation
If you use this model, please cite:
@misc{jersey_number_recognition_2025,
title={Temporal Jersey Number Recognition},
author={Akash Paul},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/Akashpaul123/jersey-number-recognition-temporal}
}
License
Apache 2.0
Contact
For questions or feedback, please open an issue in this repository.
Evaluation results
- Jersey Number Accuracy (self-reported): 0.959