Temporal Jersey Number Recognition

A lightweight temporal model for recognizing jersey numbers (00-99) from image sequences, using an EfficientNet + LSTM architecture.

Model Description

This model recognizes jersey numbers on sports players from short image sequences. Instead of predicting the jersey number as a single class, it uses a two-digit decomposition: the tens and units digits are predicted independently. This lets the model output any two-digit number (00-99), including numbers that are absent from the training set, even though it was trained on a limited set of jersey numbers.
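The decomposition can be illustrated with a minimal sketch. The helper names below are hypothetical, and the sketch assumes single-digit jersey numbers are labeled with a tens digit of 0, which the 00-99 range suggests but the card does not state.

```python
def split_jersey(number: int) -> tuple[int, int]:
    """Return (tens, units) labels for a jersey number in 0..99."""
    if not 0 <= number <= 99:
        raise ValueError(f"expected a number in 0..99, got {number}")
    # Assumption: a single-digit number such as 6 is labeled (0, 6).
    return divmod(number, 10)


def combine_digits(tens: int, units: int) -> int:
    """Rebuild the jersey number from two independently predicted digits."""
    return tens * 10 + units


print(split_jersey(48))      # (4, 8)
print(split_jersey(6))       # (0, 6)
print(combine_digits(6, 9))  # 69
```

Each classifier head then only needs 10 classes, instead of one head with 100 classes.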

Key Features

Temporal Modeling: Uses LSTM to aggregate information across 8 frames for robust predictions
Generalization: Two-digit prediction strategy enables recognition of unseen jersey numbers
Lightweight: Only ~5M parameters (~21 MB), suitable for production deployment
High Accuracy: Achieves 95.94% jersey number accuracy on the test set

Model Architecture

Input: Sequence of 8 images (128x128 RGB)
  ↓
EfficientNet-B0 Backbone (frame-by-frame feature extraction)
  ↓
Feature Projection (1280 → 256 dims)
  ↓
Bidirectional LSTM (2 layers, 128 hidden units)
  ↓
Two Independent Classifiers:
  ├─ Tens Digit Head (10 classes: 0-9)
  └─ Units Digit Head (10 classes: 0-9)
  ↓
Output: Jersey Number (00-99)
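The pipeline above could be wired up roughly as follows. This is a hedged sketch, not the repository's model_architecture.py: the class name TemporalDigitModel, the mean pooling over time, and the single-layer heads are all assumptions, and a toy backbone stands in for EfficientNet-B0 so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class TemporalDigitModel(nn.Module):
    """Hypothetical sketch: per-frame backbone -> projection -> BiLSTM -> two digit heads."""

    def __init__(self, backbone: nn.Module, backbone_dim: int = 1280,
                 feature_dim: int = 256, lstm_hidden: int = 128,
                 lstm_layers: int = 2, dropout: float = 0.3):
        super().__init__()
        self.backbone = backbone  # must map (N, 3, H, W) -> (N, backbone_dim)
        self.project = nn.Linear(backbone_dim, feature_dim)
        self.lstm = nn.LSTM(feature_dim, lstm_hidden, num_layers=lstm_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)
        # Assumption: plain linear heads; the real heads may be deeper.
        self.tens_head = nn.Linear(2 * lstm_hidden, 10)
        self.units_head = nn.Linear(2 * lstm_hidden, 10)

    def forward(self, frames: torch.Tensor):
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w))  # (B*T, backbone_dim)
        feats = self.project(feats).reshape(b, t, -1)          # (B, T, feature_dim)
        seq, _ = self.lstm(feats)                              # (B, T, 2*lstm_hidden)
        pooled = seq.mean(dim=1)  # assumption: average the LSTM outputs over time
        return self.tens_head(pooled), self.units_head(pooled)


# Toy stand-in backbone producing 1280-dim features per frame.
toy_backbone = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1280))
model = TemporalDigitModel(toy_backbone).eval()

with torch.no_grad():
    tens_logits, units_logits = model(torch.randn(2, 8, 3, 128, 128))

number = tens_logits.argmax(dim=1) * 10 + units_logits.argmax(dim=1)
print(tens_logits.shape, units_logits.shape, number.shape)
```

With timm installed, timm.create_model("efficientnet_b0", pretrained=True, num_classes=0) returns pooled 1280-dimensional features and could be passed as the backbone in place of the toy module.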

Model Statistics

Total Parameters: 5,194,384
Trainable Parameters: 5,194,384
Model Size: 20.78 MB (FP32)
Backbone: efficientnet_b0
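The reported model size follows directly from the parameter count at 4 bytes per FP32 parameter, as the short check below shows. The FP16 line is only an illustration of the same arithmetic, not a released variant.

```python
total_params = 5_194_384
bytes_per_param = 4  # FP32

size_mb = total_params * bytes_per_param / 1e6  # decimal megabytes
print(f"FP32: {size_mb:.2f} MB")  # FP32: 20.78 MB

# Illustration only: the same weights stored in half precision.
print(f"FP16: {total_params * 2 / 1e6:.2f} MB")  # FP16: 10.39 MB
```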

Training Details

Dataset

Total Sequences: 4547
Train/Val/Test Split: 3180/677/690
Jersey Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89
Frames per Sequence: Variable (sampled to 8 frames uniformly)
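Uniform sampling of a variable-length sequence down to 8 frames might look like the sketch below. The function name is hypothetical and the rounding rule is an assumption; the card does not say how the author picks the indices. Sequences shorter than 8 frames end up with repeated frames under this scheme.

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Pick num_samples evenly spaced frame indices from a sequence of num_frames."""
    if num_frames < 1:
        raise ValueError("sequence must contain at least one frame")
    if num_samples == 1:
        return [0]
    # Spread the samples from the first frame to the last frame inclusive.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


print(uniform_frame_indices(15))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(uniform_frame_indices(8))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(uniform_frame_indices(3))   # short sequence: some indices repeat
```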

Hyperparameters

Epochs: 50 (best at epoch 35)
Batch Size: 16
Optimizer: AdamW (lr=1e-3, weight_decay=1e-4)
Scheduler: Cosine Annealing
Loss: Cross-Entropy (sum of tens + units losses)
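A minimal sketch of this optimization setup is shown below, with a small linear layer and random data standing in for the real model and dataset. The cosine schedule length (T_max equal to the 50 epochs) and stepping the scheduler once per epoch are assumptions; the card only names the scheduler.

```python
import torch
import torch.nn as nn

EPOCHS = 50


def digit_loss(tens_logits, units_logits, tens_target, units_target):
    """Sum of the cross-entropy losses of the two digit heads."""
    ce = nn.functional.cross_entropy
    return ce(tens_logits, tens_target) + ce(units_logits, units_target)


# Stand-in model: 20 outputs, split into 10 tens logits and 10 units logits.
model = nn.Linear(16, 20)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# One random batch of 16 samples stands in for the training data.
x = torch.randn(16, 16)
tens_target = torch.randint(0, 10, (16,))
units_target = torch.randint(0, 10, (16,))

for epoch in range(EPOCHS):
    logits = model(x)
    loss = digit_loss(logits[:, :10], logits[:, 10:], tens_target, units_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # assumption: the learning rate is updated once per epoch

print(f"final loss: {loss.item():.4f}, final lr: {scheduler.get_last_lr()[0]:.2e}")
```

Summing the two losses weights both digits equally; a weighted sum would be an alternative if one head lagged behind the other.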

Data Augmentation

Random horizontal flip (p=0.5)
Random rotation (±10°)
Color jitter (brightness, contrast, saturation ±0.2)
ImageNet normalization

Performance

Test Set Results

Metric	Accuracy
Jersey Number	95.94%
Tens Digit	99.42%
Units Digit	96.23%
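A jersey number counts as correct only when both digits are correct, so the jersey accuracy can be at most the lower of the two digit accuracies (96.23%) and at least their sum minus 100% (95.65%). The reported 95.94% falls inside that range. On 690 test sequences the three figures are consistent with 662, 686, and 664 correct predictions respectively. The sketch below shows how the three metrics relate, using a hypothetical helper and made-up predictions.

```python
def digit_accuracies(true_numbers, predicted_numbers):
    """Compute jersey, tens, and units accuracy from integer labels in 0..99."""
    n = len(true_numbers)
    tens_ok = units_ok = both_ok = 0
    for truth, pred in zip(true_numbers, predicted_numbers):
        tens_match = truth // 10 == pred // 10
        units_match = truth % 10 == pred % 10
        tens_ok += tens_match
        units_ok += units_match
        both_ok += tens_match and units_match
    return {"jersey": both_ok / n, "tens": tens_ok / n, "units": units_ok / n}


# Made-up example: two fully correct, one units error, one tens error.
truth = [48, 49, 6, 88]
preds = [48, 46, 6, 68]
print(digit_accuracies(truth, preds))
# {'jersey': 0.5, 'tens': 0.75, 'units': 0.75}
```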

Per-Class Performance

The model performs consistently across the jersey numbers in the test set, including classes with limited training data; confusion_matrices.png (see Model Files) visualizes the confusion matrices.

Usage

Installation

pip install torch torchvision timm huggingface_hub pillow

Inference

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
import torchvision.transforms as transforms

# Download model
model_path = hf_hub_download(
    repo_id="Akashpaul123/jersey-number-recognition-temporal",
    filename="best_model.pth"
)

# Load model
checkpoint = torch.load(model_path, map_location='cpu')
# Note: TemporalJerseyRecognizer must be defined or imported before this point
# (see model_architecture.py in this repository)
model = TemporalJerseyRecognizer(
    backbone='efficientnet_b0',
    feature_dim=256,
    lstm_hidden=128,
    lstm_layers=2,
    dropout=0.3
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare transform
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load a sequence of 8 images (placeholder file names: frame1.jpg ... frame8.jpg)
image_paths = [f'frame{i}.jpg' for i in range(1, 9)]
images = []
for path in image_paths:
    img = Image.open(path).convert('RGB')
    img_tensor = transform(img)
    images.append(img_tensor)

# Stack into sequence: (1, 8, 3, 128, 128)
sequence = torch.stack(images).unsqueeze(0)

# Predict
with torch.no_grad():
    jersey_number, tens_digit, units_digit = model.predict_jersey_number(sequence)
    print(f"Predicted Jersey Number: {jersey_number.item()}")
    print(f"Tens Digit: {tens_digit.item()}, Units Digit: {units_digit.item()}")

Model Files

This repository contains:

best_model.pth - Model checkpoint with weights and configuration
model_architecture.py - Complete model architecture code
results_summary.json - Detailed training results and metrics
training_history.csv - Epoch-by-epoch training metrics
confusion_matrices.png - Confusion matrices visualization
training_curves.png - Training and validation curves
sample_predictions.png - Sample predictions on test set

Generalization Capability

The model's two-digit decomposition strategy enables it to recognize jersey numbers it has never seen during training:

Training Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89

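Can Generalize To: The two heads can output any combination of tens (0-9) and units (0-9) digits, which gives 100 possible jersey numbers (00-99). Each head can only be expected to recognize digits it saw in that position during training, so unseen numbers built from seen digits are the realistic targets.

For example, even though the model never saw jersey #46, #69, or #84 during training, it can predict them by combining learned digit representations. A number such as #94 is less likely to be recognized, because no training class has 9 as its tens digit.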
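Which unseen numbers are built entirely from digits seen in the right position can be checked from the training classes alone. The sketch below assumes single-digit jerseys are labeled with a tens digit of 0; the helper name is hypothetical.

```python
TRAINING_CLASSES = [4, 6, 8, 9, 48, 49, 64, 66, 88, 89]

# Digits that appear in each position in the training classes.
seen_tens = {n // 10 for n in TRAINING_CLASSES}
seen_units = {n % 10 for n in TRAINING_CLASSES}


def digits_seen_in_position(number: int) -> bool:
    """True if both digits of the number occur in that position in training."""
    return number // 10 in seen_tens and number % 10 in seen_units


print(sorted(seen_tens))   # [0, 4, 6, 8]
print(sorted(seen_units))  # [4, 6, 8, 9]

unseen_but_composable = [
    n for n in range(100)
    if digits_seen_in_position(n) and n not in TRAINING_CLASSES
]
print(unseen_but_composable)  # [44, 46, 68, 69, 84, 86]
```

Numbers outside this list contain at least one digit the corresponding head never saw during training, so predictions for them should be treated with caution.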

Limitations

Expects 8-frame input sequences (sequences of other lengths must be resampled or padded to 8 frames)
Optimized for jersey numbers 00-99 (won't work for triple-digit numbers)
Performance depends on image quality and viewing angle
Missing training data for some digits in some positions: the training classes contain no tens digits 1, 2, 3, 5, 7, or 9 and no units digits 0, 1, 2, 3, 5, or 7

Future Improvements

More training data for rare digit combinations
Attention mechanisms instead of LSTM for better temporal modeling
Multi-scale feature extraction for handling different jersey sizes
Temporal smoothing for tracking applications
Knowledge distillation for even smaller models
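As one possible form of the temporal smoothing mentioned above, per-sequence predictions for a tracked player could be stabilized with a sliding majority vote. This is an illustrative sketch, not part of the released model; the class name and window size are hypothetical.

```python
from collections import Counter, deque


class MajorityVoteSmoother:
    """Return the most frequent prediction among the last `window` predictions."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)

    def update(self, predicted_number: int) -> int:
        self.history.append(predicted_number)
        # On ties, Counter keeps the value that entered the window first.
        return Counter(self.history).most_common(1)[0][0]


smoother = MajorityVoteSmoother(window=5)
stream = [48, 48, 49, 48, 48, 88, 48]  # made-up predictions for one tracked player
print([smoother.update(p) for p in stream])
# [48, 48, 48, 48, 48, 48, 48]
```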

Citation

If you use this model, please cite:

@misc{jersey_number_recognition_2025,
  title={Temporal Jersey Number Recognition},
  author={Akash Paul},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Akashpaul123/jersey-number-recognition-temporal}
}

License

Apache 2.0

Contact

For questions or feedback, please open an issue in this repository.
