Temporal Jersey Number Recognition

A lightweight temporal model for recognizing jersey numbers (00-99) from image sequences using EfficientNet + LSTM architecture.

Model Description

This model recognizes jersey numbers on sports players from short temporal sequences of images. Rather than classifying the full number as one of 100 classes, it decomposes the number into two digits and predicts the tens and units digits with independent heads. This lets the model output any two-digit number (00-99), including digit combinations that were absent from training, even though it was trained on a limited set of jersey numbers.
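The decomposition can be illustrated with a small sketch. The helper names `split_jersey` and `combine_digits` are hypothetical, and treating a single-digit number as having a tens digit of 0 is an assumption; the source does not state how single-digit jerseys are encoded.

```python
def split_jersey(number: int) -> tuple[int, int]:
    """Split a jersey number (0-99) into (tens, units) class labels."""
    if not 0 <= number <= 99:
        raise ValueError("jersey number must be in 0..99")
    return divmod(number, 10)


def combine_digits(tens: int, units: int) -> int:
    """Recombine two predicted digit classes into a jersey number."""
    return tens * 10 + units


print(split_jersey(48))      # (4, 8)
print(split_jersey(9))       # (0, 9)  assumption: single digits get tens = 0
print(combine_digits(6, 9))  # 69
```

Each head is then a 10-way classifier rather than one 100-way classifier, so a digit learned in one number can be reused in another.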

Key Features

  • Temporal Modeling: Uses LSTM to aggregate information across 8 frames for robust predictions
  • Generalization: Two-digit prediction strategy enables recognition of unseen jersey numbers
  • Lightweight: About 5.2M parameters (20.78 MB in FP32), suitable for production deployment
  • High Accuracy: Achieves 95.94% accuracy on test set

Model Architecture

Input: Sequence of 8 images (128x128 RGB)
  ↓
EfficientNet-B0 Backbone (frame-by-frame feature extraction)
  ↓
Feature Projection (1280 → 256 dims)
  ↓
Bidirectional LSTM (2 layers, 128 hidden units)
  ↓
Two Independent Classifiers:
  ├─ Tens Digit Head (10 classes: 0-9)
  └─ Units Digit Head (10 classes: 0-9)
  ↓
Output: Jersey Number (00-99)
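The temporal part of the diagram above can be sketched in PyTorch as follows. This is not the author's implementation (that lives in model_architecture.py); it is a minimal sketch that omits the EfficientNet-B0 backbone and feeds random 1280-dim vectors in its place. The class name `TemporalHeadSketch`, the ReLU after the projection, and mean-pooling over time are all assumptions.

```python
import torch
import torch.nn as nn


class TemporalHeadSketch(nn.Module):
    """Projection + BiLSTM + two digit heads over per-frame backbone features."""

    def __init__(self, in_dim=1280, feature_dim=256, lstm_hidden=128,
                 lstm_layers=2, dropout=0.3):
        super().__init__()
        self.proj = nn.Linear(in_dim, feature_dim)
        self.lstm = nn.LSTM(feature_dim, lstm_hidden, num_layers=lstm_layers,
                            batch_first=True, bidirectional=True,
                            dropout=dropout)
        # A bidirectional LSTM concatenates both directions: 2 * lstm_hidden.
        self.tens_head = nn.Linear(2 * lstm_hidden, 10)
        self.units_head = nn.Linear(2 * lstm_hidden, 10)

    def forward(self, frame_features):
        # frame_features: (batch, frames, in_dim)
        x = torch.relu(self.proj(frame_features))  # assumption: ReLU here
        out, _ = self.lstm(x)                      # (batch, frames, 2 * lstm_hidden)
        pooled = out.mean(dim=1)                   # assumption: mean over time
        return self.tens_head(pooled), self.units_head(pooled)


head = TemporalHeadSketch().eval()
features = torch.randn(2, 8, 1280)  # stand-in for EfficientNet-B0 features
with torch.no_grad():
    tens_logits, units_logits = head(features)
jersey = tens_logits.argmax(dim=1) * 10 + units_logits.argmax(dim=1)
print(tens_logits.shape, units_logits.shape, jersey.shape)
```

In the full model, the per-frame features would come from running the backbone on each of the 8 frames before this head is applied.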

Model Statistics

  • Total Parameters: 5,194,384
  • Trainable Parameters: 5,194,384
  • Model Size: 20.78 MB (FP32)
  • Backbone: efficientnet_b0
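The reported model size follows directly from the parameter count, assuming 4 bytes per FP32 parameter and decimal megabytes (checkpoint metadata and non-parameter buffers are ignored in this sketch):

```python
total_params = 5_194_384
bytes_per_param = 4  # FP32

size_mb = total_params * bytes_per_param / 1e6  # decimal megabytes
print(f"{size_mb:.2f} MB")  # 20.78 MB
```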

Training Details

Dataset

  • Total Sequences: 4547
  • Train/Val/Test Split: 3180/677/690
  • Jersey Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89
  • Frames per Sequence: Variable (sampled to 8 frames uniformly)
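The source says variable-length sequences are sampled uniformly to 8 frames but does not give the exact rule. One common choice, sketched here as an assumption, is to take evenly spaced indices that include the first and last frame. The helper name `uniform_sample_indices` is hypothetical.

```python
def uniform_sample_indices(num_frames: int, target: int = 8) -> list[int]:
    """Return `target` evenly spaced frame indices in [0, num_frames - 1]."""
    if num_frames < 1 or target < 2:
        raise ValueError("need at least 1 frame and a target of at least 2")
    return [round(i * (num_frames - 1) / (target - 1)) for i in range(target)]


print(uniform_sample_indices(15))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(uniform_sample_indices(3))   # [0, 0, 1, 1, 1, 1, 2, 2]
```

With this rule, sequences shorter than 8 frames repeat frames rather than being padded with blanks.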

Hyperparameters

  • Epochs: 50 (best at epoch 35)
  • Batch Size: 16
  • Optimizer: AdamW (lr=1e-3, weight_decay=1e-4)
  • Scheduler: Cosine Annealing
  • Loss: Cross-Entropy (sum of tens + units losses)
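The loss listed above, a sum of the two per-digit cross-entropies, can be written out for a single sample in plain Python. This is only an illustrative sketch; the function names are hypothetical, and the actual training code presumably uses the framework's built-in cross-entropy over batches.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: log-sum-exp(logits) - logits[target]."""
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def digit_loss(tens_logits, units_logits, tens_target, units_target):
    """Total loss = tens-digit cross-entropy + units-digit cross-entropy."""
    return (cross_entropy(tens_logits, tens_target)
            + cross_entropy(units_logits, units_target))


uniform = [0.0] * 10                 # an untrained head: all digits equally likely
loss = digit_loss(uniform, uniform, 4, 8)
print(round(loss, 4))                # 4.6052, i.e. 2 * ln(10)
```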

Data Augmentation

  • Random horizontal flip (p=0.5)
  • Random rotation (±10°)
  • Color jitter (brightness, contrast, saturation ±0.2)
  • ImageNet normalization

Performance

Test Set Results

Metric          Accuracy
Jersey Number   95.94%
Tens Digit      99.42%
Units Digit     96.23%
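The three accuracies constrain one another if a jersey number counts as correct only when both digits are correct (an assumption that follows from the two-digit design but is not stated explicitly). A rough consistency check, with counts recovered by rounding against the 690 test sequences:

```python
test_size = 690
accuracy = {"jersey": 0.9594, "tens": 0.9942, "units": 0.9623}

correct = {k: round(v * test_size) for k, v in accuracy.items()}
errors = {k: test_size - c for k, c in correct.items()}
print(correct)  # {'jersey': 662, 'tens': 686, 'units': 664}
print(errors)   # {'jersey': 28, 'tens': 4, 'units': 26}

# Inclusion-exclusion: sequences wrong on either digit =
# tens errors + units errors - sequences wrong on both digits.
both_wrong = errors["tens"] + errors["units"] - errors["jersey"]
print(both_wrong)  # 2
```

Under that assumption, nearly all jersey-level errors come from the units digit.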

Per-Class Performance

The model performs well across the jersey numbers in the test set, including classes with few training sequences. The confusion matrices in confusion_matrices.png show where the remaining errors fall.

Usage

Installation

pip install torch torchvision timm huggingface_hub pillow

Inference

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
import torchvision.transforms as transforms

# Download model
model_path = hf_hub_download(
    repo_id="Akashpaul123/jersey-number-recognition-temporal",
    filename="best_model.pth"
)

# Load model
checkpoint = torch.load(model_path, map_location='cpu')
# Note: TemporalJerseyRecognizer must be defined or imported first (see model_architecture.py in this repository)
model = TemporalJerseyRecognizer(
    backbone='efficientnet_b0',
    feature_dim=256,
    lstm_hidden=128,
    lstm_layers=2,
    dropout=0.3
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare transform
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load a sequence of 8 images (frame1.jpg ... frame8.jpg; replace with your own paths)
image_paths = [f'frame{i}.jpg' for i in range(1, 9)]
images = []
for path in image_paths:
    img = Image.open(path).convert('RGB')
    img_tensor = transform(img)
    images.append(img_tensor)

# Stack into sequence: (1, 8, 3, 128, 128)
sequence = torch.stack(images).unsqueeze(0)

# Predict
with torch.no_grad():
    jersey_number, tens_digit, units_digit = model.predict_jersey_number(sequence)
    print(f"Predicted Jersey Number: {jersey_number.item()}")
    print(f"Tens Digit: {tens_digit.item()}, Units Digit: {units_digit.item()}")

Model Files

This repository contains:

  • best_model.pth - Model checkpoint with weights and configuration
  • model_architecture.py - Complete model architecture code
  • results_summary.json - Detailed training results and metrics
  • training_history.csv - Epoch-by-epoch training metrics
  • confusion_matrices.png - Confusion matrices visualization
  • training_curves.png - Training and validation curves
  • sample_predictions.png - Sample predictions on test set

Generalization Capability

The model's two-digit decomposition strategy enables it to recognize jersey numbers it has never seen during training:

Training Classes: 4, 6, 8, 9, 48, 49, 64, 66, 88, 89

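Output Space: Any combination of tens (0-9) and units (0-9) digits = 100 possible jersey numbers (00-99)

For example, even though the model never saw jersey #46, #69, or #84 during training, it can predict them by combining learned digit representations. Predictions are only expected to be reliable when each digit was seen in that position during training: a number such as #94 has a tens digit (9) that never appears in the tens place of the training classes.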
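Which unseen numbers are built entirely from digits the training classes cover in each position can be enumerated directly. This sketch assumes single-digit jerseys are encoded with a tens digit of 0, which the source does not confirm.

```python
train_classes = [4, 6, 8, 9, 48, 49, 64, 66, 88, 89]

# Digits observed in each position across the training classes.
tens_seen = sorted({n // 10 for n in train_classes})
units_seen = sorted({n % 10 for n in train_classes})

# Numbers composable from seen digits, minus those already in training.
composable = [t * 10 + u for t in tens_seen for u in units_seen]
novel = [n for n in composable if n not in train_classes]

print(tens_seen)   # [0, 4, 6, 8]
print(units_seen)  # [4, 6, 8, 9]
print(novel)       # [44, 46, 68, 69, 84, 86]
```

Numbers outside this list contain a tens or units digit with no training examples in that position, so the corresponding head has nothing to draw on for them.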

Limitations

  • Expects 8 frames per sequence (other lengths must first be resampled or padded to 8)
  • Optimized for jersey numbers 00-99 (won't work for triple-digit numbers)
  • Performance depends on image quality and viewing angle
  • No training data for several digits in each position: tens digits 1, 2, 3, 5, 7, 9 and units digits 0, 1, 2, 3, 5, 7 never appear in the training classes

Future Improvements

  1. More training data for rare digit combinations
  2. Attention mechanisms instead of LSTM for better temporal modeling
  3. Multi-scale feature extraction for handling different jersey sizes
  4. Temporal smoothing for tracking applications
  5. Knowledge distillation for even smaller models
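As one possible form of the temporal smoothing mentioned in item 4, per-sequence predictions for a tracked player could be passed through a sliding-window majority vote. This is only a sketch of the idea, not part of the released model; `smooth_predictions` and the window size are hypothetical.

```python
from collections import Counter, deque


def smooth_predictions(predictions, window=5):
    """Replace each prediction with the most common value in the trailing window."""
    recent = deque(maxlen=window)
    smoothed = []
    for pred in predictions:
        recent.append(pred)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


raw = [48, 48, 49, 48, 48, 88, 48]  # two isolated misreads of jersey 48
print(smooth_predictions(raw))      # [48, 48, 48, 48, 48, 48, 48]
```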

Citation

If you use this model, please cite:

@misc{jersey_number_recognition_2025,
  title={Temporal Jersey Number Recognition},
  author={Akash Paul},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Akashpaul123/jersey-number-recognition-temporal}
}

License

Apache 2.0

Contact

For questions or feedback, please open an issue in this repository.
