Jersey Number Recognition - Temporal BiLSTM Model


Model Description

A BiLSTM-based temporal model for recognizing jersey numbers from video sequences. It reaches 92.12% full-number accuracy, a 43-percentage-point improvement over the single-frame baseline (48.97%).

Key Features

  • 🎯 92.12% full number accuracy
  • 🎯 98.63% tens digit accuracy
  • 🎯 93.04% units digit accuracy
  • 🎯 89% temporal stability across player tracks
  • 🎯 Compositional generalization to 100 classes (00-99)

Model Architecture

```
Input Sequence [8 × 3 × 128 × 128]
    ↓
EfficientNet-B0 Backbone (shared weights)
    ↓
256-D Embeddings [8 × 256]
    ↓
2-Layer Bidirectional LSTM (hidden: 128)
    ↓
Concatenated Hidden States [512]
    ↓
    ├─→ Tens Digit Head (10 classes)
    └─→ Units Digit Head (10 classes)
```

Parameters: 5.1M
Model Size: 20.3 MB

Intended Use

Primary Use Cases

  • Jersey number recognition in sports analytics
  • Temporal sequence modeling for visual recognition
  • Research in compositional generalization

Out-of-Scope Uses

  • Real-time applications (not optimized for inference speed)
  • Non-sports contexts without fine-tuning
  • Privacy-sensitive applications

How to Use

Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="prxkc/jersey-number-recognition",
    filename="best_temporal.pt"
)

# Load the checkpoint (pass weights_only=False if it stores more than tensors)
checkpoint = torch.load(model_path, map_location="cpu")

# Note: you'll need the model architecture code to instantiate the network.
# See the GitHub repository for the complete implementation:
# https://github.com/prxkc/jersey-number-recognition
```

Complete Example

For complete usage with the model architecture, see the GitHub repository: https://github.com/prxkc/jersey-number-recognition
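
Because the model predicts the tens and units digits with separate heads, the full jersey number is recovered compositionally from the two argmaxes. A minimal sketch with hypothetical logits (the variable names are assumptions):

```python
import torch

# Hypothetical logits from the two digit heads for one sequence.
tens_logits = torch.tensor([[0.1, 0.2, 0.1, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1]])
units_logits = torch.tensor([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 4.0, 0.1]])

tens = tens_logits.argmax(dim=1)    # -> 4
units = units_logits.argmax(dim=1)  # -> 8
jersey = tens * 10 + units          # numbers "00"-"09" come out as 0-9
print(jersey.item())  # 48
```

This is how two 10-way heads cover all 100 classes (00-99) without a 100-way classifier.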

Training Data

  • Dataset: Custom jersey number dataset (subset)
  • Train samples: 4,096 sequences
  • Validation samples: 860 sequences
  • Test samples: 876 sequences
  • Classes: 10 jersey numbers (subset of 00-99)

Data Preprocessing

  • Frames resized to 128Γ—128 pixels
  • Pad-to-square transformation
  • ImageNet normalization
  • 8 frames uniformly sampled per sequence

Training Procedure

Hyperparameters

  • Backbone: EfficientNet-B0 (pretrained)
  • Optimizer: AdamW (lr=2e-4, weight_decay=1e-3)
  • Scheduler: Cosine annealing
  • Batch size: 32 (temporal), 128 (anchor)
  • Epochs: 10 (temporal), 4 (anchor warmstart)
  • Mixed precision: Enabled (AMP)

Training Strategy

  1. Warmstart: Train anchor-only baseline (4 epochs)
  2. Temporal training: BiLSTM model (10 epochs)
  3. Backbone freezing: First 2 epochs
  4. Balanced sampling: Digit-level balancing
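
Step 3, freezing the backbone for the first two epochs, can be sketched as follows (the `backbone` attribute name is an assumption):

```python
import torch.nn as nn


class Net(nn.Module):  # stand-in model with a `backbone` submodule
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.head = nn.Linear(4, 10)


def set_backbone_trainable(model: nn.Module, trainable: bool) -> None:
    for p in model.backbone.parameters():
        p.requires_grad = trainable


model = Net()
for epoch in range(4):
    set_backbone_trainable(model, trainable=epoch >= 2)  # frozen for epochs 0-1
    # ... run one training epoch here ...
print(all(p.requires_grad for p in model.backbone.parameters()))  # True
```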

Evaluation Results

Test Set Performance

| Metric          | Anchor (Baseline) | Temporal (Ours) | Improvement |
|-----------------|-------------------|-----------------|-------------|
| Full Number Acc | 48.97%            | 92.12%          | +43.15 pp   |
| Tens Digit Acc  | 92.81%            | 98.63%          | +5.82 pp    |
| Units Digit Acc | 53.31%            | 93.04%          | +39.73 pp   |
| Loss            | 1.358             | 0.336           | -75.3%      |

Temporal Stability

  • 89% of tracks had zero prediction flips
  • Average 0.11 flips per track
  • Significant improvement over single-frame predictions

Per-Class Results

| Jersey # | Test Sequences | Accuracy |
|----------|----------------|----------|
| 4        | 164            | 95.73%   |
| 6        | 134            | 94.78%   |
| 8        | 301            | 90.70%   |
| 9        | 216            | 90.28%   |
| 48       | 4              | 100.00%  |
| 49       | 19             | 89.47%   |
| 66       | 19             | 100.00%  |
| 89       | 16             | 93.75%   |

Limitations

  • Trained on limited jersey number subset (10 classes)
  • Not optimized for real-time inference
  • Requires 8-frame sequences (not single images)
  • Performance may degrade on very different visual conditions

Contact

  • Author: Shakil Islam Shanto
  • GitHub: @prxkc