# Jersey Number Recognition - Temporal BiLSTM Model

## Model Description

A BiLSTM-based temporal model for recognizing jersey numbers from video sequences, achieving 92.12% full-number accuracy, a 43-point improvement over the single-frame baseline (48.97%).
## Key Features

- 🎯 92.12% full number accuracy
- 🎯 98.63% tens digit accuracy
- 🎯 93.04% units digit accuracy
- 🎯 89% temporal stability across player tracks
- 🎯 Compositional generalization to 100 classes (00-99)
## Model Architecture

```
Input Sequence [8 × 3 × 128 × 128]
        ↓
EfficientNet-B0 Backbone (shared weights)
        ↓
256-D Embeddings [8 × 256]
        ↓
2-Layer Bidirectional LSTM (hidden: 128)
        ↓
Concatenated Hidden States [512]
        ↓
├── Tens Digit Head (10 classes)
└── Units Digit Head (10 classes)
```

- Parameters: 5.1M
- Model size: 20.3 MB
## Intended Use

### Primary Use Cases
- Jersey number recognition in sports analytics
- Temporal sequence modeling for visual recognition
- Research in compositional generalization
### Out-of-Scope Uses
- Real-time applications (not optimized for inference speed)
- Non-sports contexts without fine-tuning
- Privacy-sensitive applications
## How to Use

### Installation

```bash
pip install torch torchvision pillow huggingface_hub
```
### Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

# Download model weights
model_path = hf_hub_download(
    repo_id="prxkc/jersey-number-recognition",
    filename="best_temporal.pt",
)

# Load checkpoint
checkpoint = torch.load(model_path, map_location="cpu")

# Note: you'll need the model architecture code to restore these weights.
# See the GitHub repository for the complete implementation:
# https://github.com/prxkc/jersey-number-recognition
```
### Complete Example

For complete usage with the model architecture, see the [GitHub repository](https://github.com/prxkc/jersey-number-recognition).
## Training Data
- Dataset: Custom jersey number dataset (subset)
- Train samples: 4,096 sequences
- Validation samples: 860 sequences
- Test samples: 876 sequences
- Classes: 10 jersey numbers (subset of 00-99)
### Data Preprocessing

- Frames resized to 128×128 pixels
- Pad-to-square transformation
- ImageNet normalization
- 8 frames uniformly sampled per sequence
## Training Procedure

### Hyperparameters
- Backbone: EfficientNet-B0 (pretrained)
- Optimizer: AdamW (lr=2e-4, weight_decay=1e-3)
- Scheduler: Cosine annealing
- Batch size: 32 (temporal), 128 (anchor)
- Epochs: 10 (temporal), 4 (anchor warmstart)
- Mixed precision: Enabled (AMP)
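Under these hyperparameters, the optimizer and scheduler setup might look like the sketch below. The stand-in model and loop skeleton are illustrative assumptions; the actual training loop is in the GitHub repository.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(512, 20)  # stand-in for the real temporal model
epochs = 10                       # 10 temporal epochs per the card

optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())  # AMP

for epoch in range(epochs):
    # per-batch training step (mixed precision when CUDA is available):
    # with torch.autocast("cuda", enabled=scaler.is_enabled()):
    #     loss = criterion(model(x), y)
    # scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
    scheduler.step()  # cosine-anneal lr from 2e-4 toward 0 over the run
```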
### Training Strategy
- Warmstart: Train anchor-only baseline (4 epochs)
- Temporal training: BiLSTM model (10 epochs)
- Backbone freezing: First 2 epochs
- Balanced sampling: Digit-level balancing
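The backbone-freezing step above is the standard requires_grad toggle; a minimal sketch, using a toy module with the same backbone/head split (the real model's attribute names are my assumption):

```python
import torch.nn as nn


class ToyModel(nn.Module):
    """Stand-in with the split the card describes: shared backbone + digit heads."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(4, 4)   # plays the role of EfficientNet-B0
        self.head = nn.Linear(4, 10)      # plays the role of the digit heads


def set_backbone_trainable(model: nn.Module, trainable: bool) -> None:
    """Freeze (first 2 epochs) or unfreeze (epoch 3 onward) the backbone."""
    for p in model.features.parameters():
        p.requires_grad = trainable


model = ToyModel()
set_backbone_trainable(model, False)   # frozen: only LSTM + heads update
# ... after epoch 2: set_backbone_trainable(model, True) to fine-tune end-to-end
```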
## Evaluation Results

### Test Set Performance
| Metric | Anchor (Baseline) | Temporal (Ours) | Improvement |
|---|---|---|---|
| Full Number Acc | 48.97% | 92.12% | +43.15% |
| Tens Digit Acc | 92.81% | 98.63% | +5.82% |
| Units Digit Acc | 53.31% | 93.04% | +39.73% |
| Loss | 1.358 | 0.336 | -75.3% |
### Temporal Stability
- 89% of tracks had zero prediction flips
- Average 0.11 flips per track
- Significant improvement over single-frame predictions
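"Flips" here counts how often the predicted number changes along a track; a minimal way to compute the stability statistics (the track data below is illustrative, not from the test set):

```python
def count_flips(preds):
    """Number of times the per-frame prediction changes along one track."""
    return sum(a != b for a, b in zip(preds, preds[1:]))


# Toy per-frame predictions for three tracks (illustrative only)
tracks = [[8, 8, 8, 8], [4, 4, 9, 4], [66, 66, 66]]

flips = [count_flips(t) for t in tracks]           # flips per track
avg_flips = sum(flips) / len(flips)                # cf. 0.11 on the real test set
stable = sum(f == 0 for f in flips) / len(flips)   # fraction of flip-free tracks
```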
### Per-Class Results
| Jersey # | Test Sequences | Accuracy |
|---|---|---|
| 4 | 164 | 95.73% |
| 6 | 134 | 94.78% |
| 8 | 301 | 90.70% |
| 9 | 216 | 90.28% |
| 48 | 4 | 100.00% |
| 49 | 19 | 89.47% |
| 66 | 19 | 100.00% |
| 89 | 16 | 93.75% |
## Limitations
- Trained on limited jersey number subset (10 classes)
- Not optimized for real-time inference
- Requires 8-frame sequences (not single images)
- Performance may degrade under visual conditions that differ substantially from the training footage
## Contact

- Author: Shakil Islam Shanto
- GitHub: [@prxkc](https://github.com/prxkc)