Jersey Number Recognition - Temporal BiLSTM Model


Model Description

A BiLSTM-based temporal model for recognizing jersey numbers from video sequences. It reaches 92.12% full-number accuracy, a 43-percentage-point improvement over the single-frame baseline (48.97%).

Key Features

  • 🎯 92.12% full number accuracy
  • 🎯 98.63% tens digit accuracy
  • 🎯 93.04% units digit accuracy
  • 🎯 89% temporal stability across player tracks
  • 🎯 Compositional generalization to 100 classes (00-99)

Model Architecture

```
Input Sequence [8 × 3 × 128 × 128]
    ↓
EfficientNet-B0 Backbone (shared weights)
    ↓
256-D Embeddings [8 × 256]
    ↓
2-Layer Bidirectional LSTM (hidden: 128)
    ↓
Concatenated Hidden States [512]
    ↓
    ├─→ Tens Digit Head (10 classes)
    └─→ Units Digit Head (10 classes)
```

Parameters: 5.1M
Model Size: 20.3 MB

Intended Use

Primary Use Cases

  • Jersey number recognition in sports analytics
  • Temporal sequence modeling for visual recognition
  • Research in compositional generalization

Out-of-Scope Uses

  • Real-time applications (not optimized for inference speed)
  • Non-sports contexts without fine-tuning
  • Privacy-sensitive applications

How to Use

Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="prxkc/jersey-number-recognition",
    filename="best_temporal.pt"
)

# Load the checkpoint (pass weights_only=False if it stores more than tensors)
checkpoint = torch.load(model_path, map_location="cpu")

# Note: you'll need the model architecture code to instantiate the network.
# See the GitHub repository for the complete implementation:
# https://github.com/prxkc/jersey-number-recognition
```

Complete Example

For complete usage with the model architecture, see the GitHub repository: https://github.com/prxkc/jersey-number-recognition
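
Because the model predicts the tens and units digits with separate heads, the full jersey number is recovered compositionally from the two argmaxes. A minimal sketch with hypothetical logits (the variable names are assumptions):

```python
import torch

# Hypothetical logits from the two digit heads for one sequence.
tens_logits = torch.tensor([[0.1, 0.2, 0.1, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1]])
units_logits = torch.tensor([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 4.0, 0.1]])

tens = tens_logits.argmax(dim=1)    # -> 4
units = units_logits.argmax(dim=1)  # -> 8
jersey = tens * 10 + units          # numbers "00"-"09" come out as 0-9
print(jersey.item())  # 48
```

This is how two 10-way heads cover all 100 classes (00-99) without a 100-way classifier.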

Training Data

  • Dataset: Custom jersey number dataset (subset)
  • Train samples: 4,096 sequences
  • Validation samples: 860 sequences
  • Test samples: 876 sequences
  • Classes: 10 jersey numbers (subset of 00-99)

Data Preprocessing

  • Frames resized to 128Γ—128 pixels
  • Pad-to-square transformation
  • ImageNet normalization
  • 8 frames uniformly sampled per sequence

Training Procedure

Hyperparameters

  • Backbone: EfficientNet-B0 (pretrained)
  • Optimizer: AdamW (lr=2e-4, weight_decay=1e-3)
  • Scheduler: Cosine annealing
  • Batch size: 32 (temporal), 128 (anchor)
  • Epochs: 10 (temporal), 4 (anchor warmstart)
  • Mixed precision: Enabled (AMP)

Training Strategy

  1. Warmstart: Train anchor-only baseline (4 epochs)
  2. Temporal training: BiLSTM model (10 epochs)
  3. Backbone freezing: First 2 epochs
  4. Balanced sampling: Digit-level balancing
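
Step 3, freezing the backbone for the first two epochs, can be sketched as follows (the `backbone` attribute name is an assumption):

```python
import torch.nn as nn


class Net(nn.Module):  # stand-in model with a `backbone` submodule
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.head = nn.Linear(4, 10)


def set_backbone_trainable(model: nn.Module, trainable: bool) -> None:
    for p in model.backbone.parameters():
        p.requires_grad = trainable


model = Net()
for epoch in range(4):
    set_backbone_trainable(model, trainable=epoch >= 2)  # frozen for epochs 0-1
    # ... run one training epoch here ...
print(all(p.requires_grad for p in model.backbone.parameters()))  # True
```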

Evaluation Results

Test Set Performance

| Metric          | Anchor (Baseline) | Temporal (Ours) | Improvement |
|-----------------|-------------------|-----------------|-------------|
| Full Number Acc | 48.97%            | 92.12%          | +43.15 pp   |
| Tens Digit Acc  | 92.81%            | 98.63%          | +5.82 pp    |
| Units Digit Acc | 53.31%            | 93.04%          | +39.73 pp   |
| Loss            | 1.358             | 0.336           | -75.3%      |

Temporal Stability

  • 89% of tracks had zero prediction flips
  • Average 0.11 flips per track
  • Significant improvement over single-frame predictions

Per-Class Results

| Jersey # | Test Sequences | Accuracy |
|----------|----------------|----------|
| 4        | 164            | 95.73%   |
| 6        | 134            | 94.78%   |
| 8        | 301            | 90.70%   |
| 9        | 216            | 90.28%   |
| 48       | 4              | 100.00%  |
| 49       | 19             | 89.47%   |
| 66       | 19             | 100.00%  |
| 89       | 16             | 93.75%   |

Limitations

  • Trained on limited jersey number subset (10 classes)
  • Not optimized for real-time inference
  • Requires 8-frame sequences (not single images)
  • Performance may degrade on very different visual conditions

Contact

  • Author: Shakil Islam Shanto
  • GitHub: @prxkc