Abstract
This work builds upon the Basketball Action Recognition Dataset (BARD), originally introduced to enable supervised learning for primary action recognition in NBA game footage. However, BARD's initial design lacks the granular annotations required to develop multi-stage computer vision pipelines involving object detection, jersey number recognition (JNR), and team attribution. To address these limitations, we present E-BARD (Extended Basketball Action Recognition Dataset), which bridges the gap between isolated action recognition and end-to-end scene-level reasoning through three key contributions. First, we introduce a set of interrelated datasets that augment the original BARD videos with dense visual annotations: detection data for key entities (ball, hoop, referee, player), team attribution based on uniform colors, and JNR, all integrated to directly support and enrich the original action captions. Second, we establish a comprehensive benchmark for these visual understanding tasks using representative state-of-the-art models. We evaluate YOLO and RF-DETR for object detection; CLIP, SigLIP2, FashionCLIP, and the Perception Encoder for team color attribution; and olmOCR, Qwen2.5-VL-3B, and Qwen2.5-VL-7B for JNR. Finally, we propose a holistic, integrated approach based on Qwen2.5-VL, demonstrating the capacity of a unified multimodal framework to address all subtasks jointly. Ultimately, E-BARD provides a comprehensive benchmark for multi-task basketball video understanding.
Model Card for E-BARD Basketball Object Detection Models
This repository hosts two fine-tuned object detection models:
- YOLOv8n
- RF-DETR Nano
Both models are trained to detect key entities in basketball footage:
- Basketball
- Hoop
- Player
- Referee
These models were developed as part of the E-BARD (Extended Basketball Action Recognition Dataset) project to support end-to-end basketball scene understanding pipelines.
Model Details
Developed by: Gabriele Giudici (Author of E-BARD)
Model Type: Object Detection
YOLOv8n
- Lightweight CNN detector
- ~3.15M parameters
RF-DETR Nano
- Lightweight transformer-based detector
- ~30.5M parameters
License: CC-BY-4.0
Finetuned from:
- Base YOLOv8n
- Base RF-DETR Nano
Model Sources
Code Repository
https://github.com/GabrieleGiudic/E-BARD
Original BARD Repository
https://github.com/GabrieleGiudic/BARD
Dataset Repository
https://huggingface.co/datasets/GabrieleGiudici/E-BARD-detection
Paper
E-BARD: A Multi-Task Extension of the Basketball Action Recognition Dataset for Player Detection, Team Attribution and Jersey Number Recognition.
Uses
Direct Use
These models detect four basketball entities in a single frame:
- Basketball
- Basketball hoop
- Basketball player
- Referee
Downstream Use
Detections can be integrated into sports analytics pipelines, including:
- Multi-object tracking (e.g., ByteTrack)
- Jersey number recognition (JNR)
- Team color attribution
- Tactical analysis
- Event understanding
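As a sketch of the hand-off into such a pipeline, the snippet below converts player detections into padded, frame-clamped crop boxes that could be fed to a JNR model. The `(class_id, conf, x1, y1, x2, y2)` tuple format and the class index order (0=basketball, 1=hoop, 2=player, 3=referee) are illustrative assumptions, not the released models' guaranteed output format.

```python
def player_crop_boxes(frame_size, detections, player_class=2, pad=4):
    """Turn player detections into padded, frame-clamped crop boxes for JNR.

    detections: iterable of (class_id, conf, x1, y1, x2, y2) in pixels.
    Returns a list of (conf, (x1, y1, x2, y2)) crop boxes.
    """
    w, h = frame_size
    crops = []
    for class_id, conf, x1, y1, x2, y2 in detections:
        if class_id != player_class:
            continue  # keep only player boxes for jersey number recognition
        # Pad slightly so jersey digits near the box edge are not clipped,
        # then clamp to the frame bounds.
        crops.append((conf, (max(0, x1 - pad), max(0, y1 - pad),
                             min(w, x2 + pad), min(h, y2 + pad))))
    return crops
```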
Bias, Risks, and Limitations
- Models were trained on 720p footage downscaled to 704×704.
- Performance may degrade on lower resolutions or different aspect ratios.
- Dataset is derived from 2024–2025 NBA season footage, potentially biasing the models toward:
- NBA court layouts
- broadcast camera angles
- lighting conditions
- uniform styles
Possible limitations:
- Reduced performance on lower-tier leagues
- Reduced performance on street basketball environments
Model-specific limitations
YOLOv8n
- Struggles with very small objects like the basketball
- Basketball recall at IoU 0.50 is only 0.566
RF-DETR Nano
- Conservative detection behavior
- Prioritizes precision over recall
Training Details
Training Data
The models were trained on the E-BARD Detection Dataset, derived from 60 BARD full-game recordings.
Dataset statistics
- Total Frames: 1,800
- Frames per game: 30
- Total Annotations: 22,210
Class Distribution
| Class | Instances |
|---|---|
| Players | 15,296 |
| Referees | 3,853 |
| Hoops | 1,565 |
| Basketballs | 1,496 |
Dataset split
- Training: 80%
- Validation: 10%
- Test: 10%
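For reference, the split fractions above map onto the 1,800 frames as follows (the actual released split sizes may differ by a frame or two due to rounding at export time):

```python
total_frames = 1800

# 80/10/10 split as stated in the card
split_fractions = {"train": 0.80, "val": 0.10, "test": 0.10}
split_counts = {name: round(total_frames * frac)
                for name, frac in split_fractions.items()}
print(split_counts)  # 1,440 training, 180 validation, 180 test frames

# Sanity check: the per-class instance counts sum to the reported total
assert 15296 + 3853 + 1565 + 1496 == 22210
```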
Training Procedure
Both models were trained using:
- Mixed precision (AMP)
- Early stopping
YOLOv8n
Epochs: 50
Resolution: 704×704
Batch Size: 64 (reported in the paper) / 32 (used in the released training script)
Augmentations:
- Mosaic (1.0)
- Copy-Paste (0.5)
- RandAugment
RF-DETR Nano
- Epochs: 50
- Resolution: 704×704
- Batch Size: 16
- Learning Rate: 1e-4
Evaluation
Testing Data
Evaluation was performed on the 10% held-out test split of E-BARD.
Metrics (all computed at an IoU threshold of 0.50):
- Precision
- Recall
- F1-score
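These metrics follow their standard definitions; a minimal reference implementation (assuming axis-aligned boxes in `(x1, y1, x2, y2)` format) is:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from matched-detection counts.

    A prediction counts as a true positive when it matches a ground-truth
    box of the same class with IoU >= 0.50.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```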
Results
YOLOv8n achieved the higher recall and F1-score on all four classes, while RF-DETR Nano posted slightly higher precision on three of the four (basketball, player, and referee).
Per-Class Performance (@ IoU 0.5)
| Class | Metric | YOLOv8n | RF-DETR Nano |
|---|---|---|---|
| Basketball | Precision | 0.811 | 0.845 |
| Basketball | Recall | 0.566 | 0.322 |
| Basketball | F1 | 0.667 | 0.467 |
| Hoop | Precision | 0.993 | 0.944 |
| Hoop | Recall | 0.937 | 0.742 |
| Hoop | F1 | 0.964 | 0.831 |
| Player | Precision | 0.952 | 0.962 |
| Player | Recall | 0.949 | 0.908 |
| Player | F1 | 0.950 | 0.934 |
| Referee | Precision | 0.927 | 0.953 |
| Referee | Recall | 0.930 | 0.794 |
| Referee | F1 | 0.929 | 0.867 |
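F1 is the harmonic mean of precision and recall, so the table rows can be checked against each other:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce YOLOv8n's reported F1 on the Basketball class
print(round(f1_score(0.811, 0.566), 3))  # 0.667, matching the table
```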
Code Examples
YOLOv8n Inference
```python
from ultralytics import YOLO

# Load the fine-tuned YOLOv8n checkpoint
yolo_model = YOLO("model/BODD_yolov8n_0001.pt")

# Run inference on the test images at the training resolution (704x704)
yolo_results = yolo_model.predict(
    source="data/yolo/test/images",
    imgsz=704,
    device="cuda",
    conf=0.25,  # confidence threshold
    iou=0.5,    # NMS IoU threshold
)
```
RF-DETR Nano Inference
```python
from rfdetr import RFDETRNano
from PIL import Image

# Load the fine-tuned RF-DETR Nano checkpoint
rfdetr_model = RFDETRNano(
    pretrain_weights="model/BODD_rf-detr-nano_0000/checkpoint_best_total.pth"
)

# RF-DETR takes a single PIL image per call
img = Image.open("path/to/image.jpg").convert("RGB")
detections = rfdetr_model.predict(
    img,
    resolution=704,        # match the 704x704 training resolution
    conf_threshold=0.25,   # confidence threshold
)
```
Full Evaluation Script
The full YOLO vs. RF-DETR evaluation script is available in the repository's evaluation folder: https://github.com/GabrieleGiudic/E-BARD/detection/eval/yolo_vs_detr.py