Abstract

This work builds upon the Basketball Action Recognition Dataset (BARD), originally introduced to enable supervised learning for primary action recognition in NBA game footage. However, BARD's initial design lacks the granular annotations required to develop multi-stage computer vision pipelines involving object detection, jersey number recognition (JNR), and team attribution. To address these limitations, we present E-BARD (Extended Basketball Action Recognition Dataset), which bridges the gap between isolated action recognition and end-to-end scene-level reasoning through three key contributions. First, we introduce a new set of interrelated datasets that augment the original BARD videos with dense visual annotations. These include detection data for key entities (ball, hoop, referee, player), team attribution based on uniform colors, and JNR, all integrated to directly support and enrich the original action captions. Second, we establish a benchmark for these visual understanding tasks using representative state-of-the-art models. We evaluate YOLO and RF-DETR for object detection; CLIP, SigLIP2, FashionCLIP, and the Perception Encoder for team color attribution; and olmOCR, Qwen2.5-VL-3B, and Qwen2.5-VL-7B for JNR. Finally, we propose an integrated approach based on Qwen2.5-VL, demonstrating the capacity of a unified multimodal framework to address all subtasks jointly. Ultimately, E-BARD provides a comprehensive benchmark for multi-task basketball video understanding.

Model Card for E-BARD Basketball Object Detection Models

This repository hosts two fine-tuned object detection models:

  • YOLOv8n
  • RF-DETR Nano

Both models are trained to detect key entities in basketball footage:

  • Basketball
  • Hoop
  • Player
  • Referee

These models were developed as part of the E-BARD (Extended Basketball Action Recognition Dataset) project to support end-to-end basketball scene understanding pipelines.


Model Details

Developed by: Gabriele Giudici (Author of E-BARD)

Model Type: Object Detection

YOLOv8n

  • Lightweight CNN detector
  • ~3.15M parameters

RF-DETR Nano

  • Lightweight transformer-based detector
  • ~30.5M parameters

License: CC-BY-4.0

Finetuned from:

  • Base YOLOv8n
  • Base RF-DETR Nano

Model Sources

Code Repository
https://github.com/GabrieleGiudic/E-BARD

Original BARD Repository
https://github.com/GabrieleGiudic/BARD

Dataset Repository
https://huggingface.co/datasets/GabrieleGiudici/E-BARD-detection

Paper
E-BARD: A Multi-Task Extension of the Basketball Action Recognition Dataset for Player Detection, Team Attribution and Jersey Number Recognition.


Uses

Direct Use

These models detect four basketball entities in a single frame:

  • Basketball
  • Basketball hoop
  • Basketball player
  • Referee

Downstream Use

Detections can be integrated into sports analytics pipelines, including:

  • Multi-object tracking (e.g., ByteTrack)
  • Jersey number recognition (JNR)
  • Team color attribution
  • Tactical analysis
  • Event understanding
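
As a rough illustration of how detections might feed the downstream stages listed above, the sketch below keeps confident player boxes and clamps them to the frame so they can be cropped for jersey number recognition or team color attribution. The `Detection` tuple, the string class labels, and the 0.25 default threshold are assumptions made for this example; they are not part of the released code.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container: pixel box in xyxy format, class label, confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def player_crop_boxes(
    detections: List[Detection],
    frame_width: int,
    frame_height: int,
    min_score: float = 0.25,
) -> List[Tuple[int, int, int, int]]:
    """Return integer crop rectangles for confident player detections."""
    crops = []
    for det in detections:
        if det.label != "player" or det.score < min_score:
            continue
        # Clamp to the frame so a crop never reaches outside the image.
        x1 = max(0, int(det.x1))
        y1 = max(0, int(det.y1))
        x2 = min(frame_width, int(det.x2))
        y2 = min(frame_height, int(det.y2))
        if x2 > x1 and y2 > y1:  # skip degenerate boxes
            crops.append((x1, y1, x2, y2))
    return crops


detections = [
    Detection(100.4, 200.2, 160.9, 340.7, "player", 0.91),
    Detection(650.0, 300.0, 720.0, 420.0, "player", 0.80),      # extends past the frame
    Detection(300.0, 100.0, 312.0, 112.0, "basketball", 0.60),  # not a player
    Detection(400.0, 250.0, 450.0, 380.0, "player", 0.10),      # below the threshold
]
crops = player_crop_boxes(detections, frame_width=704, frame_height=704)
print(crops)
```

Each returned rectangle can then be used to slice the frame before passing the crop to a JNR or team attribution model.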

Bias, Risks, and Limitations

  • Models were trained on 720p footage downscaled to 704×704.
  • Performance may degrade on lower resolutions or different aspect ratios.
  • Dataset is derived from 2024–2025 NBA season footage, potentially biasing the models toward:
    • NBA court layouts
    • broadcast camera angles
    • lighting conditions
    • uniform styles
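
To make the aspect-ratio caveat concrete, the sketch below computes the geometry of an aspect-preserving (letterbox) resize of a 1280×720 frame into a 704×704 input. Whether the models were trained with letterboxing or with a plain stretch is not stated in this card, so treat the letterbox choice and the function name as assumptions.

```python
def letterbox_geometry(src_w: int, src_h: int, target: int = 704):
    """Scale and padding needed to fit a frame inside a target x target square."""
    # Use the smaller ratio so the whole frame fits without cropping.
    scale = min(target / src_w, target / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Remaining space is split evenly between the two sides.
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


scale, resized, padding = letterbox_geometry(1280, 720)
print(scale, resized, padding)
```

Under this assumption a 720p frame occupies a 704×396 band with 154 pixels of padding above and below, so footage with a different aspect ratio changes both the object scale and the amount of padding the model sees.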

Possible limitations:

  • Reduced performance on lower-tier leagues
  • Reduced performance on street basketball environments

Model-specific limitations

YOLOv8n

  • Struggles with very small objects such as the basketball
  • Basketball recall at IoU 0.50: 0.566

RF-DETR Nano

  • Conservative detection behavior
  • Prioritizes precision over recall (e.g., basketball precision 0.845 vs. recall 0.322)

Training Details

Training Data

The models were trained on the E-BARD Detection Dataset, derived from 60 BARD full-game recordings.

Dataset statistics

  • Total Frames: 1,800
  • Frames per game: 30
  • Total Annotations: 22,210

Class Distribution

Class         Instances
Players       15,296
Referees      3,853
Hoops         1,565
Basketballs   1,496
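
The imbalance in these counts can be summarized with a few lines of arithmetic. The sketch below uses only the numbers reported in this card; the variable names are illustrative.

```python
# Annotation counts and frame total as reported in this card.
counts = {"Players": 15296, "Referees": 3853, "Hoops": 1565, "Basketballs": 1496}
total_frames = 1800

total_annotations = sum(counts.values())

# For each class: (share of all annotations, average instances per frame).
stats = {
    name: (n / total_annotations, n / total_frames)
    for name, n in counts.items()
}

for name, (share, per_frame) in stats.items():
    print(f"{name:<12} {share:6.1%}  {per_frame:5.2f} per frame")
```

Players account for roughly 69% of all annotations, while the basketball appears fewer than once per frame on average.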

Dataset split

  • Training: 80%
  • Validation: 10%
  • Test: 10%
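
For reference, an 80/10/10 split of the 1,800 frames yields 1,440 training, 180 validation, and 180 test frames. The sketch below shows one way such a split could be produced with a seeded shuffle; the card does not say whether the actual split was made per frame or per game, so the per-frame shuffle, the seed, and the function name are assumptions.

```python
import random


def split_indices(n_items: int, train_frac: float = 0.8, val_frac: float = 0.1, seed: int = 0):
    """Shuffle item indices reproducibly and cut them into train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_train = round(n_items * train_frac)
    n_val = round(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # everything left over
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1800)
print(len(train_idx), len(val_idx), len(test_idx))
```

A per-game split (holding out whole games) would be a stricter alternative, since frames from the same game share court, uniforms, and camera setup.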

Training Procedure

Both models were trained using:

  • Mixed precision (AMP)
  • Early stopping

YOLOv8n

  • Epochs: 50
  • Resolution: 704×704
  • Batch Size: 64 (paper) / 32 (training script)
  • Augmentations:
    • Mosaic (1.0)
    • Copy-Paste (0.5)
    • RandAugment
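
The YOLOv8n settings above might translate into an Ultralytics training call along the following lines. This is a sketch, not the project's training script: the dataset YAML path and the `patience` value are hypothetical, the batch size follows the training-script figure, and mapping RandAugment to the `auto_augment` option is an assumption.

```python
def build_yolo_train_kwargs(batch_size: int = 32) -> dict:
    """Collect the hyperparameters listed in this card as Ultralytics train() arguments."""
    return {
        "data": "data/yolo/data.yaml",  # hypothetical dataset config path
        "epochs": 50,
        "imgsz": 704,
        "batch": batch_size,            # 64 in the paper, 32 in the training script
        "amp": True,                    # mixed precision
        "patience": 10,                 # early stopping; this value is hypothetical
        "mosaic": 1.0,
        "copy_paste": 0.5,
        "auto_augment": "randaugment",  # assumed mapping for RandAugment
    }


def train_yolov8n(train_kwargs: dict):
    """Fine-tune from the base YOLOv8n weights (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO("yolov8n.pt")
    return model.train(**train_kwargs)


train_kwargs = build_yolo_train_kwargs()
print(train_kwargs)
```

Calling `train_yolov8n(train_kwargs)` would start training; it is left out here so the snippet runs without a GPU or dataset.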

RF-DETR Nano

  • Epochs: 50
  • Resolution: 704×704
  • Batch Size: 16
  • Learning Rate: 1e-4

Evaluation

Testing Data

Evaluation was performed on the 10% held-out test split of E-BARD.

Metrics used, all computed at an IoU threshold of 0.50:

  • Precision
  • Recall
  • F1-score
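
For readers unfamiliar with detection metrics, the sketch below shows one common way to compute precision, recall, and F1 for a single class on a single image at an IoU threshold of 0.50, using greedy matching in order of confidence. The project's evaluation script may match predictions differently; the function names and toy boxes are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds: List[Tuple[Box, float]], gts: List[Box], iou_thr: float = 0.5):
    """Greedy one-to-one matching: highest-confidence predictions claim ground truths first."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground truths that were never matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


gts = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
preds = [
    ((0, 0, 10, 10), 0.9),        # exact match
    ((21, 21, 31, 31), 0.8),      # IoU = 81 / 119, above the threshold
    ((100, 100, 110, 110), 0.7),  # false positive
]
precision, recall, f1 = precision_recall_f1(preds, gts)
print(precision, recall, f1)
```

In this toy case two of three predictions match and one ground-truth box is missed, so precision, recall, and F1 all equal 2/3.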

Results

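YOLOv8n achieved higher recall and F1 than RF-DETR Nano on all four classes. RF-DETR Nano achieved slightly higher precision on the Basketball, Player, and Referee classes, while YOLOv8n was more precise on Hoop.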

Per-Class Performance (@ IoU 0.5)

Class        Metric      YOLOv8n   RF-DETR Nano
Basketball   Precision   0.811     0.845
Basketball   Recall      0.566     0.322
Basketball   F1          0.667     0.467
Hoop         Precision   0.993     0.944
Hoop         Recall      0.937     0.742
Hoop         F1          0.964     0.831
Player       Precision   0.952     0.962
Player       Recall      0.949     0.908
Player       F1          0.950     0.934
Referee      Precision   0.927     0.953
Referee      Recall      0.930     0.794
Referee      F1          0.929     0.867

Code Examples

YOLOv8n Inference

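from ultralytics import YOLO

# Load the fine-tuned E-BARD YOLOv8n checkpoint.
yolo_model = YOLO("model/BODD_yolov8n_0001.pt")

# Run inference at the training resolution.
yolo_results = yolo_model.predict(
    source="data/yolo/test/images",  # directory of test frames
    imgsz=704,                       # matches the 704×704 training resolution
    device="cuda",                   # use "cpu" if no GPU is available
    conf=0.25,                       # minimum confidence for a detection to be kept
    iou=0.5                          # IoU threshold for non-maximum suppression
)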

RF-DETR Nano Inference

from rfdetr import RFDETRNano
from PIL import Image

# In the rfdetr package, the input resolution is a constructor argument.
rfdetr_model = RFDETRNano(
    pretrain_weights="model/BODD_rf-detr-nano_0000/checkpoint_best_total.pth",
    resolution=704
)

img = Image.open("path/to/image.jpg").convert("RGB")

# predict() takes the confidence cutoff as `threshold`.
detections = rfdetr_model.predict(img, threshold=0.25)

Full Evaluation Script

The full evaluation script is available in the code repository: https://github.com/GabrieleGiudic/E-BARD/detection/eval/yolo_vs_detr.py
