---
license: cc-by-4.0
datasets:
- GabrieleGiudici/E-BARD-detection
base_model:
- Ultralytics/YOLOv8
- Roboflow/RFDETRNano
tags:
- basketball
- detection
---
# Abstract
This work builds upon the Basketball Action Recognition Dataset (BARD), originally introduced to enable supervised learning for primary action recognition in NBA game footage. However, BARD's initial design lacks the granular annotations required to develop multi-stage computer vision pipelines involving object detection, jersey number recognition (JNR) and team attribution. To address these limitations, we present E-BARD (Extended Basketball Action Recognition Dataset), which bridges the gap between isolated action recognition and end-to-end scene-level reasoning through three key contributions. First, we introduce a new set of interrelated datasets that augment the original BARD videos with dense visual annotations. This includes detection data for key entities (ball, hoop, referee, player), team attribution based on uniform colors and JNR, all integrated to directly support and enrich the original action captions. Second, we establish a comprehensive benchmark for these specific visual understanding tasks using representative state-of-the-art models. We evaluate YOLO and RF-DETR for object detection; CLIP, SigLIP2, FashionCLIP, and the Perception Encoder for team color attribution; and olmOCR, Qwen2.5-VL-3B, and Qwen2.5-VL-7B for JNR. Finally, we propose a holistic, integrated approach based on Qwen2.5-VL, demonstrating the capacity of a unified multimodal framework to jointly address all subtasks simultaneously. Ultimately, E-BARD provides a comprehensive benchmark for multi-task basketball video understanding.

# Model Card for E-BARD Basketball Object Detection Models

This repository hosts two fine-tuned object detection models:

- **YOLOv8n**
- **RF-DETR Nano**

Both models are trained to detect key entities in basketball footage:

- Basketball
- Hoop
- Player
- Referee

These models were developed as part of the **E-BARD (Extended Basketball Action Recognition Dataset)** project to support **end-to-end basketball scene understanding pipelines**.

---

# Model Details

**Developed by:** Gabriele Giudici (Author of E-BARD)

**Model Type:** Object Detection

### YOLOv8n
- Lightweight CNN detector
- ~3.15M parameters

### RF-DETR Nano
- Lightweight transformer-based detector
- ~30.5M parameters

**License:** CC-BY-4.0

**Finetuned from:**
- Base YOLOv8n
- Base RF-DETR Nano

---

# Model Sources

**Code Repository**
https://github.com/GabrieleGiudic/E-BARD

**Original BARD Repository**
https://github.com/GabrieleGiudic/BARD

**Dataset Repository**
https://huggingface.co/datasets/GabrieleGiudici/E-BARD-detection

**Paper**
E-BARD: *A Multi-Task Extension of the Basketball Action Recognition Dataset for Player Detection, Team Attribution and Jersey Number Recognition.*

---

# Uses

## Direct Use

These models detect four basketball entities in a single frame:

- Basketball
- Basketball hoop
- Basketball player
- Referee

## Downstream Use

Detections can be integrated into **sports analytics pipelines**, including:

- Multi-object tracking (e.g., ByteTrack)
- Jersey number recognition (JNR)
- Team color attribution
- Tactical analysis
- Event understanding
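
As a rough illustration of the hand-off to downstream stages such as JNR or team color attribution, the sketch below filters one frame's detections down to confident players and computes padded crop boxes. The `Detection` container, the label strings, the `player_crops` helper, and the padding value are hypothetical and not part of the E-BARD code.

```python
from dataclasses import dataclass

# Label strings follow the four entities listed above; the exact names
# and class order used by the released models are assumptions.
PLAYER, REFEREE, BALL, HOOP = "player", "referee", "basketball", "hoop"


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


def player_crops(detections, frame_w, frame_h, min_conf=0.25, pad=4):
    """Return padded crop boxes, clamped to the frame, for confident players."""
    crops = []
    for det in detections:
        if det.label != PLAYER or det.confidence < min_conf:
            continue
        x1, y1, x2, y2 = det.box
        crops.append((
            max(0, int(x1) - pad),
            max(0, int(y1) - pad),
            min(frame_w, int(x2) + pad),
            min(frame_h, int(y2) + pad),
        ))
    return crops


# Made-up detections for a single 704x704 frame.
frame_detections = [
    Detection(PLAYER, 0.91, (100, 200, 160, 380)),
    Detection(REFEREE, 0.88, (400, 210, 450, 370)),  # not a player, skipped
    Detection(PLAYER, 0.12, (600, 220, 650, 390)),   # below min_conf, skipped
    Detection(PLAYER, 0.77, (0, 300, 50, 500)),      # touches the left edge
]
crops = player_crops(frame_detections, frame_w=704, frame_h=704)
print(crops)
```

Each returned box could then be cropped from the frame and passed to a jersey number or team color model.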

---

# Bias, Risks, and Limitations

- The models were trained on **720p footage downscaled to 704×704**.
- Performance may degrade on **lower resolutions or different aspect ratios**.
- The dataset is derived from **2024–2025 NBA season footage**, potentially biasing the models toward:
  - NBA court layouts
  - broadcast camera angles
  - lighting conditions
  - uniform styles

Possible limitations:

- Reduced performance on **lower-tier leagues**
- Reduced performance on **street basketball environments**

### Model-specific limitations

**YOLOv8n**

- Struggles with very small objects like the basketball
- Basketball recall at IoU 0.50: **0.566**

**RF-DETR Nano**

- Conservative detection behavior
- Prioritizes precision over recall
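
The balance between precision and recall also depends on the confidence threshold applied at inference. The sketch below uses made-up scores (not E-BARD results) to show how raising the threshold tends to increase precision while lowering recall. The helper `precision_recall_at` and all numbers are illustrative assumptions.

```python
def precision_recall_at(scored, n_ground_truth, threshold):
    """scored: (confidence, is_true_positive) pairs for one class."""
    kept = [is_tp for conf, is_tp in scored if conf >= threshold]
    tp = sum(kept)
    # With no detections kept, precision is reported as 1.0 by convention here.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    return precision, recall


# Made-up predictions for illustration only.
scored = [(0.95, True), (0.90, True), (0.70, True), (0.55, False),
          (0.40, True), (0.30, False), (0.20, True)]
N_GT = 6  # hypothetical number of ground-truth objects

sweep = {t: precision_recall_at(scored, N_GT, t) for t in (0.25, 0.50, 0.75)}
for threshold, (precision, recall) in sweep.items():
    print(f"conf >= {threshold:.2f}: precision={precision:.3f} recall={recall:.3f}")
```

A sweep like this on the validation split is one way to choose an operating point for a recall-sensitive class such as the basketball.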

---

# Training Details

## Training Data

The models were trained on the **E-BARD Detection Dataset**, derived from **60 BARD full-game recordings**.

**Dataset statistics**

* Total Frames: **1,800**
* Frames per game: **30**
* Total Annotations: **22,210**

**Class Distribution**

| Class       | Instances |
| ----------- | --------- |
| Players     | 15,296    |
| Referees    | 3,853     |
| Hoops       | 1,565     |
| Basketballs | 1,496     |

**Dataset split**

* Training: 80%
* Validation: 10%
* Test: 10%
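
The figures above are internally consistent, which a few lines of arithmetic can confirm. The sketch below recomputes the totals and derives approximate split sizes. It assumes the 80/10/10 split is applied at the frame level, which the statistics above do not state.

```python
# Figures copied from the dataset statistics above.
GAMES = 60
FRAMES_PER_GAME = 30
CLASS_COUNTS = {
    "Players": 15296,
    "Referees": 3853,
    "Hoops": 1565,
    "Basketballs": 1496,
}
SPLIT_PERCENT = {"train": 80, "val": 10, "test": 10}

total_frames = GAMES * FRAMES_PER_GAME
total_annotations = sum(CLASS_COUNTS.values())

# Assumption: the split is taken over frames. A per-game split would
# correspond to 48 / 6 / 6 games and the same frame counts.
split_frames = {
    name: total_frames * pct // 100 for name, pct in SPLIT_PERCENT.items()
}

# Share of each class among all annotations, in percent.
class_share = {
    name: round(100 * count / total_annotations, 1)
    for name, count in CLASS_COUNTS.items()
}

print(total_frames, total_annotations)
print(split_frames)
print(class_share)
```

Players account for roughly 69% of all annotations and basketballs for roughly 7%, so the ball is both the smallest and the least frequent class.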

---

# Training Procedure

Both models were trained using:

* **Mixed precision (AMP)**
* **Early stopping**

## YOLOv8n

* Epochs: 50
* Resolution: 704×704
* Batch Size: 64 (paper) / 32 (training script)
* Augmentations:

  * Mosaic (1.0)
  * Copy-Paste (0.5)
  * RandAugment

## RF-DETR Nano

* Epochs: 50
* Resolution: 704×704
* Batch Size: 16
* Learning Rate: 1e-4
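
Both configurations use a 704×704 input while the source footage is 720p. One common way to fit a wide frame into a square input is aspect-preserving letterboxing. The sketch below computes the scale and padding for that approach and maps a box back to source coordinates. Whether E-BARD frames were letterboxed or simply stretched is not stated here, so treat this as an assumption; the 1280×720 frame size and both helper names are also assumptions.

```python
def letterbox_params(src_w, src_h, target=704):
    """Scale and padding that fit a frame into a target x target square."""
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def to_source_coords(box, scale, pad):
    """Map an (x1, y1, x2, y2) box from letterboxed space to the source frame."""
    pad_x, pad_y = pad
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# Assumed 720p frame size: 1280x720.
scale, new_size, pad = letterbox_params(1280, 720)
print(scale, new_size, pad)

# A box predicted in the 704x704 letterboxed image, mapped back to 1280x720.
print(to_source_coords((352, 352, 374, 374), scale, pad))
```

Under this assumption a 40-pixel-wide ball in the source frame spans only about 22 pixels at the model input, which helps explain why small objects are the hardest class.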

---

# Evaluation

## Testing Data

Evaluation was performed on the **10% held-out test split** of E-BARD.

Metrics, computed at an IoU threshold of **0.50**:

* Precision
* Recall
* F1-score
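
To make these metrics concrete, the sketch below computes precision, recall, and F1 for a single class by greedily matching predictions to ground-truth boxes at IoU 0.50. It is a minimal illustration with made-up boxes, not the project's evaluation script. The matching strategy (highest confidence first, one-to-one) and all function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_counts(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching; predictions are (confidence, box) pairs."""
    unmatched = list(ground_truth)
    tp = 0
    # Visit predictions from most to least confident.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(predictions) - tp  # predictions with no matching ground truth
    fn = len(unmatched)         # ground-truth boxes that were never matched
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy single-class example (made-up boxes, not E-BARD data).
gt = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
preds = [
    (0.9, (0, 0, 10, 9)),
    (0.8, (21, 21, 31, 31)),
    (0.6, (80, 80, 90, 90)),
]
tp, fp, fn = match_counts(preds, gt)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, precision, recall, f1)
```

In a full evaluation the counts would be accumulated per class over all test frames before computing the three metrics.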

---

# Results

YOLOv8n achieved higher recall and F1 than RF-DETR Nano on all four classes, while RF-DETR Nano achieved higher precision on the basketball, player, and referee classes.

## Per-Class Performance (@ IoU 0.5)

| Class      | Metric    | YOLOv8n | RF-DETR Nano |
| ---------- | --------- | ------- | ------------ |
| Basketball | Precision | 0.811   | 0.845        |
| Basketball | Recall    | 0.566   | 0.322        |
| Basketball | F1        | 0.667   | 0.467        |
| Hoop       | Precision | 0.993   | 0.944        |
| Hoop       | Recall    | 0.937   | 0.742        |
| Hoop       | F1        | 0.964   | 0.831        |
| Player     | Precision | 0.952   | 0.962        |
| Player     | Recall    | 0.949   | 0.908        |
| Player     | F1        | 0.950   | 0.934        |
| Referee    | Precision | 0.927   | 0.953        |
| Referee    | Recall    | 0.930   | 0.794        |
| Referee    | F1        | 0.929   | 0.867        |
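
The F1 rows are the harmonic mean of the precision and recall rows. As a quick check, the sketch below recomputes F1 for YOLOv8n from the rounded table values. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for YOLOv8n, copied from the table above.
yolov8n = {
    "Basketball": (0.811, 0.566),
    "Hoop": (0.993, 0.937),
    "Player": (0.952, 0.949),
    "Referee": (0.927, 0.930),
}

recomputed = {cls: round(f1_score(p, r), 3) for cls, (p, r) in yolov8n.items()}
for cls, value in recomputed.items():
    print(f"{cls}: F1 = {value:.3f}")
```

Because the inputs are already rounded to three decimals, a recomputed value can differ from the reported one in the last digit.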

# Code Examples

## YOLOv8n Inference

```python
from ultralytics import YOLO

# Load the fine-tuned YOLOv8n checkpoint
yolo_model = YOLO("model/BODD_yolov8n_0001.pt")

yolo_results = yolo_model.predict(
    source="data/yolo/test/images",  # folder of test images
    imgsz=704,                       # same resolution as training
    device="cuda",                   # use "cpu" if no GPU is available
    conf=0.25,                       # minimum confidence to keep a detection
    iou=0.5                          # IoU threshold for non-maximum suppression
)
```

---

## RF-DETR Nano Inference

```python
from rfdetr import RFDETRNano
from PIL import Image

# Load the fine-tuned RF-DETR Nano checkpoint
rfdetr_model = RFDETRNano(
    pretrain_weights="model/BODD_rf-detr-nano_0000/checkpoint_best_total.pth"
)

# Read a single frame as an RGB image
img = Image.open("path/to/image.jpg").convert("RGB")

detections = rfdetr_model.predict(
    img,
    resolution=704,       # same resolution as training
    conf_threshold=0.25   # minimum confidence to keep a detection
)
```

---

# Full Evaluation Script

The full evaluation script is available in the evaluation folder of the code repository: https://github.com/GabrieleGiudic/E-BARD/detection/eval/yolo_vs_detr.py