---
license: cc-by-4.0
datasets:
- GabrieleGiudici/E-BARD-detection
base_model:
- Ultralytics/YOLOv8
- Roboflow/RFDETRNano
tags:
- basketball
- detection
---

# Abstract

This work builds upon the Basketball Action Recognition Dataset (BARD), originally introduced to enable supervised learning for primary action recognition in NBA game footage. However, BARD's initial design lacks the granular annotations required to develop multi-stage computer vision pipelines involving object detection, jersey number recognition (JNR), and team attribution. To address these limitations, we present E-BARD (Extended Basketball Action Recognition Dataset), which bridges the gap between isolated action recognition and end-to-end scene-level reasoning through three key contributions. First, we introduce a new set of interrelated datasets that augment the original BARD videos with dense visual annotations. These include detection data for key entities (ball, hoop, referee, player), team attribution based on uniform colors, and JNR, all integrated to directly support and enrich the original action captions. Second, we establish a comprehensive benchmark for these specific visual understanding tasks using representative state-of-the-art models. We evaluate YOLO and RF-DETR for object detection; CLIP, SigLIP2, FashionCLIP, and the Perception Encoder for team color attribution; and olmOCR, Qwen2.5-VL-3B, and Qwen2.5-VL-7B for JNR. Finally, we propose a holistic, integrated approach based on Qwen2.5-VL, demonstrating the capacity of a unified multimodal framework to address all subtasks jointly. Ultimately, E-BARD provides a comprehensive benchmark for multi-task basketball video understanding.

# Model Card for E-BARD Basketball Object Detection Models

This repository hosts two fine-tuned object detection models:

- **YOLOv8n**
- **RF-DETR Nano**

Both models are trained to detect key entities in basketball footage:

- Basketball
- Hoop
- Player
- Referee

These models were developed as part of the **E-BARD (Extended Basketball Action Recognition Dataset)** project to support **end-to-end basketball scene understanding pipelines**.

---

# Model Details

**Developed by:** Gabriele Giudici (author of E-BARD)

**Model Type:** Object Detection

### YOLOv8n
- Lightweight CNN detector
- ~3.15M parameters

### RF-DETR Nano
- Lightweight transformer-based detector
- ~30.5M parameters

**License:** CC-BY-4.0

**Finetuned from:**
- Base YOLOv8n
- Base RF-DETR Nano

---

# Model Sources

**Code Repository:** https://github.com/GabrieleGiudic/E-BARD

**Original BARD Repository:** https://github.com/GabrieleGiudic/BARD

**Dataset Repository:** https://huggingface.co/datasets/GabrieleGiudici/E-BARD-detection

**Paper:** E-BARD: *A Multi-Task Extension of the Basketball Action Recognition Dataset for Player Detection, Team Attribution and Jersey Number Recognition.*

---

# Uses

## Direct Use

These models detect four basketball entities in individual frames (each image is processed independently):

- Basketball
- Basketball hoop
- Basketball player
- Referee

## Downstream Use

Detections can be integrated into **sports analytics pipelines**, including:

- Multi-object tracking (e.g., ByteTrack)
- Jersey number recognition (JNR)
- Team color attribution
- Tactical analysis
- Event understanding
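
As a rough illustration of how detections might feed the JNR and team-color stages listed above, the sketch below keeps only confident `player` boxes and turns them into integer crop rectangles clipped to the frame. The `Detection` record, the `"player"` label string, the 0.5 confidence cutoff and the 4-pixel margin are all hypothetical choices made for this example, not values from E-BARD. Adapt them to the output format of whichever model you run.

```python
import math
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical detection record: class name, confidence, box (x1, y1, x2, y2) in pixels."""

    label: str
    conf: float
    box: Tuple[float, float, float, float]


def player_crops(
    detections: List[Detection],
    frame_w: int,
    frame_h: int,
    min_conf: float = 0.5,  # illustrative cutoff, not taken from the model card
    margin_px: int = 4,  # illustrative context margin around each box
) -> List[Tuple[int, int, int, int]]:
    """Return integer crop rectangles for confident player boxes, clipped to the frame."""
    crops = []
    for det in detections:
        if det.label != "player" or det.conf < min_conf:
            continue
        x1, y1, x2, y2 = det.box
        # Round outward to whole pixels, add the margin, then clip to the frame
        cx1 = max(0, math.floor(x1) - margin_px)
        cy1 = max(0, math.floor(y1) - margin_px)
        cx2 = min(frame_w, math.ceil(x2) + margin_px)
        cy2 = min(frame_h, math.ceil(y2) + margin_px)
        if cx2 > cx1 and cy2 > cy1:
            crops.append((cx1, cy1, cx2, cy2))
    return crops


if __name__ == "__main__":
    dets = [
        Detection("player", 0.91, (100.0, 200.0, 160.0, 360.0)),
        Detection("referee", 0.88, (400.0, 180.0, 450.0, 330.0)),
        Detection("player", 0.30, (600.0, 220.0, 650.0, 370.0)),
        Detection("player", 0.80, (1250.0, 600.0, 1280.0, 720.0)),
    ]
    # Each crop could then be passed to a JNR or team-color model
    print(player_crops(dets, frame_w=1280, frame_h=720))
    # -> [(96, 196, 164, 364), (1246, 596, 1280, 720)]
```

The margin gives a downstream classifier a little context around the jersey. Clipping keeps boxes near the frame edge valid.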

---

# Bias, Risks, and Limitations

- Models were trained on **720p footage downscaled to 704×704**.
- Performance may degrade on **lower resolutions or different aspect ratios**.
- The dataset is derived from **2024–2025 NBA season footage**, potentially biasing the models toward:
  - NBA court layouts
  - broadcast camera angles
  - lighting conditions
  - uniform styles

Performance may also be reduced on:

- **lower-tier leagues**
- **street basketball environments**
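
The sketch below makes the resolution caveat concrete by working through the 720p-to-704×704 downscaling noted above. It assumes aspect-preserving letterbox resizing: the longer side is scaled to 704 and the shorter side is padded to a square. Ultralytics YOLO resizes in this aspect-preserving way, although its exact padding differs between training and inference. The card does not state how RF-DETR inputs were resized. The 20-pixel ball width is purely illustrative and was not measured from E-BARD.

```python
from typing import Tuple


def letterbox_geometry(
    src_w: int, src_h: int, target: int = 704
) -> Tuple[float, int, int, int, int]:
    """Aspect-preserving resize into a target x target square (assumed letterbox).

    Returns (scale, new_w, new_h, pad_x, pad_y), where pad_x / pad_y are the
    left / top padding; the opposite side receives any odd remainder.
    """
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    scale, w, h, px, py = letterbox_geometry(1280, 720)
    print(f"scale={scale:.3f}, resized={w}x{h}, padding x={px} y={py}")
    # An illustrative 20 px wide ball in the 720p frame shrinks to about 11 px at the model input
    print(f"ball width at input: {20 * scale:.1f} px")
```

Under the same rule, a 960×720 (4:3) frame is scaled by about 0.733 instead of 0.55. The pixel size of objects at the model input therefore depends on the source geometry as well as on the scene, which may be one reason resolution and aspect-ratio changes affect performance.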

### Model-specific limitations

**YOLOv8n**

- Struggles with very small objects such as the basketball
- Basketball recall at IoU 0.50: **0.566**

**RF-DETR Nano**

- Conservative detection behavior
- Prioritizes precision over recall (basketball: precision **0.845**, recall **0.322**)

---

# Training Details

## Training Data

The models were trained on the **E-BARD Detection Dataset**, derived from **60 BARD full-game recordings**.

**Dataset statistics**

* Total Frames: **1,800**
* Frames per game: **30**
* Total Annotations: **22,210**

**Class Distribution**

| Class       | Instances |
| ----------- | --------- |
| Players     | 15,296    |
| Referees    | 3,853     |
| Hoops       | 1,565     |
| Basketballs | 1,496     |

**Dataset split**

* Training: 80%
* Validation: 10%
* Test: 10%
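
The statistics above are internally consistent, and the short check below makes this explicit. It also shows the class imbalance. The split sizes assume that the 80/10/10 ratios were applied to frames. The card does not say whether the split was made per frame or per game.

```python
GAMES = 60
FRAMES_PER_GAME = 30
CLASS_COUNTS = {"Players": 15_296, "Referees": 3_853, "Hoops": 1_565, "Basketballs": 1_496}

total_frames = GAMES * FRAMES_PER_GAME
total_annotations = sum(CLASS_COUNTS.values())

# Class shares make the imbalance visible: players dominate, hoops and balls are rare
shares = {name: count / total_annotations for name, count in CLASS_COUNTS.items()}

# Integer arithmetic avoids float rounding; assumes the 80/10/10 split is over frames
split = {
    "train": total_frames * 80 // 100,
    "val": total_frames * 10 // 100,
    "test": total_frames * 10 // 100,
}

print(total_frames, total_annotations)
for name, share in shares.items():
    print(f"{name:<12}{share:6.1%}")
print(split)
```

Basketballs make up under 7% of annotations, and they are also the smallest objects. This is consistent with the basketball class having the lowest recall for both models.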

---

# Training Procedure

Both models were trained using:

* **Mixed precision (AMP)**
* **Early stopping**

## YOLOv8n

* Epochs: 50
* Resolution: 704×704
* Batch Size: 64 (as reported in the paper); 32 (as set in the training script)
* Augmentations:
  * Mosaic (1.0)
  * Copy-Paste (0.5)
  * RandAugment

## RF-DETR Nano

* Epochs: 50
* Resolution: 704×704
* Batch Size: 16
* Learning Rate: 1e-4
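
For orientation, the settings listed above can be collected into keyword-argument dictionaries. This is a sketch and not the author's training script. The YOLOv8n keys follow Ultralytics `train()` arguments (`epochs`, `imgsz`, `batch`, `mosaic`, `copy_paste`, `amp`). The RF-DETR keys are an assumption based on the `rfdetr` package (`resolution` on the constructor; `epochs`, `batch_size`, `lr`, `early_stopping` on `train()`). Verify both against your installed versions. The card does not give the early-stopping patience or the exact RandAugment configuration, so they are left out.

```python
# Hyperparameters transcribed from this card. Key names are assumed to match
# Ultralytics YOLO.train() and the rfdetr RFDETRNano constructor / .train() options.
YOLOV8N_TRAIN_ARGS = {
    "epochs": 50,
    "imgsz": 704,
    "batch": 32,  # value in the training script; the paper reports 64
    "mosaic": 1.0,
    "copy_paste": 0.5,
    "amp": True,  # mixed precision
}

RFDETR_NANO_MODEL_ARGS = {"resolution": 704}  # assumed constructor argument
RFDETR_NANO_TRAIN_ARGS = {
    "epochs": 50,
    "batch_size": 16,
    "lr": 1e-4,
    "early_stopping": True,  # assumed option name
}

if __name__ == "__main__":
    merged_rfdetr = {**RFDETR_NANO_MODEL_ARGS, **RFDETR_NANO_TRAIN_ARGS}
    for name, cfg in [("YOLOv8n", YOLOV8N_TRAIN_ARGS), ("RF-DETR Nano", merged_rfdetr)]:
        print(name, cfg)
```

With the libraries installed, these settings would be passed roughly as follows:

- `YOLO("yolov8n.pt").train(data="data.yaml", **YOLOV8N_TRAIN_ARGS)`
- `RFDETRNano(**RFDETR_NANO_MODEL_ARGS).train(dataset_dir="path/to/dataset", **RFDETR_NANO_TRAIN_ARGS)`

The dataset paths are placeholders.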

---

# Evaluation

## Testing Data

Evaluation was performed on the **10% held-out test split** of E-BARD.

Metrics used, all computed per class at an IoU threshold of **0.50**:

* Precision
* Recall
* F1-score
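
One common way to obtain precision, recall and F1 at IoU 0.50 is greedy matching within each class:

1. Take predictions in descending order of confidence.
2. A prediction is a true positive if its best still-unmatched ground-truth box overlaps it with an IoU of at least 0.50. Otherwise it is a false positive.
3. Ground-truth boxes left unmatched are false negatives.

The sketch below implements this scheme for a single class in a single image. It is an illustration only; the author's `yolo_vs_detr.py` script may match or aggregate differently.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def prf_at_iou(
    preds: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5
) -> Tuple[float, float, float]:
    """Greedy matching for one class; preds are (confidence, box) pairs."""
    matched = [False] * len(gts)
    tp = fp = 0
    for _, pbox in sorted(preds, key=lambda p: p[0], reverse=True):
        # Find the best still-unmatched ground truth for this prediction
        best_iou, best_j = 0.0, -1
        for j, gbox in enumerate(gts):
            if not matched[j]:
                overlap = iou(pbox, gbox)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
    fn = matched.count(False)  # ground truths nobody matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    preds = [(0.9, (1, 1, 11, 11)), (0.8, (20, 20, 30, 30)), (0.6, (80, 80, 90, 90))]
    print(prf_at_iou(preds, gts))
```

In the example, two predictions match (IoU ≈ 0.68 and 1.0), one misses and one ground-truth box goes undetected, so precision, recall and F1 are all 2/3.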

---

# Results

YOLOv8n achieved higher recall and F1 than RF-DETR Nano on all four classes. RF-DETR Nano achieved higher precision on the basketball, player and referee classes.

## Per-Class Performance (@ IoU 0.5)

| Class      | Metric    | YOLOv8n | RF-DETR Nano |
| ---------- | --------- | ------- | ------------ |
| Basketball | Precision | 0.811   | 0.845        |
| Basketball | Recall    | 0.566   | 0.322        |
| Basketball | F1        | 0.667   | 0.467        |
| Hoop       | Precision | 0.993   | 0.944        |
| Hoop       | Recall    | 0.937   | 0.742        |
| Hoop       | F1        | 0.964   | 0.831        |
| Player     | Precision | 0.952   | 0.962        |
| Player     | Recall    | 0.949   | 0.908        |
| Player     | F1        | 0.950   | 0.934        |
| Referee    | Precision | 0.927   | 0.953        |
| Referee    | Recall    | 0.930   | 0.794        |
| Referee    | F1        | 0.929   | 0.867        |
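
Each F1 value in the table is the harmonic mean of the precision and recall beside it, F1 = 2PR / (P + R). The snippet below recomputes F1 from the rounded table entries. Every recomputed value agrees with the reported F1 to within 0.001. The recomputation also shows that the basketball gap between the two models comes from recall rather than precision.

```python
# (precision, recall, reported F1) at IoU 0.5, copied from the table above
RESULTS = {
    "Basketball": {"YOLOv8n": (0.811, 0.566, 0.667), "RF-DETR Nano": (0.845, 0.322, 0.467)},
    "Hoop": {"YOLOv8n": (0.993, 0.937, 0.964), "RF-DETR Nano": (0.944, 0.742, 0.831)},
    "Player": {"YOLOv8n": (0.952, 0.949, 0.950), "RF-DETR Nano": (0.962, 0.908, 0.934)},
    "Referee": {"YOLOv8n": (0.927, 0.930, 0.929), "RF-DETR Nano": (0.953, 0.794, 0.867)},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


for cls_name, models in RESULTS.items():
    for model_name, (p, r, reported) in models.items():
        # Small differences come from P and R being rounded to three decimals
        recomputed = f1_score(p, r)
        print(f"{cls_name:<11}{model_name:<13} reported={reported:.3f} recomputed={recomputed:.4f}")
```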
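
---

# Code Examples

## YOLOv8n Inference

```python
from ultralytics import YOLO

# Load the fine-tuned YOLOv8n weights
yolo_model = YOLO("model/BODD_yolov8n_0001.pt")

# Run inference on a folder of images at the training resolution;
# conf is the confidence threshold, iou the NMS IoU threshold
yolo_results = yolo_model.predict(
    source="data/yolo/test/images",
    imgsz=704,
    device="cuda",
    conf=0.25,
    iou=0.5,
)

# predict() returns one Results object per image
for result in yolo_results:
    for box in result.boxes:
        label = result.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(label, round(float(box.conf), 3), (x1, y1, x2, y2))
```

---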
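
## RF-DETR Nano Inference

```python
from rfdetr import RFDETRNano
from PIL import Image

# Load the fine-tuned checkpoint; the input resolution is set when the model
# is constructed, here to the 704 px used in training
rfdetr_model = RFDETRNano(
    pretrain_weights="model/BODD_rf-detr-nano_0000/checkpoint_best_total.pth",
    resolution=704,
)

img = Image.open("path/to/image.jpg").convert("RGB")

# predict() takes the confidence cutoff as `threshold` and returns supervision Detections
detections = rfdetr_model.predict(img, threshold=0.25)

for xyxy, class_id, conf in zip(detections.xyxy, detections.class_id, detections.confidence):
    print(int(class_id), round(float(conf), 3), xyxy.tolist())
```

---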

# Full Evaluation Script

The full YOLOv8n vs. RF-DETR Nano evaluation script is in the evaluation folder of the code repository: https://github.com/GabrieleGiudic/E-BARD/detection/eval/yolo_vs_detr.py