soccer-ball-detection / TESTING_GUIDE.md
eeeeeeeeeeeeee3's picture
Upload TESTING_GUIDE.md with huggingface_hub
0ec1ccb verified
# Testing & Validation Guide
This guide outlines how to test and validate that the improvements have fixed the issues and improved system performance.
## Quick Validation Checklist
### βœ… Phase 1: Critical Fixes Validation
#### 1.1 Test Class Indexing Fix
**Goal**: Verify mAP is no longer 0%
```bash
# Run a short training run (1-2 epochs) to check initial metrics
python scripts/train_detr.py \
--config configs/training.yaml \
--train-dir datasets/train \
--val-dir datasets/val \
--output-dir models
# Check validation output - should see:
# - Player mAP > 0 (was 0.00%)
# - Ball mAP > 0 (was 0.00%)
# - No "All Background" warnings
```
**Expected Results**:
- βœ… Player mAP@0.5 > 0.0 (should be > 0.10 after 1 epoch)
- βœ… Ball mAP@0.5 > 0.0 (should be > 0.05 after 1 epoch)
- βœ… No zero recall/precision for players
#### 1.2 Test Focal Loss vs Class Weights
**Goal**: Verify Focal Loss improves precision over 25x class weights
```bash
# Train with Focal Loss (current config)
python scripts/train_detr.py --config configs/training.yaml
# Monitor ball precision in MLflow/TensorBoard
# Should see: Ball Precision > 0.14% (previous was 0.14%)
```
**Expected Results**:
- βœ… Ball Precision@0.5 > 0.20 (improved from 0.14%)
- βœ… Ball Recall@0.5 > 0.50 (maintains or improves from 58%)
- βœ… Fewer false positives (lower avg predictions per image)
### βœ… Phase 2: Architecture Validation
#### 2.1 Test RF-DETR Integration
**Goal**: Verify RF-DETR can be loaded (full training requires RF-DETR's native API)
```python
# Quick test script
from src.training.model import get_detr_model
import yaml
config = yaml.safe_load(open('configs/training.yaml'))
config['model']['architecture'] = 'rfdetr'
try:
model = get_detr_model(config['model'], config['training'])
print("βœ… RF-DETR model loaded successfully")
except Exception as e:
print(f"⚠️ RF-DETR not available: {e}")
print("Note: Full RF-DETR training requires native API")
```
### βœ… Phase 3: Advanced Features Validation
#### 3.1 Test Copy-Paste Augmentation
**Goal**: Verify ball class balancing works
```python
# Test augmentation
from src.training.augmentation import CopyPasteAugmentation
from PIL import Image
import torch
# Create dummy ball patches
ball_patches = [(Image.new('RGB', (20, 20), 'white'), {})]
aug = CopyPasteAugmentation(prob=1.0, max_pastes=3)
aug.set_ball_patches(ball_patches)
# Test on sample image
img = Image.open('datasets/train/images/sample.jpg')
target = {
'boxes': torch.tensor([[100, 100, 150, 150]]),
'labels': torch.tensor([1]) # 1-based: player
}
aug_img, aug_target = aug(img, target)
print(f"Original boxes: {len(target['boxes'])}")
print(f"Augmented boxes: {len(aug_target['boxes'])}")
# Should have more boxes (pasted balls)
```
**Expected Results**:
- βœ… More ball annotations in training batches
- βœ… Improved ball recall during training
#### 3.2 Test SAHI Inference
**Goal**: Verify small ball detection improves
```python
# Test SAHI on validation image
from src.training.sahi_inference import sahi_predict
from PIL import Image
import torch
model = load_trained_model() # Your trained model
img = Image.open('datasets/val/images/sample.jpg')
# Standard inference
standard_preds = model([preprocess(img)])
# SAHI inference
sahi_preds = sahi_predict(model, img, slice_size=640, overlap_ratio=0.2)
print(f"Standard detections: {len(standard_preds['boxes'])}")
print(f"SAHI detections: {len(sahi_preds['boxes'])}")
# SAHI should detect more small balls
```
**Expected Results**:
- βœ… More ball detections with SAHI
- βœ… Better recall for small balls (< 20x20 pixels)
#### 3.3 Test ByteTrack Integration
**Goal**: Verify temporal tracking consistency
```python
# Test ByteTrack on video sequence
from src.tracker import ByteTrackerWrapper
import torch
tracker = ByteTrackerWrapper(frame_rate=30)
# Simulate detections across frames
for frame_idx in range(10):
detections = {
'boxes': torch.tensor([[100, 100, 120, 120]]),
'scores': torch.tensor([0.8]),
'labels': torch.tensor([1]) # ball
}
tracked = tracker.update(detections, (1080, 1920))
print(f"Frame {frame_idx}: {len(tracked)} tracks")
if tracked:
print(f" Track ID: {tracked[0]['track_id']}")
```
**Expected Results**:
- βœ… Consistent track IDs across frames
- βœ… Ball tracks persist even with low-confidence detections
#### 3.4 Test Homography/GSR
**Goal**: Verify pixel-to-pitch coordinate transformation
```python
# Test homography estimation
from src.analysis.homography import HomographyEstimator
import numpy as np
from PIL import Image
estimator = HomographyEstimator(pitch_width=105.0, pitch_height=68.0)
img = np.array(Image.open('datasets/val/images/sample.jpg'))
# Estimate homography (auto or manual)
success = estimator.estimate(img)
if success:
# Transform a point
pixel_point = (960, 540) # Center of 1920x1080 image
pitch_point = estimator.transform(pixel_point)
print(f"Pixel {pixel_point} -> Pitch {pitch_point}")
```
**Expected Results**:
- βœ… Homography matrix estimated successfully
- βœ… Points transform correctly to pitch coordinates
### βœ… Phase 4: Data Quality Validation
#### 4.1 Test CLAHE Enhancement
**Goal**: Verify contrast improvement for synthetic fog
```python
# Visual test
from src.training.augmentation import CLAHEAugmentation
from PIL import Image
aug = CLAHEAugmentation(clip_limit=2.0, tile_grid_size=(8, 8))
img = Image.open('datasets/train/images/sample.jpg')
target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])}
enhanced_img, _ = aug(img, target)
enhanced_img.save('enhanced_sample.jpg')
# Compare visually - should see better contrast
```
#### 4.2 Test Motion Blur
**Goal**: Verify motion blur augmentation works
```python
# Test motion blur
from src.training.augmentation import MotionBlurAugmentation
aug = MotionBlurAugmentation(prob=1.0, max_kernel_size=15)
img = Image.open('datasets/train/images/sample.jpg')
target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])}
blurred_img, _ = aug(img, target)
blurred_img.save('blurred_sample.jpg')
```
## Comprehensive Training Test
### Full Training Run with Monitoring
```bash
# 1. Install new dependencies
pip install -r requirements.txt
# 2. Start training with all improvements
python scripts/train_detr.py \
--config configs/training.yaml \
--train-dir datasets/train \
--val-dir datasets/val \
--output-dir models
# 3. Monitor in MLflow (recommended)
mlflow ui --backend-store-uri file:./mlruns
# Open http://localhost:5000
# 4. Or monitor in TensorBoard
tensorboard --logdir logs
# Open http://localhost:6006
```
### Key Metrics to Monitor
**Training Metrics** (should improve):
- Training loss: Should decrease smoothly
- Focal Loss component: Should focus on hard examples
- Learning rate: Should follow cosine schedule
**Validation Metrics** (critical improvements):
- **Player mAP@0.5**: Target > 0.85 (was 0.00%)
- **Player Recall@0.5**: Target > 0.95 (was 0.00%)
- **Ball mAP@0.5**: Target > 0.70 (was low)
- **Ball Precision@0.5**: Target > 0.70 (was 0.14%)
- **Ball Recall@0.5**: Target > 0.80 (was ~58%)
- **Ball Avg Predictions**: Should be ~1.0 per image (not excessive)
### Comparison: Before vs After
Create a comparison script:
```python
# scripts/compare_metrics.py
import json
# Load old metrics (from previous training)
with open('old_metrics.json') as f:
old_metrics = json.load(f)
# Load new metrics (from current training)
with open('new_metrics.json') as f:
new_metrics = json.load(f)
print("Metric Comparison:")
print(f"Player mAP: {old_metrics['player_map']:.4f} -> {new_metrics['player_map']:.4f}")
print(f"Ball Precision: {old_metrics['ball_precision']:.4f} -> {new_metrics['ball_precision']:.4f}")
print(f"Ball Recall: {old_metrics['ball_recall']:.4f} -> {new_metrics['ball_recall']:.4f}")
```
## Quick Diagnostic Script
Run this to verify all fixes are working:
```bash
# scripts/quick_validation.py
python -c "
from src.training.dataset import CocoDataset
from src.training.model import get_detr_model
import yaml
# Test 1: Dataset labels are 1-based
config = yaml.safe_load(open('configs/training.yaml'))
dataset = CocoDataset('datasets/train', transforms=None)
sample = dataset[0]
labels = sample[1]['labels']
print(f'βœ… Dataset labels: {labels.unique().tolist()} (should be [1, 2] for 1-based)')
# Test 2: Model can be created
model = get_detr_model(config['model'], config['training'])
print('βœ… Model created successfully')
# Test 3: Focal Loss config
focal_enabled = config['training']['focal_loss']['enabled']
print(f'βœ… Focal Loss enabled: {focal_enabled}')
# Test 4: Class weights disabled
weights_enabled = config['training']['class_weights']['enabled']
print(f'βœ… Class weights disabled: {not weights_enabled}')
print('\nπŸŽ‰ All critical fixes verified!')
"
```
## Expected Timeline
- **Epoch 1-5**: Should see mAP > 0 immediately (fixes indexing bug)
- **Epoch 10**: Ball precision should improve (Focal Loss working)
- **Epoch 20**: Copy-Paste should show improved ball recall
- **Epoch 50+**: Should approach target metrics
## Troubleshooting
If metrics don't improve:
1. **Still seeing 0% mAP?**
- Check dataset labels are 1-based: `dataset[0][1]['labels']`
- Verify model expects 1-based: Check `model.py` line 119
2. **Ball precision still low?**
- Verify Focal Loss is enabled in config
- Check Focal Loss is being applied (add debug prints)
3. **No improvement with Copy-Paste?**
- Verify ball patches are being extracted
- Check augmentation is enabled in config
4. **SAHI not working?**
- Verify image slicing is correct
- Check NMS is merging predictions properly
## Next Steps After Validation
Once improvements are confirmed:
1. **Fine-tune hyperparameters**: Adjust Focal Loss alpha/gamma
2. **Optimize augmentations**: Tune Copy-Paste probability
3. **Scale up training**: Increase epochs if metrics still improving
4. **Deploy improvements**: Use trained model for inference