# Testing & Validation Guide This guide outlines how to test and validate that the improvements have fixed the issues and improved system performance. ## Quick Validation Checklist ### ✅ Phase 1: Critical Fixes Validation #### 1.1 Test Class Indexing Fix **Goal**: Verify mAP is no longer 0% ```bash # Run a short training run (1-2 epochs) to check initial metrics python scripts/train_detr.py \ --config configs/training.yaml \ --train-dir datasets/train \ --val-dir datasets/val \ --output-dir models # Check validation output - should see: # - Player mAP > 0 (was 0.00%) # - Ball mAP > 0 (was 0.00%) # - No "All Background" warnings ``` **Expected Results**: - ✅ Player mAP@0.5 > 0.0 (should be > 0.10 after 1 epoch) - ✅ Ball mAP@0.5 > 0.0 (should be > 0.05 after 1 epoch) - ✅ No zero recall/precision for players #### 1.2 Test Focal Loss vs Class Weights **Goal**: Verify Focal Loss improves precision over 25x class weights ```bash # Train with Focal Loss (current config) python scripts/train_detr.py --config configs/training.yaml # Monitor ball precision in MLflow/TensorBoard # Should see: Ball Precision > 0.14% (previous was 0.14%) ``` **Expected Results**: - ✅ Ball Precision@0.5 > 0.20 (improved from 0.14%) - ✅ Ball Recall@0.5 > 0.50 (maintains or improves from 58%) - ✅ Fewer false positives (lower avg predictions per image) ### ✅ Phase 2: Architecture Validation #### 2.1 Test RF-DETR Integration **Goal**: Verify RF-DETR can be loaded (full training requires RF-DETR's native API) ```python # Quick test script from src.training.model import get_detr_model import yaml config = yaml.safe_load(open('configs/training.yaml')) config['model']['architecture'] = 'rfdetr' try: model = get_detr_model(config['model'], config['training']) print("✅ RF-DETR model loaded successfully") except Exception as e: print(f"⚠️ RF-DETR not available: {e}") print("Note: Full RF-DETR training requires native API") ``` ### ✅ Phase 3: Advanced Features Validation #### 3.1 Test Copy-Paste Augmentation **Goal**: Verify ball class balancing works ```python # Test augmentation from src.training.augmentation import CopyPasteAugmentation from PIL import Image import torch # Create dummy ball patches ball_patches = [(Image.new('RGB', (20, 20), 'white'), {})] aug = CopyPasteAugmentation(prob=1.0, max_pastes=3) aug.set_ball_patches(ball_patches) # Test on sample image img = Image.open('datasets/train/images/sample.jpg') target = { 'boxes': torch.tensor([[100, 100, 150, 150]]), 'labels': torch.tensor([1]) # 1-based: player } aug_img, aug_target = aug(img, target) print(f"Original boxes: {len(target['boxes'])}") print(f"Augmented boxes: {len(aug_target['boxes'])}") # Should have more boxes (pasted balls) ``` **Expected Results**: - ✅ More ball annotations in training batches - ✅ Improved ball recall during training #### 3.2 Test SAHI Inference **Goal**: Verify small ball detection improves ```python # Test SAHI on validation image from src.training.sahi_inference import sahi_predict from PIL import Image import torch model = load_trained_model() # Your trained model img = Image.open('datasets/val/images/sample.jpg') # Standard inference standard_preds = model([preprocess(img)]) # SAHI inference sahi_preds = sahi_predict(model, img, slice_size=640, overlap_ratio=0.2) print(f"Standard detections: {len(standard_preds['boxes'])}") print(f"SAHI detections: {len(sahi_preds['boxes'])}") # SAHI should detect more small balls ``` **Expected Results**: - ✅ More ball detections with SAHI - ✅ Better recall for small balls (< 20x20 pixels) #### 3.3 Test ByteTrack Integration **Goal**: Verify temporal tracking consistency ```python # Test ByteTrack on video sequence from src.tracker import ByteTrackerWrapper import torch tracker = ByteTrackerWrapper(frame_rate=30) # Simulate detections across frames for frame_idx in range(10): detections = { 'boxes': torch.tensor([[100, 100, 120, 120]]), 'scores': torch.tensor([0.8]), 'labels': torch.tensor([1]) # ball } tracked = tracker.update(detections, (1080, 1920)) print(f"Frame {frame_idx}: {len(tracked)} tracks") if tracked: print(f" Track ID: {tracked[0]['track_id']}") ``` **Expected Results**: - ✅ Consistent track IDs across frames - ✅ Ball tracks persist even with low-confidence detections #### 3.4 Test Homography/GSR **Goal**: Verify pixel-to-pitch coordinate transformation ```python # Test homography estimation from src.analysis.homography import HomographyEstimator import numpy as np from PIL import Image estimator = HomographyEstimator(pitch_width=105.0, pitch_height=68.0) img = np.array(Image.open('datasets/val/images/sample.jpg')) # Estimate homography (auto or manual) success = estimator.estimate(img) if success: # Transform a point pixel_point = (960, 540) # Center of 1920x1080 image pitch_point = estimator.transform(pixel_point) print(f"Pixel {pixel_point} -> Pitch {pitch_point}") ``` **Expected Results**: - ✅ Homography matrix estimated successfully - ✅ Points transform correctly to pitch coordinates ### ✅ Phase 4: Data Quality Validation #### 4.1 Test CLAHE Enhancement **Goal**: Verify contrast improvement for synthetic fog ```python # Visual test from src.training.augmentation import CLAHEAugmentation from PIL import Image aug = CLAHEAugmentation(clip_limit=2.0, tile_grid_size=(8, 8)) img = Image.open('datasets/train/images/sample.jpg') target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])} enhanced_img, _ = aug(img, target) enhanced_img.save('enhanced_sample.jpg') # Compare visually - should see better contrast ``` #### 4.2 Test Motion Blur **Goal**: Verify motion blur augmentation works ```python # Test motion blur from src.training.augmentation import MotionBlurAugmentation aug = MotionBlurAugmentation(prob=1.0, max_kernel_size=15) img = Image.open('datasets/train/images/sample.jpg') target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])} blurred_img, _ = aug(img, target) blurred_img.save('blurred_sample.jpg') ``` ## Comprehensive Training Test ### Full Training Run with Monitoring ```bash # 1. Install new dependencies pip install -r requirements.txt # 2. Start training with all improvements python scripts/train_detr.py \ --config configs/training.yaml \ --train-dir datasets/train \ --val-dir datasets/val \ --output-dir models # 3. Monitor in MLflow (recommended) mlflow ui --backend-store-uri file:./mlruns # Open http://localhost:5000 # 4. Or monitor in TensorBoard tensorboard --logdir logs # Open http://localhost:6006 ``` ### Key Metrics to Monitor **Training Metrics** (should improve): - Training loss: Should decrease smoothly - Focal Loss component: Should focus on hard examples - Learning rate: Should follow cosine schedule **Validation Metrics** (critical improvements): - **Player mAP@0.5**: Target > 0.85 (was 0.00%) - **Player Recall@0.5**: Target > 0.95 (was 0.00%) - **Ball mAP@0.5**: Target > 0.70 (was low) - **Ball Precision@0.5**: Target > 0.70 (was 0.14%) - **Ball Recall@0.5**: Target > 0.80 (was ~58%) - **Ball Avg Predictions**: Should be ~1.0 per image (not excessive) ### Comparison: Before vs After Create a comparison script: ```python # scripts/compare_metrics.py import json # Load old metrics (from previous training) with open('old_metrics.json') as f: old_metrics = json.load(f) # Load new metrics (from current training) with open('new_metrics.json') as f: new_metrics = json.load(f) print("Metric Comparison:") print(f"Player mAP: {old_metrics['player_map']:.4f} -> {new_metrics['player_map']:.4f}") print(f"Ball Precision: {old_metrics['ball_precision']:.4f} -> {new_metrics['ball_precision']:.4f}") print(f"Ball Recall: {old_metrics['ball_recall']:.4f} -> {new_metrics['ball_recall']:.4f}") ``` ## Quick Diagnostic Script Run this to verify all fixes are working: ```bash # scripts/quick_validation.py python -c " from src.training.dataset import CocoDataset from src.training.model import get_detr_model import yaml # Test 1: Dataset labels are 1-based config = yaml.safe_load(open('configs/training.yaml')) dataset = CocoDataset('datasets/train', transforms=None) sample = dataset[0] labels = sample[1]['labels'] print(f'✅ Dataset labels: {labels.unique().tolist()} (should be [1, 2] for 1-based)') # Test 2: Model can be created model = get_detr_model(config['model'], config['training']) print('✅ Model created successfully') # Test 3: Focal Loss config focal_enabled = config['training']['focal_loss']['enabled'] print(f'✅ Focal Loss enabled: {focal_enabled}') # Test 4: Class weights disabled weights_enabled = config['training']['class_weights']['enabled'] print(f'✅ Class weights disabled: {not weights_enabled}') print('\n🎉 All critical fixes verified!') " ``` ## Expected Timeline - **Epoch 1-5**: Should see mAP > 0 immediately (fixes indexing bug) - **Epoch 10**: Ball precision should improve (Focal Loss working) - **Epoch 20**: Copy-Paste should show improved ball recall - **Epoch 50+**: Should approach target metrics ## Troubleshooting If metrics don't improve: 1. **Still seeing 0% mAP?** - Check dataset labels are 1-based: `dataset[0][1]['labels']` - Verify model expects 1-based: Check `model.py` line 119 2. **Ball precision still low?** - Verify Focal Loss is enabled in config - Check Focal Loss is being applied (add debug prints) 3. **No improvement with Copy-Paste?** - Verify ball patches are being extracted - Check augmentation is enabled in config 4. **SAHI not working?** - Verify image slicing is correct - Check NMS is merging predictions properly ## Next Steps After Validation Once improvements are confirmed: 1. **Fine-tune hyperparameters**: Adjust Focal Loss alpha/gamma 2. **Optimize augmentations**: Tune Copy-Paste probability 3. **Scale up training**: Increase epochs if metrics still improving 4. **Deploy improvements**: Use trained model for inference