| # Testing & Validation Guide | |
| This guide outlines how to test and validate that the improvements have fixed the issues and improved system performance. | |
| ## Quick Validation Checklist | |
| ### β Phase 1: Critical Fixes Validation | |
| #### 1.1 Test Class Indexing Fix | |
| **Goal**: Verify mAP is no longer 0% | |
| ```bash | |
| # Run a short training run (1-2 epochs) to check initial metrics | |
| python scripts/train_detr.py \ | |
| --config configs/training.yaml \ | |
| --train-dir datasets/train \ | |
| --val-dir datasets/val \ | |
| --output-dir models | |
| # Check validation output - should see: | |
| # - Player mAP > 0 (was 0.00%) | |
| # - Ball mAP > 0 (was 0.00%) | |
| # - No "All Background" warnings | |
| ``` | |
| **Expected Results**: | |
| - β Player mAP@0.5 > 0.0 (should be > 0.10 after 1 epoch) | |
| - β Ball mAP@0.5 > 0.0 (should be > 0.05 after 1 epoch) | |
| - β No zero recall/precision for players | |
| #### 1.2 Test Focal Loss vs Class Weights | |
| **Goal**: Verify Focal Loss improves precision over 25x class weights | |
| ```bash | |
| # Train with Focal Loss (current config) | |
| python scripts/train_detr.py --config configs/training.yaml | |
| # Monitor ball precision in MLflow/TensorBoard | |
| # Should see: Ball Precision > 0.14% (previous was 0.14%) | |
| ``` | |
| **Expected Results**: | |
| - β Ball Precision@0.5 > 0.20 (improved from 0.14%) | |
| - β Ball Recall@0.5 > 0.50 (maintains or improves from 58%) | |
| - β Fewer false positives (lower avg predictions per image) | |
| ### β Phase 2: Architecture Validation | |
| #### 2.1 Test RF-DETR Integration | |
| **Goal**: Verify RF-DETR can be loaded (full training requires RF-DETR's native API) | |
| ```python | |
| # Quick test script | |
| from src.training.model import get_detr_model | |
| import yaml | |
| config = yaml.safe_load(open('configs/training.yaml')) | |
| config['model']['architecture'] = 'rfdetr' | |
| try: | |
| model = get_detr_model(config['model'], config['training']) | |
| print("β RF-DETR model loaded successfully") | |
| except Exception as e: | |
| print(f"β οΈ RF-DETR not available: {e}") | |
| print("Note: Full RF-DETR training requires native API") | |
| ``` | |
| ### β Phase 3: Advanced Features Validation | |
| #### 3.1 Test Copy-Paste Augmentation | |
| **Goal**: Verify ball class balancing works | |
| ```python | |
| # Test augmentation | |
| from src.training.augmentation import CopyPasteAugmentation | |
| from PIL import Image | |
| import torch | |
| # Create dummy ball patches | |
| ball_patches = [(Image.new('RGB', (20, 20), 'white'), {})] | |
| aug = CopyPasteAugmentation(prob=1.0, max_pastes=3) | |
| aug.set_ball_patches(ball_patches) | |
| # Test on sample image | |
| img = Image.open('datasets/train/images/sample.jpg') | |
| target = { | |
| 'boxes': torch.tensor([[100, 100, 150, 150]]), | |
| 'labels': torch.tensor([1]) # 1-based: player | |
| } | |
| aug_img, aug_target = aug(img, target) | |
| print(f"Original boxes: {len(target['boxes'])}") | |
| print(f"Augmented boxes: {len(aug_target['boxes'])}") | |
| # Should have more boxes (pasted balls) | |
| ``` | |
| **Expected Results**: | |
| - β More ball annotations in training batches | |
| - β Improved ball recall during training | |
| #### 3.2 Test SAHI Inference | |
| **Goal**: Verify small ball detection improves | |
| ```python | |
| # Test SAHI on validation image | |
| from src.training.sahi_inference import sahi_predict | |
| from PIL import Image | |
| import torch | |
| model = load_trained_model() # Your trained model | |
| img = Image.open('datasets/val/images/sample.jpg') | |
| # Standard inference | |
| standard_preds = model([preprocess(img)]) | |
| # SAHI inference | |
| sahi_preds = sahi_predict(model, img, slice_size=640, overlap_ratio=0.2) | |
| print(f"Standard detections: {len(standard_preds['boxes'])}") | |
| print(f"SAHI detections: {len(sahi_preds['boxes'])}") | |
| # SAHI should detect more small balls | |
| ``` | |
| **Expected Results**: | |
| - β More ball detections with SAHI | |
| - β Better recall for small balls (< 20x20 pixels) | |
| #### 3.3 Test ByteTrack Integration | |
| **Goal**: Verify temporal tracking consistency | |
| ```python | |
| # Test ByteTrack on video sequence | |
| from src.tracker import ByteTrackerWrapper | |
| import torch | |
| tracker = ByteTrackerWrapper(frame_rate=30) | |
| # Simulate detections across frames | |
| for frame_idx in range(10): | |
| detections = { | |
| 'boxes': torch.tensor([[100, 100, 120, 120]]), | |
| 'scores': torch.tensor([0.8]), | |
| 'labels': torch.tensor([1]) # ball | |
| } | |
| tracked = tracker.update(detections, (1080, 1920)) | |
| print(f"Frame {frame_idx}: {len(tracked)} tracks") | |
| if tracked: | |
| print(f" Track ID: {tracked[0]['track_id']}") | |
| ``` | |
| **Expected Results**: | |
| - β Consistent track IDs across frames | |
| - β Ball tracks persist even with low-confidence detections | |
| #### 3.4 Test Homography/GSR | |
| **Goal**: Verify pixel-to-pitch coordinate transformation | |
| ```python | |
| # Test homography estimation | |
| from src.analysis.homography import HomographyEstimator | |
| import numpy as np | |
| from PIL import Image | |
| estimator = HomographyEstimator(pitch_width=105.0, pitch_height=68.0) | |
| img = np.array(Image.open('datasets/val/images/sample.jpg')) | |
| # Estimate homography (auto or manual) | |
| success = estimator.estimate(img) | |
| if success: | |
| # Transform a point | |
| pixel_point = (960, 540) # Center of 1920x1080 image | |
| pitch_point = estimator.transform(pixel_point) | |
| print(f"Pixel {pixel_point} -> Pitch {pitch_point}") | |
| ``` | |
| **Expected Results**: | |
| - β Homography matrix estimated successfully | |
| - β Points transform correctly to pitch coordinates | |
| ### β Phase 4: Data Quality Validation | |
| #### 4.1 Test CLAHE Enhancement | |
| **Goal**: Verify contrast improvement for synthetic fog | |
| ```python | |
| # Visual test | |
| from src.training.augmentation import CLAHEAugmentation | |
| from PIL import Image | |
| aug = CLAHEAugmentation(clip_limit=2.0, tile_grid_size=(8, 8)) | |
| img = Image.open('datasets/train/images/sample.jpg') | |
| target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])} | |
| enhanced_img, _ = aug(img, target) | |
| enhanced_img.save('enhanced_sample.jpg') | |
| # Compare visually - should see better contrast | |
| ``` | |
| #### 4.2 Test Motion Blur | |
| **Goal**: Verify motion blur augmentation works | |
| ```python | |
| # Test motion blur | |
| from src.training.augmentation import MotionBlurAugmentation | |
| aug = MotionBlurAugmentation(prob=1.0, max_kernel_size=15) | |
| img = Image.open('datasets/train/images/sample.jpg') | |
| target = {'boxes': torch.tensor([]), 'labels': torch.tensor([])} | |
| blurred_img, _ = aug(img, target) | |
| blurred_img.save('blurred_sample.jpg') | |
| ``` | |
| ## Comprehensive Training Test | |
| ### Full Training Run with Monitoring | |
| ```bash | |
| # 1. Install new dependencies | |
| pip install -r requirements.txt | |
| # 2. Start training with all improvements | |
| python scripts/train_detr.py \ | |
| --config configs/training.yaml \ | |
| --train-dir datasets/train \ | |
| --val-dir datasets/val \ | |
| --output-dir models | |
| # 3. Monitor in MLflow (recommended) | |
| mlflow ui --backend-store-uri file:./mlruns | |
| # Open http://localhost:5000 | |
| # 4. Or monitor in TensorBoard | |
| tensorboard --logdir logs | |
| # Open http://localhost:6006 | |
| ``` | |
| ### Key Metrics to Monitor | |
| **Training Metrics** (should improve): | |
| - Training loss: Should decrease smoothly | |
| - Focal Loss component: Should focus on hard examples | |
| - Learning rate: Should follow cosine schedule | |
| **Validation Metrics** (critical improvements): | |
| - **Player mAP@0.5**: Target > 0.85 (was 0.00%) | |
| - **Player Recall@0.5**: Target > 0.95 (was 0.00%) | |
| - **Ball mAP@0.5**: Target > 0.70 (was low) | |
| - **Ball Precision@0.5**: Target > 0.70 (was 0.14%) | |
| - **Ball Recall@0.5**: Target > 0.80 (was ~58%) | |
| - **Ball Avg Predictions**: Should be ~1.0 per image (not excessive) | |
| ### Comparison: Before vs After | |
| Create a comparison script: | |
| ```python | |
| # scripts/compare_metrics.py | |
| import json | |
| # Load old metrics (from previous training) | |
| with open('old_metrics.json') as f: | |
| old_metrics = json.load(f) | |
| # Load new metrics (from current training) | |
| with open('new_metrics.json') as f: | |
| new_metrics = json.load(f) | |
| print("Metric Comparison:") | |
| print(f"Player mAP: {old_metrics['player_map']:.4f} -> {new_metrics['player_map']:.4f}") | |
| print(f"Ball Precision: {old_metrics['ball_precision']:.4f} -> {new_metrics['ball_precision']:.4f}") | |
| print(f"Ball Recall: {old_metrics['ball_recall']:.4f} -> {new_metrics['ball_recall']:.4f}") | |
| ``` | |
| ## Quick Diagnostic Script | |
| Run this to verify all fixes are working: | |
| ```bash | |
| # scripts/quick_validation.py | |
| python -c " | |
| from src.training.dataset import CocoDataset | |
| from src.training.model import get_detr_model | |
| import yaml | |
| # Test 1: Dataset labels are 1-based | |
| config = yaml.safe_load(open('configs/training.yaml')) | |
| dataset = CocoDataset('datasets/train', transforms=None) | |
| sample = dataset[0] | |
| labels = sample[1]['labels'] | |
| print(f'β Dataset labels: {labels.unique().tolist()} (should be [1, 2] for 1-based)') | |
| # Test 2: Model can be created | |
| model = get_detr_model(config['model'], config['training']) | |
| print('β Model created successfully') | |
| # Test 3: Focal Loss config | |
| focal_enabled = config['training']['focal_loss']['enabled'] | |
| print(f'β Focal Loss enabled: {focal_enabled}') | |
| # Test 4: Class weights disabled | |
| weights_enabled = config['training']['class_weights']['enabled'] | |
| print(f'β Class weights disabled: {not weights_enabled}') | |
| print('\nπ All critical fixes verified!') | |
| " | |
| ``` | |
| ## Expected Timeline | |
| - **Epoch 1-5**: Should see mAP > 0 immediately (fixes indexing bug) | |
| - **Epoch 10**: Ball precision should improve (Focal Loss working) | |
| - **Epoch 20**: Copy-Paste should show improved ball recall | |
| - **Epoch 50+**: Should approach target metrics | |
| ## Troubleshooting | |
| If metrics don't improve: | |
| 1. **Still seeing 0% mAP?** | |
| - Check dataset labels are 1-based: `dataset[0][1]['labels']` | |
| - Verify model expects 1-based: Check `model.py` line 119 | |
| 2. **Ball precision still low?** | |
| - Verify Focal Loss is enabled in config | |
| - Check Focal Loss is being applied (add debug prints) | |
| 3. **No improvement with Copy-Paste?** | |
| - Verify ball patches are being extracted | |
| - Check augmentation is enabled in config | |
| 4. **SAHI not working?** | |
| - Verify image slicing is correct | |
| - Check NMS is merging predictions properly | |
| ## Next Steps After Validation | |
| Once improvements are confirmed: | |
| 1. **Fine-tune hyperparameters**: Adjust Focal Loss alpha/gamma | |
| 2. **Optimize augmentations**: Tune Copy-Paste probability | |
| 3. **Scale up training**: Increase epochs if metrics still improving | |
| 4. **Deploy improvements**: Use trained model for inference | |