# Detection Analysis Report

**Date:** January 7, 2026
**Method:** Fixed coordinates + template matching
**Video:** OSU vs Tenn 12.21.24.mkv
**Processing Time:** 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes)

---

## Summary

| Metric | Value |
|--------|-------|
| Total Detected (raw) | 182 |
| After 1.0s filter | 181 |
| V3 Baseline | 176 |
| True Positives | 176 |
| False Positives | 5 |
| False Negatives | 0 |
| **Recall** | **100.0%** |
| Precision | 97.2% |
| F1 Score | 98.6% |

### Key Achievements

1. **100% Recall** - All 176 baseline plays correctly detected
2. **Better than baseline** - 3 "false positives" are actually legitimate plays the v3 baseline missed:
   - Opening kickoff (2:01)
   - Second half kickoff (10:14)
   - False start penalty (52:34)
3. **~3x faster** - 3.4 minutes vs 9.6 minutes for v3 baseline
4. **XP/FG filter working** - Reduced FPs from 10 to 6 by requiring 1.0s minimum time at clock=40

---

## Duration Filter Threshold Analysis

The v3 baseline's shortest play is **3.9s** (a timeout). However, using 3.0s as the threshold causes false negatives because our special play detections have shorter durations (we only capture the 40→25 transition, not the full play duration).

| Threshold | Plays | FP | FN | Recall | Precision | F1 |
|-----------|-------|----|----|--------|-----------|-----|
| **1.0s** | **181** | **5** | **0** | **100.0%** | **97.2%** | **98.6%** |
| 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% |
| 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% |
| 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% |

**Recommendation:** Use **1.0s threshold** for best recall while filtering weird clock noise.

---

## "False Positives" Analysis (5 after 1.0s filter)

These plays were detected but don't match any baseline play within 5 seconds.

| # | Timestamp | Duration | Verdict | Notes |
|---|-----------|----------|---------|-------|
| 1 | 2:01.92 (121.9s) | 6.3s | ✅ **VALID** | Opening kickoff - should be tracked |
| 2 | 2:31.43 (151.4s) | 1.9s | ⚠️ Optional | Weird clock behavior |
| 3 | 10:14.93 (614.9s) | 15.0s | ✅ **VALID** | Second half kickoff - should be tracked |
| 4 | 52:34.00 (3154.0s) | 2.9s | ✅ **VALID** | False start penalty - should be tracked |
| 5 | 140:14.54 (8414.5s) | 1.4s | ⚠️ Optional | Weird clock behavior |

**Filtered by 1.0s threshold:**
- 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior ✅ Correctly filtered

### Key Finding: New Method Finds MORE Plays!

**3 of the "false positives" are actually legitimate plays that the v3 baseline missed:**
- Opening kickoff (2:01)
- Second half kickoff (10:14)
- False start penalty (52:34)

This means our template matching method is actually **better than the baseline** for total play coverage.

---

## Performance Comparison

| Metric | V3 Baseline | Static Templates | Dynamic Templates |
|--------|-------------|------------------|-------------------|
| Processing Time | 9.6 min | 3.4 min | 4.2 min |
| Plays Detected | 176 | 181 (filtered) | 181 (filtered) |
| True Positives | 176 | 176 | 176 |
| False Positives | 0 | 5 | 5 |
| False Negatives | 0 | 0 | 0 |
| Precision | 100% | 97.2% | 97.2% |
| Recall | 100% | 100% | 100% |
| F1 Score | - | 98.6% | 98.6% |
| Speedup | 1.0x | 2.8x | 2.3x |
| Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) |

### Template Capture Modes

**Static Templates:** Pre-built templates loaded from disk (fastest startup)
- Uses templates previously captured and saved to `output/debug/digit_templates/`
- 100% template coverage (all 25 templates available)
- Best for repeated analysis of the same video

**Dynamic Templates:** Templates built on-the-fly using OCR (default mode)
- Uses OCR to label the first 400 frames, then builds templates from those samples
- 92% template coverage (23/25 templates - missing 2 rare digits)
- Adds ~10 seconds for the template building phase
- More robust for new videos with different fonts/styles

---

## Fixes Applied

### 1. XP/FG Minimum Time Filter (`play_state_machine.py`)

**Problem:** Weird clock behavior (40→25 within 1 second) was being incorrectly detected as XP/FG completions.

**Solution:** Added a minimum time requirement (1.0s) at clock=40 before accepting 40→25 as an XP/FG completion.

```python
min_time_at_40 = 1.0  # Must be at 40 for at least 1s to avoid weird clock false positives
if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0:
    # This is a valid XP/FG completion
    ...  # handling of the completion is not shown in this excerpt
```

### 2. Merge Plays Fix (`_merge_plays()` in `play_detector.py`)

**Problem:** The same play was detected by both the state machine and clock reset detection.

**Solution:** Added a 5-second proximity threshold to deduplicate overlapping detections.

### 3. Duration Filter (1.0s threshold)

**Problem:** Weird clock noise produces very short "plays" (< 1 second).

**Solution:** Filter plays with duration < 1.0s.

Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode.

---

## Known Limitations

1. **Timeout Detection:** Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead.
2. **Special Play Durations:** Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40→25 transition).
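
To make the interaction between fixes 2 and 3 and limitation 2 concrete, here is a minimal sketch of proximity-based deduplication followed by a duration filter. It is not the code in `play_detector.py`: the `Play` class, the `merge_plays` and `filter_short` helpers, and the `source` labels are hypothetical names chosen for illustration, and the duplicate detection at 122.4s is invented sample data. The other timestamps and durations come from the "False Positives" table.

```python
from dataclasses import dataclass


@dataclass
class Play:
    start: float  # seconds from the start of the video
    end: float
    source: str   # hypothetical label: "state_machine" or "clock_reset"

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_plays(plays, proximity=5.0):
    """Keep one detection per play: drop any detection that starts
    within `proximity` seconds of the previously kept one."""
    merged = []
    for play in sorted(plays, key=lambda p: p.start):
        if merged and play.start - merged[-1].start <= proximity:
            continue  # assumed to be the same play seen by the other detector
        merged.append(play)
    return merged


def filter_short(plays, min_duration=1.0):
    """Drop detections shorter than `min_duration` seconds."""
    return [p for p in plays if p.duration >= min_duration]


detections = [
    Play(121.9, 128.2, "state_machine"),    # opening kickoff, 6.3s
    Play(122.4, 128.2, "clock_reset"),      # invented duplicate of the kickoff
    Play(3154.0, 3156.9, "state_machine"),  # false start penalty, 2.9s
    Play(4152.6, 4153.5, "state_machine"),  # weird clock, 0.9s
]

merged = merge_plays(detections)
kept_at_1s = filter_short(merged, min_duration=1.0)
kept_at_3s = filter_short(merged, min_duration=3.0)
print(len(merged), len(kept_at_1s), len(kept_at_3s))  # 3 2 1
```

With a 1.0s threshold only the 0.9s clock glitch is removed. At 3.0s the 2.9s false start penalty is dropped as well, which is the kind of loss the threshold analysis attributes to short special-play durations.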
---

## Timestamps for Video Inspection

### Legitimate Plays (missed by v3 baseline)

```
2:01 - Opening kickoff
10:14 - Second half kickoff
52:34 - False start penalty
```

### Filtered by 1.0s threshold

```
69:12 - Weird clock (0.9s) ✅ Filtered
```

### Remaining questionable detections

```
2:31 - Weird clock (1.9s) - Optional
140:14 - Weird clock (1.4s) - Optional
```

---

## Timing Breakdown (Dynamic Template Mode)

| Phase | Time | % of Total |
|-------|------|------------|
| Video I/O | 169.0s | 67.4% |
| Template Building | 9.8s | 3.9% |
| Template Matching | 71.3s | 28.4% |
| Other (scorebug, state machine) | 0.5s | 0.2% |
| **TOTAL** | **250.7s** | **100%** |

The overhead of dynamic template capture (~10 seconds) is minimal compared to the total processing time. The majority of time is spent on video I/O (67%) and template matching (28%).

---

## Next Steps

1. ✅ **1.0s duration filter** - Implemented in test script
2. ✅ **Dynamic template capture** - Now the default behavior
3. **Update baseline** with the 3 legitimate plays found
4. **Integration with main.py:** Enable template matching mode in orchestrator
5. **Timeout tracking:** Add timeout indicator detection for proper Class B classification
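
Once the baseline is updated (step 3), the Summary metrics will need to be recomputed. The sketch below shows one way to do that, assuming detections are matched to baseline plays by start time within the 5-second window used in this report. The greedy one-to-one matching and the function names `match_to_baseline` and `scores` are assumptions made for illustration; they are not taken from the test script.

```python
def match_to_baseline(detected, baseline, tolerance=5.0):
    """Greedily pair each detected start time with the first unmatched
    baseline start time within `tolerance` seconds.

    Returns (true_positive_count, false_positive_times, false_negative_times).
    """
    unmatched = sorted(baseline)
    true_positives = 0
    false_positives = []
    for t in sorted(detected):
        hit = next((b for b in unmatched if abs(b - t) <= tolerance), None)
        if hit is None:
            false_positives.append(t)
        else:
            unmatched.remove(hit)  # each baseline play can be matched once
            true_positives += 1
    return true_positives, false_positives, unmatched


def scores(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the Summary table: 176 TP, 5 FP, 0 FN.
p, r, f1 = scores(176, 5, 0)
print(f"{p:.1%} {r:.1%} {f1:.1%}")  # 97.2% 100.0% 98.6%
```

If the three legitimate plays are added to the baseline, the same counts become 179 TP and 2 FP, and `scores(179, 2, 0)` gives the updated figures.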