Spaces:
Sleeping
Sleeping
| # Detection Analysis Report | |
| **Date:** January 7, 2026 | |
| **Method:** Fixed coordinates + template matching | |
| **Video:** OSU vs Tenn 12.21.24.mkv | |
| **Processing Time:** 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes) | |
| --- | |
| ## Summary | |
| | Metric | Value | | |
| |--------|-------| | |
| | Total Detected (raw) | 182 | | |
| | After 1.0s filter | 181 | | |
| | V3 Baseline | 176 | | |
| | True Positives | 176 | | |
| | False Positives | 5 | | |
| | False Negatives | 0 | | |
| | **Recall** | **100.0%** | | |
| | Precision | 97.2% | | |
| | F1 Score | 98.6% | | |
| ### Key Achievements | |
| 1. **100% Recall** - All 176 baseline plays correctly detected | |
| 2. **Better than baseline** - 3 "false positives" are actually legitimate plays the v3 baseline missed: | |
| - Opening kickoff (2:01) | |
| - Second half kickoff (10:14) | |
| - False start penalty (52:34) | |
| 3. **~3x faster** - 3.4 minutes vs 9.6 minutes for v3 baseline | |
| 4. **XP/FG filter working** - Reduced FPs from 10 to 6 by requiring 1.0s minimum time at clock=40 | |
| --- | |
| ## Duration Filter Threshold Analysis | |
| The v3 baseline's shortest play is **3.9s** (a timeout). However, using 3.0s as the threshold causes false negatives because our special play detections have shorter durations (we only capture the 40β25 transition, not the full play duration). | |
| | Threshold | Plays | FP | FN | Recall | Precision | F1 | | |
| |-----------|-------|----|----|--------|-----------|-----| | |
| | **1.0s** | **181** | **5** | **0** | **100.0%** | **97.2%** | **98.6%** | | |
| | 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% | | |
| | 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% | | |
| | 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% | | |
| **Recommendation:** Use **1.0s threshold** for best recall while filtering weird clock noise. | |
| --- | |
| ## "False Positives" Analysis (5 after 1.0s filter) | |
| These plays were detected but don't match any baseline play within 5 seconds. | |
| | # | Timestamp | Duration | Verdict | Notes | | |
| |---|-----------|----------|---------|-------| | |
| | 1 | 2:01.92 (121.9s) | 6.3s | β **VALID** | Opening kickoff - should be tracked | | |
| | 2 | 2:31.43 (151.4s) | 1.9s | β οΈ Optional | Weird clock behavior | | |
| | 3 | 10:14.93 (614.9s) | 15.0s | β **VALID** | Second half kickoff - should be tracked | | |
| | 4 | 52:34.00 (3154.0s) | 2.9s | β **VALID** | False start penalty - should be tracked | | |
| | 5 | 140:14.54 (8414.5s) | 1.4s | β οΈ Optional | Weird clock behavior | | |
| **Filtered by 1.0s threshold:** | |
| - 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior β Correctly filtered | |
| ### Key Finding: New Method Finds MORE Plays! | |
| **3 of the "false positives" are actually legitimate plays that the v3 baseline missed:** | |
| - Opening kickoff (2:01) | |
| - Second half kickoff (10:14) | |
| - False start penalty (52:34) | |
| This means our template matching method is actually **better than the baseline** for total play coverage. | |
| --- | |
| ## Performance Comparison | |
| | Metric | V3 Baseline | Static Templates | Dynamic Templates | | |
| |--------|-------------|------------------|-------------------| | |
| | Processing Time | 9.6 min | 3.4 min | 4.2 min | | |
| | Plays Detected | 176 | 181 (filtered) | 181 (filtered) | | |
| | True Positives | 176 | 176 | 176 | | |
| | False Positives | 0 | 5 | 5 | | |
| | False Negatives | 0 | 0 | 0 | | |
| | Precision | 100% | 97.2% | 97.2% | | |
| | Recall | 100% | 100% | 100% | | |
| | F1 Score | - | 98.6% | 98.6% | | |
| | Speedup | 1.0x | 2.8x | 2.3x | | |
| | Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) | | |
| ### Template Capture Modes | |
| **Static Templates:** Pre-built templates loaded from disk (fastest startup) | |
| - Uses templates previously captured and saved to `output/debug/digit_templates/` | |
| - 100% template coverage (all 25 templates available) | |
| - Best for repeated analysis of the same video | |
| **Dynamic Templates:** Templates built on-the-fly using OCR (default mode) | |
| - Uses OCR to label first 400 frames, then builds templates from samples | |
| - 92% template coverage (23/25 templates - missing 2 rare digits) | |
| - Adds ~10 seconds for template building phase | |
| - More robust for new videos with different fonts/styles | |
| --- | |
| ## Fixes Applied | |
| ### 1. XP/FG Minimum Time Filter (`play_state_machine.py`) | |
| **Problem:** Weird clock behavior (40β25 within 1 second) was being incorrectly detected as XP/FG completions. | |
| **Solution:** Added minimum time requirement (1.0s) at clock=40 before accepting 40β25 as an XP/FG completion. | |
| ```python | |
| min_time_at_40 = 1.0 # Must be at 40 for at least 1s to avoid weird clock false positives | |
| if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0: | |
| # This is a valid XP/FG completion | |
| ``` | |
| ### 2. Merge Plays Fix (`_merge_plays()` in `play_detector.py`) | |
| **Problem:** Same play detected by both state machine and clock reset detection. | |
| **Solution:** Added 5-second proximity threshold to deduplicate overlapping detections. | |
| ### 3. Duration Filter (1.0s threshold) | |
| **Problem:** Weird clock noise produces very short "plays" (< 1 second). | |
| **Solution:** Filter plays with duration < 1.0s. Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode. | |
| --- | |
| ## Known Limitations | |
| 1. **Timeout Detection:** Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead. | |
| 2. **Special Play Durations:** Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40β25 transition). | |
| --- | |
| ## Timestamps for Video Inspection | |
| ### Legitimate Plays (missed by v3 baseline) | |
| ``` | |
| 2:01 - Opening kickoff | |
| 10:14 - Second half kickoff | |
| 52:34 - False start penalty | |
| ``` | |
| ### Filtered by 1.0s threshold | |
| ``` | |
| 69:12 - Weird clock (0.9s) β Filtered | |
| ``` | |
| ### Remaining questionable detections | |
| ``` | |
| 2:31 - Weird clock (1.9s) - Optional | |
| 140:14 - Weird clock (1.4s) - Optional | |
| ``` | |
| --- | |
| ## Timing Breakdown (Dynamic Template Mode) | |
| | Phase | Time | % of Total | | |
| |-------|------|------------| | |
| | Video I/O | 169.0s | 67.4% | | |
| | Template Building | 9.8s | 3.9% | | |
| | Template Matching | 71.3s | 28.4% | | |
| | Other (scorebug, state machine) | 0.5s | 0.2% | | |
| | **TOTAL** | **250.7s** | **100%** | | |
| The overhead of dynamic template capture (~10 seconds) is minimal compared to the total processing time. The majority of time is spent on video I/O (67%) and template matching (28%). | |
| --- | |
| ## Next Steps | |
| 1. β **1.0s duration filter** - Implemented in test script | |
| 2. β **Dynamic template capture** - Now the default behavior | |
| 3. **Update baseline** with the 3 legitimate plays found | |
| 4. **Integration with main.py:** Enable template matching mode in orchestrator | |
| 5. **Timeout tracking:** Add timeout indicator detection for proper Class B classification | |