Detection Analysis Report

Date: January 7, 2026
Method: Fixed coordinates + template matching
Video: OSU vs Tenn 12.21.24.mkv
Processing Time: 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes)


Summary

| Metric | Value |
|---|---|
| Total Detected (raw) | 182 |
| After 1.0s filter | 181 |
| V3 Baseline | 176 |
| True Positives | 176 |
| False Positives | 5 |
| False Negatives | 0 |
| Recall | 100.0% |
| Precision | 97.2% |
| F1 Score | 98.6% |
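The precision, recall, and F1 values in the summary follow directly from the true positive, false positive, and false negative counts. The sketch below recomputes them; the function name `detection_metrics` is illustrative and not taken from the project code.

```python
def detection_metrics(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Return precision, recall, and F1 (as fractions) from raw counts."""
    detected = true_positives + false_positives   # everything the detector reported
    expected = true_positives + false_negatives   # everything in the baseline
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts from the summary (after the 1.0s filter).
metrics = detection_metrics(true_positives=176, false_positives=5, false_negatives=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# precision: 97.2%
# recall: 100.0%
# f1: 98.6%
```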

Key Achievements

  1. 100% Recall - All 176 baseline plays correctly detected
  2. Better than baseline - 3 "false positives" are actually legitimate plays the v3 baseline missed:
    • Opening kickoff (2:01)
    • Second half kickoff (10:14)
    • False start penalty (52:34)
  3. ~2.8x faster - 3.4 minutes vs 9.6 minutes for v3 baseline
  4. XP/FG filter working - Reduced raw FPs from 10 to 6 by requiring 1.0s minimum time at clock=40; the 1.0s duration filter then removes one more, leaving 5

Duration Filter Threshold Analysis

The v3 baseline's shortest play is 3.9s (a timeout). However, a 3.0s threshold causes false negatives because special play detections are shorter in this mode: we capture only the 40→25 transition, not the full play duration.

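| Threshold | Plays | FP | FN | Recall | Precision | F1 |
|---|---|---|---|---|---|---|
| 1.0s | 181 | 5 | 0 | 100.0% | 97.2% | 98.6% |
| 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% |
| 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% |
| 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% |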

Recommendation: Use 1.0s threshold for best recall while filtering weird clock noise.
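As a quick cross-check, the FP column of the threshold analysis can be reproduced from the durations of the six non-baseline detections listed in this report (five remaining plus the 0.9s one that the filter removes). This is a minimal sketch; the helper name `surviving` is hypothetical, and it assumes the filter keeps a play when its duration is at least the threshold.

```python
# Durations (seconds) of the six detections that have no baseline match,
# taken from the "False Positives" analysis in this report.
unmatched_durations = [6.3, 1.9, 15.0, 2.9, 1.4, 0.9]


def surviving(durations, min_duration):
    """Keep detections whose duration is at least min_duration seconds."""
    return [d for d in durations if d >= min_duration]


sweep = {t: len(surviving(unmatched_durations, t)) for t in (1.0, 1.5, 2.0, 3.0)}
print(sweep)  # {1.0: 5, 1.5: 4, 2.0: 3, 3.0: 2}
```

The FN column cannot be reproduced this way, since it depends on the durations of the baseline-matched detections, which are not listed here.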


"False Positives" Analysis (5 after 1.0s filter)

These plays were detected but don't match any baseline play within 5 seconds.

| # | Timestamp | Duration | Verdict | Notes |
|---|---|---|---|---|
| 1 | 2:01.92 (121.9s) | 6.3s | ✅ VALID | Opening kickoff - should be tracked |
| 2 | 2:31.43 (151.4s) | 1.9s | ⚠️ Optional | Weird clock behavior |
| 3 | 10:14.93 (614.9s) | 15.0s | ✅ VALID | Second half kickoff - should be tracked |
| 4 | 52:34.00 (3154.0s) | 2.9s | ✅ VALID | False start penalty - should be tracked |
| 5 | 140:14.54 (8414.5s) | 1.4s | ⚠️ Optional | Weird clock behavior |

Filtered by 1.0s threshold:

  • 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior ✅ Correctly filtered
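The rule used above, a detection counts as a false positive when no baseline play lies within 5 seconds of it, can be sketched as follows. This is an illustrative version, not the project's comparison script: the function name `split_matches` and the one-to-one matching policy are assumptions, and the two baseline times in the example are made up for demonstration.

```python
from bisect import bisect_left


def split_matches(detected, baseline, tolerance=5.0):
    """Split detected start times into (matched, unmatched) against baseline.

    Each baseline play can be claimed by at most one detection. Only the
    nearest baseline entries on either side of a detection are considered.
    """
    baseline = sorted(baseline)
    used = [False] * len(baseline)
    matched, unmatched = [], []
    for t in sorted(detected):
        i = bisect_left(baseline, t)
        # Neighbours of t in the sorted baseline that are still unclaimed
        # and lie within the tolerance window.
        candidates = [
            j for j in (i - 1, i)
            if 0 <= j < len(baseline)
            and not used[j]
            and abs(baseline[j] - t) <= tolerance
        ]
        if candidates:
            best = min(candidates, key=lambda j: abs(baseline[j] - t))
            used[best] = True
            matched.append(t)
        else:
            unmatched.append(t)
    return matched, unmatched


# 121.9 and 151.4 come from this report; the other values are hypothetical.
matched, unmatched = split_matches(
    detected=[121.9, 151.4, 200.3, 260.0],
    baseline=[198.0, 262.5],
)
print(matched)    # [200.3, 260.0]
print(unmatched)  # [121.9, 151.4]
```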

Key Finding: New Method Finds MORE Plays!

3 of the "false positives" are actually legitimate plays that the v3 baseline missed:

  • Opening kickoff (2:01)
  • Second half kickoff (10:14)
  • False start penalty (52:34)

This means our template matching method is actually better than the baseline for total play coverage.


Performance Comparison

| Metric | V3 Baseline | Static Templates | Dynamic Templates |
|---|---|---|---|
| Processing Time | 9.6 min | 3.4 min | 4.2 min |
| Plays Detected | 176 | 181 (filtered) | 181 (filtered) |
| True Positives | 176 | 176 | 176 |
| False Positives | 0 | 5 | 5 |
| False Negatives | 0 | 0 | 0 |
| Precision | 100% | 97.2% | 97.2% |
| Recall | 100% | 100% | 100% |
| F1 Score | - | 98.6% | 98.6% |
| Speedup | 1.0x | 2.8x | 2.3x |
| Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) |

Template Capture Modes

Static Templates: Pre-built templates loaded from disk (fastest startup)

  • Uses templates previously captured and saved to output/debug/digit_templates/
  • 100% template coverage (all 25 templates available)
  • Best for repeated analysis of the same video

Dynamic Templates: Templates built on-the-fly using OCR (default mode)

  • Uses OCR to label first 400 frames, then builds templates from samples
  • 92% template coverage (23/25 templates - missing 2 rare digits)
  • Adds ~10 seconds for template building phase
  • More robust for new videos with different fonts/styles

Fixes Applied

1. XP/FG Minimum Time Filter (play_state_machine.py)

Problem: Weird clock behavior (40→25 within 1 second) was being incorrectly detected as XP/FG completions.

Solution: Added minimum time requirement (1.0s) at clock=40 before accepting 40→25 as an XP/FG completion.

min_time_at_40 = 1.0  # Must be at 40 for at least 1s to avoid weird clock false positives

if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0:
    # This is a valid XP/FG completion
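To make the condition easier to test in isolation, it can be expressed as a standalone predicate. This is a sketch only: the function name `is_xp_fg_completion` is hypothetical, and the default for `max_time_for_rapid_transition` is a placeholder, since this report does not state the value used in `play_state_machine.py`.

```python
def is_xp_fg_completion(
    time_at_40: float,
    countdown_history_len: int,
    min_time_at_40: float = 1.0,
    max_time_for_rapid_transition: float = 5.0,  # placeholder, not from the report
) -> bool:
    """Return True when a 40->25 transition looks like an XP/FG completion.

    The clock must have stayed at 40 for at least min_time_at_40 seconds
    (shorter holds are treated as clock noise), for no longer than
    max_time_for_rapid_transition, and no countdown may have been observed.
    """
    held_long_enough = min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition
    return held_long_enough and countdown_history_len == 0


print(is_xp_fg_completion(0.4, 0))  # False: below the 1.0s minimum
print(is_xp_fg_completion(2.0, 0))  # True
print(is_xp_fg_completion(2.0, 3))  # False: a countdown was observed
```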

2. Merge Plays Fix (_merge_plays() in play_detector.py)

Problem: Same play detected by both state machine and clock reset detection.

Solution: Added 5-second proximity threshold to deduplicate overlapping detections.
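One way to apply a proximity threshold like this is to sort all candidate start times and drop any that fall within 5 seconds of the last one kept. The sketch below illustrates the idea and is not the actual `_merge_plays()` code; in particular, keeping the earlier of two overlapping detections is an assumption, and the real method may prefer one detector over the other.

```python
def merge_plays(state_machine_times, clock_reset_times, proximity=5.0):
    """Merge two lists of play start times (seconds), dropping near-duplicates.

    A detection is dropped when it starts within `proximity` seconds of the
    most recently kept detection.
    """
    merged = []
    for t in sorted(list(state_machine_times) + list(clock_reset_times)):
        if merged and t - merged[-1] <= proximity:
            continue  # same play seen by both detectors
        merged.append(t)
    return merged


# Hypothetical start times: 100.0 and 102.5 describe the same play.
print(merge_plays([100.0, 250.0], [102.5, 400.0]))  # [100.0, 250.0, 400.0]
```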

3. Duration Filter (1.0s threshold)

Problem: Weird clock noise produces very short "plays" (< 1 second).

Solution: Filter plays with duration < 1.0s. Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode.


Known Limitations

  1. Timeout Detection: Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead.

  2. Special Play Durations: Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40→25 transition).


Timestamps for Video Inspection

Legitimate Plays (missed by v3 baseline)

2:01   - Opening kickoff
10:14  - Second half kickoff  
52:34  - False start penalty

Filtered by 1.0s threshold

69:12  - Weird clock (0.9s) ✅ Filtered

Remaining questionable detections

2:31   - Weird clock (1.9s) - Optional
140:14 - Weird clock (1.4s) - Optional
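The timestamps above use minutes that run past 59 rather than an hours field. For seeking in a player or script that expects seconds, a small conversion helper may be convenient; `to_seconds` is a hypothetical name, not part of the project.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'MM:SS' or 'MM:SS.ss' (minutes may exceed 59) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


for stamp in ("2:01", "10:14", "52:34", "69:12", "2:31", "140:14"):
    print(f"{stamp:>6} -> {to_seconds(stamp):.0f}s")
```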

Timing Breakdown (Dynamic Template Mode)

| Phase | Time | % of Total |
|---|---|---|
| Video I/O | 169.0s | 67.4% |
| Template Building | 9.8s | 3.9% |
| Template Matching | 71.3s | 28.4% |
| Other (scorebug, state machine) | 0.5s | 0.2% |
| TOTAL | 250.7s | 100% |

The overhead of dynamic template capture (~10 seconds) is minimal compared to the total processing time. The majority of time is spent on video I/O (67%) and template matching (28%).
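The percentage shares and the speedup figures quoted in this report can be rechecked with a few lines of arithmetic. The sketch below uses only numbers stated in the report; the variable names are illustrative.

```python
phase_seconds = {
    "Video I/O": 169.0,
    "Template Building": 9.8,
    "Template Matching": 71.3,
    "Other (scorebug, state machine)": 0.5,
}
reported_total = 250.7  # as reported; the rounded phase times sum to about 250.6

# Share of total runtime per phase, in percent.
shares = {
    name: round(100 * secs / reported_total, 1)
    for name, secs in phase_seconds.items()
}
print(shares)

# Total in minutes, and speedup relative to the 9.6 minute v3 baseline.
total_minutes = round(reported_total / 60, 1)
speedup_dynamic = round(9.6 / 4.2, 1)
speedup_static = round(9.6 / 3.4, 1)
print(total_minutes, speedup_dynamic, speedup_static)  # 4.2 2.3 2.8
```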


Next Steps

  1. ✅ 1.0s duration filter - Implemented in test script
  2. ✅ Dynamic template capture - Now the default behavior
  3. Update baseline with the 3 legitimate plays found
  4. Integration with main.py: Enable template matching mode in orchestrator
  5. Timeout tracking: Add timeout indicator detection for proper Class B classification