Detection Analysis Report
Date: January 7, 2026
Method: Fixed coordinates + template matching
Video: OSU vs Tenn 12.21.24.mkv
Processing Time: 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes)
Summary
| Metric | Value |
|---|---|
| Total Detected (raw) | 182 |
| After 1.0s filter | 181 |
| V3 Baseline | 176 |
| True Positives | 176 |
| False Positives | 5 |
| False Negatives | 0 |
| Recall | 100.0% |
| Precision | 97.2% |
| F1 Score | 98.6% |
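The precision, recall and F1 figures use the standard definitions, where a true positive is a detection matched to a baseline play. Below is a quick check of the arithmetic. `detection_metrics` is a hypothetical helper and is not part of the project:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 (as percentages) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": 100 * precision, "recall": 100 * recall, "f1": 100 * f1}


# Counts from the summary table (1.0s filter).
summary = detection_metrics(tp=176, fp=5, fn=0)
print({name: round(value, 1) for name, value in summary.items()})
# {'precision': 97.2, 'recall': 100.0, 'f1': 98.6}
```

The same helper reproduces the rows of the threshold table. For example, `detection_metrics(174, 4, 2)` gives 97.8% precision, 98.9% recall and 98.3% F1 for the 1.5s row.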
Key Achievements
- 100% Recall - All 176 baseline plays correctly detected
- Better than baseline - 3 "false positives" are actually legitimate plays the v3 baseline missed:
  - Opening kickoff (2:01)
  - Second half kickoff (10:14)
  - False start penalty (52:34)
- ~2.8x faster - 3.4 minutes vs 9.6 minutes for the v3 baseline
- XP/FG filter working - Reduced raw FPs from 10 to 6 by requiring at least 1.0s at clock=40
Duration Filter Threshold Analysis
The v3 baseline's shortest play is 3.9s (a timeout). Even so, a 3.0s threshold causes false negatives, because our special play detections are shorter than the baseline's: we only capture the 40→25 transition, not the full play duration.
| Threshold | Plays | FP | FN | Recall | Precision | F1 |
|---|---|---|---|---|---|---|
| 1.0s | 181 | 5 | 0 | 100.0% | 97.2% | 98.6% |
| 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% |
| 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% |
| 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% |
Recommendation: Use the 1.0s threshold. It keeps 100% recall and gives the highest F1 (98.6%, tied with 3.0s) while still filtering the shortest clock-noise detections.
"False Positives" Analysis (5 after 1.0s filter)
These plays were detected but don't match any baseline play within 5 seconds.
| # | Timestamp | Duration | Verdict | Notes |
|---|---|---|---|---|
| 1 | 2:01.92 (121.9s) | 6.3s | ✅ VALID | Opening kickoff - should be tracked |
| 2 | 2:31.43 (151.4s) | 1.9s | ⚠️ Optional | Weird clock behavior |
| 3 | 10:14.93 (614.9s) | 15.0s | ✅ VALID | Second half kickoff - should be tracked |
| 4 | 52:34.00 (3154.0s) | 2.9s | ✅ VALID | False start penalty - should be tracked |
| 5 | 140:14.54 (8414.5s) | 1.4s | ⚠️ Optional | Weird clock behavior |
Filtered by 1.0s threshold:
- 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior ✅ Correctly filtered
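The report says only that detections are compared with baseline plays using a 5-second window. It does not describe the pairing procedure. The sketch below assumes greedy one-to-one matching on play start times, where each detection takes the nearest baseline play that is still unmatched. `match_plays` and its arguments are hypothetical, and the times in the usage line are made up:

```python
def match_plays(detected, baseline, tolerance=5.0):
    """Pair detected play start times (seconds) one-to-one with baseline start
    times, greedily taking the nearest unused baseline play within `tolerance`.

    Returns (matches, false_positives, false_negatives).
    """
    unmatched = sorted(baseline)
    matches, false_positives = [], []
    for t in sorted(detected):
        nearest = min(unmatched, key=lambda b: abs(b - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            matches.append((t, nearest))
            unmatched.remove(nearest)  # each baseline play can be matched once
        else:
            false_positives.append(t)
    # Baseline plays that no detection matched are the false negatives.
    return matches, false_positives, unmatched


matches, fps, fns = match_plays([10.0, 20.0, 60.0], [12.0, 26.0, 90.0])
print(matches, fps, fns)  # [(10.0, 12.0)] [20.0, 60.0] [26.0, 90.0]
```

Greedy nearest matching is simple, but it is not globally optimal when plays are closer together than the tolerance. An assignment solver such as the Hungarian algorithm would avoid that problem.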
Key Finding: New Method Finds MORE Plays!
3 of the "false positives" are actually legitimate plays that the v3 baseline missed:
- Opening kickoff (2:01)
- Second half kickoff (10:14)
- False start penalty (52:34)
This means our template matching method is actually better than the baseline for total play coverage.
Performance Comparison
| Metric | V3 Baseline | Static Templates | Dynamic Templates |
|---|---|---|---|
| Processing Time | 9.6 min | 3.4 min | 4.2 min |
| Plays Detected | 176 | 181 (filtered) | 181 (filtered) |
| True Positives | 176 | 176 | 176 |
| False Positives | 0 | 5 | 5 |
| False Negatives | 0 | 0 | 0 |
| Precision | 100% | 97.2% | 97.2% |
| Recall | 100% | 100% | 100% |
| F1 Score | - | 98.6% | 98.6% |
| Speedup | 1.0x | 2.8x | 2.3x |
| Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) |
Template Capture Modes
Static Templates: Pre-built templates loaded from disk (fastest startup)
- Uses templates previously captured and saved to output/debug/digit_templates/
- 100% template coverage (all 25 templates available)
- Best for repeated analysis of the same video
Dynamic Templates: Templates built on-the-fly using OCR (default mode)
- Uses OCR to label first 400 frames, then builds templates from samples
- 92% template coverage (23/25 templates - missing 2 rare digits)
- Adds ~10 seconds for template building phase
- More robust for new videos with different fonts/styles
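The report does not say how templates are built or compared. The sketch below makes two assumptions: each template is the pixel-wise mean of the OCR-labelled crops for one label, and crops are compared by normalized cross-correlation (NCC). The names `build_templates`, `ncc` and `classify`, and the 0.8 acceptance score, are hypothetical and not taken from the project. Pure Python keeps the sketch self-contained; a production version would typically use an array or image library.

```python
from statistics import fmean


def build_templates(samples):
    """samples: {label: [patch, ...]}, each patch a list of equal-length rows of
    grayscale intensities. Returns {label: mean patch}; labels with no samples
    get no template."""
    templates = {}
    for label, patches in samples.items():
        if not patches:
            continue  # never seen during labelling -> no template
        rows, cols = len(patches[0]), len(patches[0][0])
        templates[label] = [
            [fmean(p[r][c] for p in patches) for c in range(cols)]
            for r in range(rows)
        ]
    return templates


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (-1..1)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = fmean(xs), fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0  # flat patch: no usable signal


def classify(patch, templates, min_score=0.8):
    """Label of the best-matching template, or None if nothing reaches min_score."""
    if not templates:
        return None
    label, score = max(
        ((name, ncc(patch, tmpl)) for name, tmpl in templates.items()),
        key=lambda item: item[1],
    )
    return label if score >= min_score else None
```

In this scheme, a label that never appears in the OCR-labelled frames gets no template. That is one way dynamic coverage can end up below 100%, as with the 2 rare digits above.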
Fixes Applied
1. XP/FG Minimum Time Filter (play_state_machine.py)
Problem: Weird clock behavior (40→25 within 1 second) was being incorrectly detected as XP/FG completions.
Solution: Added a minimum time requirement (1.0s) at clock=40 before accepting a 40→25 transition as an XP/FG completion.
```python
min_time_at_40 = 1.0  # Must be at 40 for at least 1s to avoid weird clock false positives
if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0:
    # This is a valid XP/FG completion
    ...
```
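The excerpt above shows only the final check. Below is a minimal stand-alone sketch of the same rule applied to a stream of play-clock readings, assuming the readings arrive as `(timestamp_s, clock_value)` pairs. The function name is hypothetical. `max_time_at_40` stands in for `max_time_for_rapid_transition`, whose value this report does not give. The "no countdown" condition is approximated by requiring the clock to stay at 40 without interruption.

```python
def find_xp_fg_completions(readings, max_time_at_40, min_time_at_40=1.0):
    """Return timestamps of 40->25 play-clock transitions that look like XP/FG
    completions.

    readings: (timestamp_s, clock_value) pairs in time order.
    A transition counts only if the clock sat at 40 uninterrupted (no countdown
    in between) for between min_time_at_40 and max_time_at_40 seconds.
    """
    completions = []
    run_start = None  # when the current uninterrupted run at 40 began
    prev_clock = None
    for t, clock in readings:
        if clock == 40 and prev_clock != 40:
            run_start = t  # clock just (re)entered 40
        elif clock == 25 and prev_clock == 40:
            time_at_40 = t - run_start
            # Reject sub-second 40->25 flickers (clock noise) and long stalls.
            if min_time_at_40 <= time_at_40 <= max_time_at_40:
                completions.append(t)
        prev_clock = clock
    return completions
```

A 40→25 flicker lasting 0.5s is rejected by the 1.0s minimum. A 40→39→25 sequence is never counted, because the countdown breaks the run at 40.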
2. Merge Plays Fix (_merge_plays() in play_detector.py)
Problem: Same play detected by both state machine and clock reset detection.
Solution: Added a 5-second proximity threshold to deduplicate overlapping detections.
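Below is a minimal sketch of proximity-based deduplication. It assumes each detection has a start and end time in seconds, and that the 5-second window is measured between start times. The `Play` class, its `source` field and `merge_plays` are hypothetical stand-ins for whatever `_merge_plays()` actually does:

```python
from dataclasses import dataclass, replace


@dataclass
class Play:
    start: float  # seconds into the video
    end: float
    source: str   # hypothetical label, e.g. "state_machine" or "clock_reset"


def merge_plays(plays, proximity=5.0):
    """Deduplicate detections: a play starting within `proximity` seconds of the
    last kept play is treated as the same play and folded into it."""
    merged = []
    for play in sorted(plays, key=lambda p: p.start):
        if merged and play.start - merged[-1].start <= proximity:
            # Same play seen by both detectors: keep the earlier start and
            # extend the end so the merged play covers both detections.
            merged[-1].end = max(merged[-1].end, play.end)
        else:
            merged.append(replace(play))  # copy so the caller's objects are not mutated
    return merged
```

Comparing against the last kept play, rather than every earlier play, keeps the merge to a single pass over the sorted list.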
3. Duration Filter (1.0s threshold)
Problem: Weird clock noise produces very short "plays" (< 1 second).
Solution: Filter plays with duration < 1.0s. Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode.
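The filter itself is a one-line predicate. The sketch below assumes plays are `(start_s, duration_s)` pairs, which is a hypothetical representation. The usage example runs the filter on the three "weird clock" detections listed earlier and shows which ones survive each threshold:

```python
def filter_short_plays(plays, min_duration=1.0):
    """Keep only plays lasting at least min_duration seconds.

    plays: (start_s, duration_s) pairs.
    """
    return [(start, dur) for start, dur in plays if dur >= min_duration]


# The three "weird clock" detections from the tables above.
weird = [(151.4, 1.9), (4152.6, 0.9), (8414.5, 1.4)]
for threshold in (1.0, 1.5, 2.0):
    print(threshold, [start for start, _ in filter_short_plays(weird, threshold)])
# 1.0 [151.4, 8414.5]
# 1.5 [151.4]
# 2.0 []
```

This is consistent with the FP column of the threshold table: 1.5s also removes the 140:14 detection, and 2.0s also removes 2:31.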
Known Limitations
Timeout Detection: Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead.
Special Play Durations: Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40→25 transition).
Timestamps for Video Inspection
Legitimate Plays (missed by v3 baseline)
2:01 - Opening kickoff
10:14 - Second half kickoff
52:34 - False start penalty
Filtered by 1.0s threshold
69:12 - Weird clock (0.9s) ✅ Filtered
Remaining questionable detections
2:31 - Weird clock (1.9s) - Optional
140:14 - Weird clock (1.4s) - Optional
Timing Breakdown (Dynamic Template Mode)
| Phase | Time | % of Total |
|---|---|---|
| Video I/O | 169.0s | 67.4% |
| Template Building | 9.8s | 3.9% |
| Template Matching | 71.3s | 28.4% |
| Other (scorebug, state machine) | 0.5s | 0.2% |
| TOTAL | 250.7s | 100% |
The overhead of dynamic template capture (9.8s, 3.9% of the total) is small. Most of the time goes to video I/O (67%) and template matching (28%).
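The report does not describe how these phase timings were collected. One minimal approach is to accumulate `time.perf_counter()` deltas per phase with a context manager. `phase_totals`, `timed` and `breakdown` are hypothetical names, and the work inside the `with` blocks is a placeholder for real decoding and matching:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

phase_totals = defaultdict(float)  # phase name -> accumulated seconds


@contextmanager
def timed(phase):
    """Accumulate wall-clock time spent inside the `with` block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[phase] += time.perf_counter() - start


def breakdown(totals):
    """Return (phase, seconds, percent_of_total) rows, largest phase first."""
    grand_total = sum(totals.values()) or 1.0  # avoid dividing by zero when nothing was timed
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(phase, secs, 100 * secs / grand_total) for phase, secs in rows]


# Usage: wrap each stage of the per-frame loop.
with timed("video_io"):
    frames = [bytes(1000) for _ in range(100)]  # placeholder for decoding frames
with timed("template_matching"):
    total = sum(len(f) for f in frames)          # placeholder for matching
for phase, secs, pct in breakdown(phase_totals):
    print(f"{phase:<18} {secs:8.4f}s {pct:5.1f}%")
```

Because the phases are accumulated across the whole run, small per-frame costs such as the state machine still appear as their own row.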
Next Steps
- ✅ 1.0s duration filter - Implemented in test script
- ✅ Dynamic template capture - Now the default behavior
- Update baseline with the 3 legitimate plays found
- Integration with main.py: Enable template matching mode in orchestrator
- Timeout tracking: Add timeout indicator detection for proper Class B classification