# Detection Analysis Report
**Date:** January 7, 2026
**Method:** Fixed coordinates + template matching
**Video:** OSU vs Tenn 12.21.24.mkv
**Processing Time:** 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes)
---
## Summary
| Metric | Value |
|--------|-------|
| Total Detected (raw) | 182 |
| After 1.0s filter | 181 |
| V3 Baseline | 176 |
| True Positives | 176 |
| False Positives | 5 |
| False Negatives | 0 |
| **Recall** | **100.0%** |
| Precision | 97.2% |
| F1 Score | 98.6% |
### Key Achievements
1. **100% Recall** - All 176 baseline plays correctly detected
2. **Better than baseline** - 3 "false positives" are actually legitimate plays the v3 baseline missed:
- Opening kickoff (2:01)
- Second half kickoff (10:14)
- False start penalty (52:34)
3. **~2.8x faster** - 3.4 minutes vs 9.6 minutes for v3 baseline
4. **XP/FG filter working** - Reduced FPs from 10 to 6 by requiring 1.0s minimum time at clock=40
---
## Duration Filter Threshold Analysis
The v3 baseline's shortest play is **3.9s** (a timeout). However, using 3.0s as the threshold causes false negatives because our special play detections have shorter durations (we only capture the 40→25 transition, not the full play duration).
| Threshold | Plays | FP | FN | Recall | Precision | F1 |
|-----------|-------|----|----|--------|-----------|-----|
| **1.0s** | **181** | **5** | **0** | **100.0%** | **97.2%** | **98.6%** |
| 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% |
| 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% |
| 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% |
**Recommendation:** Use **1.0s threshold** for best recall while filtering weird clock noise.
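
The recall, precision, and F1 columns follow directly from the Plays/FP/FN counts in each row. A minimal sketch of that arithmetic, assuming TP is simply the number of plays kept minus the false positives (the function name `detection_metrics` is illustrative and not part of the project):

```python
def detection_metrics(plays: int, fp: int, fn: int) -> dict:
    """Derive recall, precision, and F1 from the counts in one table row."""
    tp = plays - fp  # detections that matched a baseline play
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"tp": tp, "recall": recall, "precision": precision, "f1": f1}


# threshold (s) -> (plays kept, false positives, false negatives), copied from the table
rows = {1.0: (181, 5, 0), 1.5: (178, 4, 2), 2.0: (176, 3, 3), 3.0: (175, 2, 3)}

for threshold, (plays, fp, fn) in rows.items():
    m = detection_metrics(plays, fp, fn)
    print(
        f"{threshold:.1f}s  recall={m['recall']:.1%}  "
        f"precision={m['precision']:.1%}  f1={m['f1']:.1%}"
    )
```

In every row TP + FN equals 176, the size of the v3 baseline, which is a quick sanity check on the counts.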
---
## "False Positives" Analysis (5 after 1.0s filter)
These plays were detected but don't match any baseline play within 5 seconds.
| # | Timestamp | Duration | Verdict | Notes |
|---|-----------|----------|---------|-------|
| 1 | 2:01.92 (121.9s) | 6.3s | ✅ **VALID** | Opening kickoff - should be tracked |
| 2 | 2:31.43 (151.4s) | 1.9s | ⚠️ Optional | Weird clock behavior |
| 3 | 10:14.93 (614.9s) | 15.0s | ✅ **VALID** | Second half kickoff - should be tracked |
| 4 | 52:34.00 (3154.0s) | 2.9s | ✅ **VALID** | False start penalty - should be tracked |
| 5 | 140:14.54 (8414.5s) | 1.4s | ⚠️ Optional | Weird clock behavior |
**Filtered by 1.0s threshold:**
- 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior ✅ Correctly filtered
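
The matching rule behind this table (a detection is a "false positive" when no baseline play lies within 5 seconds) can be sketched as follows. The evaluation script itself is not shown in this report, so the greedy nearest-neighbour strategy, the function name `match_to_baseline`, and the baseline timestamps in the example are all assumptions made for illustration.

```python
def match_to_baseline(detected, baseline, tolerance=5.0):
    """Greedily pair detected start times with baseline start times (seconds).

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = sorted(baseline)
    true_positives, false_positives = [], []
    for t in sorted(detected):
        # Closest baseline play that has not been claimed yet
        nearest = min(unmatched, key=lambda b: abs(b - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            unmatched.remove(nearest)
            true_positives.append(t)
        else:
            false_positives.append(t)
    # Baseline plays nobody claimed are the false negatives
    return true_positives, false_positives, unmatched


# 121.9 and 151.4 are detections from the table; the baseline times are hypothetical
detected = [121.9, 131.0, 151.4, 161.5]
baseline = [130.0, 160.0, 200.0]
tp, fp, fn = match_to_baseline(detected, baseline)
print(tp, fp, fn)
```

Each baseline play can be claimed only once, so two detections near the same baseline play yield one true positive and one false positive.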
### Key Finding: New Method Finds MORE Plays!
**3 of the "false positives" are actually legitimate plays that the v3 baseline missed:**
- Opening kickoff (2:01)
- Second half kickoff (10:14)
- False start penalty (52:34)
This means that, on this video, our template matching method gives **better total play coverage than the baseline**: it detects all 176 baseline plays plus 3 legitimate plays the baseline missed.
---
## Performance Comparison
| Metric | V3 Baseline | Static Templates | Dynamic Templates |
|--------|-------------|------------------|-------------------|
| Processing Time | 9.6 min | 3.4 min | 4.2 min |
| Plays Detected | 176 | 181 (filtered) | 181 (filtered) |
| True Positives | 176 | 176 | 176 |
| False Positives | 0 | 5 | 5 |
| False Negatives | 0 | 0 | 0 |
| Precision | 100% | 97.2% | 97.2% |
| Recall | 100% | 100% | 100% |
| F1 Score | - | 98.6% | 98.6% |
| Speedup | 1.0x | 2.8x | 2.3x |
| Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) |
### Template Capture Modes
**Static Templates:** Pre-built templates loaded from disk (fastest startup)
- Uses templates previously captured and saved to `output/debug/digit_templates/`
- 100% template coverage (all 25 templates available)
- Best for repeated analysis of the same video
**Dynamic Templates:** Templates built on-the-fly using OCR (default mode)
- Uses OCR to label first 400 frames, then builds templates from samples
- 92% template coverage (23/25 templates - missing 2 rare digits)
- Adds ~10 seconds for template building phase
- More robust for new videos with different fonts/styles
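
To make the matching step concrete, here is a toy sketch of template matching on a fixed-coordinate crop: the crop is compared against every stored digit template and the best-scoring label wins. The report does not say which similarity measure or image library the project uses, so the pixel-agreement score, the function name `match_digit`, and the 3x3 binary "templates" below are assumptions chosen to keep the example dependency-free.

```python
def match_digit(crop, templates):
    """Return (label, score) of the template closest to `crop`.

    `crop` and every template are equally sized 2D lists of 0/1 pixels.
    The score is the fraction of pixels that agree (1.0 means identical).
    """
    best_label, best_score = None, -1.0
    total = len(crop) * len(crop[0])
    for label, template in templates.items():
        agree = sum(
            1
            for crop_row, tmpl_row in zip(crop, template)
            for c, t in zip(crop_row, tmpl_row)
            if c == t
        )
        score = agree / total
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy templates (hypothetical; real templates would be image crops of clock digits)
templates = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
}
noisy_seven = [[1, 1, 1], [0, 0, 1], [0, 1, 1]]  # one pixel differs from "7"
label, score = match_digit(noisy_seven, templates)
print(label, round(score, 2))
```

A missing template (as with the 2 uncovered digits in dynamic mode) simply means that label can never be returned, so a frame showing that digit would be matched to the closest wrong template or rejected by a score threshold, if one is applied.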
---
## Fixes Applied
### 1. XP/FG Minimum Time Filter (`play_state_machine.py`)
**Problem:** Weird clock behavior (40→25 within 1 second) was being incorrectly detected as XP/FG completions.
**Solution:** Added minimum time requirement (1.0s) at clock=40 before accepting 40→25 as an XP/FG completion.
```python
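# Excerpt from the state machine; time_at_40 and max_time_for_rapid_transition are defined earlier in the method
min_time_at_40 = 1.0  # Must be at 40 for at least 1s to avoid weird clock false positives
if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0:
    ...  # This is a valid XP/FG completion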
```
### 2. Merge Plays Fix (`_merge_plays()` in `play_detector.py`)
**Problem:** Same play detected by both state machine and clock reset detection.
**Solution:** Added 5-second proximity threshold to deduplicate overlapping detections.
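
The deduplication idea can be sketched as below. This is not the actual `_merge_plays()` implementation, which is not reproduced in this report: the `Play` dataclass, its field names, and the choice to compare start times are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Play:
    start: float  # seconds from video start
    end: float
    source: str   # hypothetical label, e.g. "state_machine" or "clock_reset"


def merge_plays(plays, proximity=5.0):
    """Collapse detections whose start times fall within `proximity` seconds."""
    merged = []
    for play in sorted(plays, key=lambda p: p.start):
        if merged and play.start - merged[-1].start <= proximity:
            # Same play seen by a second detector: keep the first, widen its span
            merged[-1].end = max(merged[-1].end, play.end)
        else:
            merged.append(Play(play.start, play.end, play.source))
    return merged


plays = [
    Play(100.0, 106.0, "state_machine"),
    Play(101.5, 107.0, "clock_reset"),  # duplicate of the play above
    Play(140.0, 145.0, "state_machine"),
]
for p in merge_plays(plays):
    print(p)
```

Keeping the earliest detection and extending its end time is one possible policy; preferring a particular detector's timestamps would be an equally valid choice.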
### 3. Duration Filter (1.0s threshold)
**Problem:** Weird clock noise produces very short "plays" (< 1 second).
**Solution:** Filter plays with duration < 1.0s. Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode.
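
A minimal sketch of the filter, applied to the six unmatched detections listed in this report. The function name `filter_short_plays` and the `(start, duration)` tuple layout are assumptions; the timestamps and durations come from the "False Positives" analysis.

```python
def filter_short_plays(plays, min_duration=1.0):
    """Keep (start, duration) pairs whose duration is at least min_duration seconds."""
    return [(start, duration) for start, duration in plays if duration >= min_duration]


# (start time in s, duration in s) for the unmatched detections in this report
candidates = [
    (121.9, 6.3),    # opening kickoff
    (151.4, 1.9),    # weird clock
    (614.9, 15.0),   # second half kickoff
    (3154.0, 2.9),   # false start penalty
    (4152.6, 0.9),   # weird clock
    (8414.5, 1.4),   # weird clock
]

kept_1s = filter_short_plays(candidates, min_duration=1.0)
kept_3s = filter_short_plays(candidates, min_duration=3.0)
print(len(kept_1s), len(kept_3s))
```

At 1.0s only the 0.9s clock glitch is removed, while at 3.0s the legitimate 2.9s false start penalty is dropped as well, which illustrates why the lower threshold is preferred here.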
---
## Known Limitations
1. **Timeout Detection:** Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead.
2. **Special Play Durations:** Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40→25 transition).
---
## Timestamps for Video Inspection
### Legitimate Plays (missed by v3 baseline)
```
2:01 - Opening kickoff
10:14 - Second half kickoff
52:34 - False start penalty
```
### Filtered by 1.0s threshold
```
69:12 - Weird clock (0.9s) ✅ Filtered
```
### Remaining questionable detections
```
2:31 - Weird clock (1.9s) - Optional
140:14 - Weird clock (1.4s) - Optional
```
---
## Timing Breakdown (Dynamic Template Mode)
| Phase | Time | % of Total |
|-------|------|------------|
| Video I/O | 169.0s | 67.4% |
| Template Building | 9.8s | 3.9% |
| Template Matching | 71.3s | 28.4% |
| Other (scorebug, state machine) | 0.5s | 0.2% |
| **TOTAL** | **250.7s** | **100%** |
The overhead of dynamic template capture (~10 seconds) is minimal compared to the total processing time. The majority of time is spent on video I/O (67%) and template matching (28%).
---
## Next Steps
1. ✅ **1.0s duration filter** - Implemented in test script
2. ✅ **Dynamic template capture** - Now the default behavior
3. **Update baseline** with the 3 legitimate plays found
4. **Integration with main.py:** Enable template matching mode in orchestrator
5. **Timeout tracking:** Add timeout indicator detection for proper Class B classification