	# Detection Analysis Report

	Date: January 7, 2026
	Method: Fixed coordinates + template matching
	Video: OSU vs Tenn 12.21.24.mkv
	Processing Time: 3.4 minutes (~2.8x faster than v3 baseline's 9.6 minutes)

	---

	## Summary

	| Metric | Value |
	|--------|-------|
	| Total Detected (raw) | 182 |
	| After 1.0s filter | 181 |
	| V3 Baseline | 176 |
	| True Positives | 176 |
	| False Positives | 5 |
	| False Negatives | 0 |
	| Recall | 100.0% |
	| Precision | 97.2% |
	| F1 Score | 98.6% |
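The percentages in the table follow from the three counts. As a quick check, here is a minimal sketch that recomputes them; the function name `detection_metrics` is illustrative and not part of the project code.

```python
def detection_metrics(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Return precision, recall, and F1 as percentages rounded to one decimal."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": round(100 * precision, 1),
        "recall": round(100 * recall, 1),
        "f1": round(100 * f1, 1),
    }


# Counts taken from the Summary table (after the 1.0s filter).
summary = detection_metrics(true_positives=176, false_positives=5, false_negatives=0)
print(summary)  # {'precision': 97.2, 'recall': 100.0, 'f1': 98.6}
```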

	### Key Achievements

	1. 100% Recall - All 176 baseline plays correctly detected
	2. Better than baseline - 3 "false positives" are actually legitimate plays the v3 baseline missed:
	   - Opening kickoff (2:01)
	   - Second half kickoff (10:14)
	   - False start penalty (52:34)
	3. ~2.8x faster - 3.4 minutes vs 9.6 minutes for v3 baseline
	4. XP/FG filter working - Reduced raw FPs from 10 to 6 by requiring 1.0s minimum time at clock=40 (the 1.0s duration filter then removes one more, leaving 5)

	---

	## Duration Filter Threshold Analysis

	The v3 baseline's shortest play is 3.9s (a timeout). However, using 3.0s as the threshold causes false negatives because our special play detections have shorter durations (we only capture the 40→25 transition, not the full play duration).

	| Threshold | Plays | FP | FN | Recall | Precision | F1 |
	|-----------|-------|----|----|--------|-----------|-----|
	| 1.0s | 181 | 5 | 0 | 100.0% | 97.2% | 98.6% |
	| 1.5s | 178 | 4 | 2 | 98.9% | 97.8% | 98.3% |
	| 2.0s | 176 | 3 | 3 | 98.3% | 98.3% | 98.3% |
	| 3.0s | 175 | 2 | 3 | 98.3% | 98.9% | 98.6% |

	Recommendation: Use 1.0s threshold for best recall while filtering weird clock noise.
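The FP column can be reproduced from the durations of the six detections that have no baseline match (the five listed in the "False Positives" analysis plus the 0.9s detection the filter removes). The sketch below assumes the filter keeps a detection when its duration is at least the threshold. The FN column cannot be rebuilt this way, because it depends on the durations of baseline-matched plays, which this report does not list.

```python
# Durations (seconds) of the six detections with no baseline match,
# copied from the "False Positives" analysis in this report.
unmatched_durations = [6.3, 1.9, 15.0, 2.9, 1.4, 0.9]


def surviving_false_positives(durations, min_duration):
    """Count unmatched detections that pass the minimum-duration filter."""
    return sum(1 for d in durations if d >= min_duration)


fp_by_threshold = {
    threshold: surviving_false_positives(unmatched_durations, threshold)
    for threshold in (1.0, 1.5, 2.0, 3.0)
}
print(fp_by_threshold)  # {1.0: 5, 1.5: 4, 2.0: 3, 3.0: 2}
```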

	---

	## "False Positives" Analysis (5 after 1.0s filter)

	These plays were detected but don't match any baseline play within 5 seconds.

	| # | Timestamp | Duration | Verdict | Notes |
	|---|-----------|----------|---------|-------|
	| 1 | 2:01.92 (121.9s) | 6.3s | ✅ VALID | Opening kickoff - should be tracked |
	| 2 | 2:31.43 (151.4s) | 1.9s | ⚠️ Optional | Weird clock behavior |
	| 3 | 10:14.93 (614.9s) | 15.0s | ✅ VALID | Second half kickoff - should be tracked |
	| 4 | 52:34.00 (3154.0s) | 2.9s | ✅ VALID | False start penalty - should be tracked |
	| 5 | 140:14.54 (8414.5s) | 1.4s | ⚠️ Optional | Weird clock behavior |

	Filtered by 1.0s threshold:
	- 69:12.60 (4152.6s) - 0.9s duration - Weird clock behavior ✅ Correctly filtered
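The "within 5 seconds" rule used to label a detection as unmatched can be sketched as a greedy nearest-neighbour match. This is an illustration only: the evaluation script's actual matching logic is not shown in this report, the function name `split_by_baseline` is made up, and the baseline times in the example are hypothetical.

```python
def split_by_baseline(detections, baseline, tolerance=5.0):
    """Greedily match each detection to the nearest unused baseline time.

    Returns (matched pairs, unmatched detections, missed baseline plays).
    """
    unused = sorted(baseline)
    matched, unmatched = [], []
    for t in sorted(detections):
        nearest = min(unused, key=lambda b: abs(b - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            matched.append((t, nearest))
            unused.remove(nearest)  # each baseline play is matched at most once
        else:
            unmatched.append(t)
    return matched, unmatched, unused


# 121.9, 151.4 and 614.9 come from the table above; every other value is hypothetical.
detections = [121.9, 131.2, 151.4, 158.7, 614.9, 641.0]
baseline = [130.0, 160.0, 640.0]
matched, unmatched, missed = split_by_baseline(detections, baseline)
print(unmatched)  # [121.9, 151.4, 614.9]
print(missed)     # []
```

A greedy pass like this can pair a detection with a baseline play that a later detection is closer to; with plays spaced many seconds apart that is unlikely, but an exact assignment would need a proper bipartite matching.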

	### Key Finding: New Method Finds MORE Plays!

	3 of the "false positives" are actually legitimate plays that the v3 baseline missed:
	- Opening kickoff (2:01)
	- Second half kickoff (10:14)
	- False start penalty (52:34)

	This means our template matching method covers more plays than the v3 baseline on this video: 179 legitimate plays (the 176 baseline matches plus these 3) versus 176.

	---

	## Performance Comparison

	| Metric | V3 Baseline | Static Templates | Dynamic Templates |
	|--------|-------------|------------------|-------------------|
	| Processing Time | 9.6 min | 3.4 min | 4.2 min |
	| Plays Detected | 176 | 181 (filtered) | 181 (filtered) |
	| True Positives | 176 | 176 | 176 |
	| False Positives | 0 | 5 | 5 |
	| False Negatives | 0 | 0 | 0 |
	| Precision | 100% | 97.2% | 97.2% |
	| Recall | 100% | 100% | 100% |
	| F1 Score | - | 98.6% | 98.6% |
	| Speedup | 1.0x | 2.8x | 2.3x |
	| Template Coverage | N/A | 100% (prebuilt) | 92% (23/25) |

	### Template Capture Modes

	Static Templates: Pre-built templates loaded from disk (fastest startup)
	- Uses templates previously captured and saved to `output/debug/digit_templates/`
	- 100% template coverage (all 25 templates available)
	- Best for repeated analysis of the same video

	Dynamic Templates: Templates built on-the-fly using OCR (default mode)
	- Uses OCR to label first 400 frames, then builds templates from samples
	- 92% template coverage (23/25 templates - missing 2 rare digits)
	- Adds ~10 seconds for template building phase
	- More robust for new videos with different fonts/styles
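In both modes the matching step compares a digit crop taken at fixed coordinates against each stored template and keeps the best score. The sketch below illustrates that idea in pure Python using zero-mean normalized cross-correlation. The scoring function, the function names, and the 3x3 toy glyphs are all assumptions made for illustration; the project's actual matcher is not shown in this report.

```python
import math


def normalized_correlation(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size grayscale grids."""
    a = [value for row in patch for value in row]
    b = [value for row in template for value in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def classify_digit(patch, templates):
    """Return (label, score) for the template that correlates best with the patch."""
    scored = ((label, normalized_correlation(patch, t)) for label, t in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy 3x3 "glyphs" (hypothetical; real templates would be larger image crops).
templates = {
    "1": [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
    "7": [[255, 255, 255], [0, 0, 255], [0, 0, 255]],
}
noisy_one = [[10, 240, 5], [0, 250, 10], [5, 245, 0]]
label, score = classify_digit(noisy_one, templates)
print(label)  # 1
```

A digit with no template (the 2 missing in dynamic mode) can never be returned by a matcher like this, which is why coverage matters.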

	---

	## Fixes Applied

	### 1. XP/FG Minimum Time Filter (`play_state_machine.py`)

	Problem: Weird clock behavior (40→25 within 1 second) was being incorrectly detected as XP/FG completions.

	Solution: Added minimum time requirement (1.0s) at clock=40 before accepting 40→25 as an XP/FG completion.

	```python
	min_time_at_40 = 1.0  # Must be at 40 for at least 1s to avoid weird clock false positives

	# Excerpt: time_at_40 and max_time_for_rapid_transition come from the surrounding method (not shown).
	if min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition and len(self._countdown_history) == 0:
	    # This is a valid XP/FG completion
	    ...
	```
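To see the condition in isolation, it can be restated as a standalone predicate. This is a sketch, not the code in `play_state_machine.py`: the function name is made up, the countdown history is reduced to its length, and the upper bound of 5.0s in the example is a placeholder because the report does not give the value of `max_time_for_rapid_transition`.

```python
def is_xp_fg_completion(time_at_40, countdown_len, max_time_for_rapid_transition, min_time_at_40=1.0):
    """True when a 40->25 transition qualifies as an XP/FG completion.

    The clock must have sat at 40 for at least `min_time_at_40` seconds, no longer
    than `max_time_for_rapid_transition`, and no countdown may have been observed.
    """
    in_window = min_time_at_40 <= time_at_40 <= max_time_for_rapid_transition
    return in_window and countdown_len == 0


UPPER_BOUND = 5.0  # hypothetical value, for illustration only

print(is_xp_fg_completion(0.9, 0, UPPER_BOUND))  # False: too brief, treated as weird clock behavior
print(is_xp_fg_completion(2.0, 0, UPPER_BOUND))  # True
print(is_xp_fg_completion(2.0, 3, UPPER_BOUND))  # False: a countdown was already in progress
```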

	### 2. Merge Plays Fix (`_merge_plays()` in `play_detector.py`)

	Problem: Same play detected by both state machine and clock reset detection.

	Solution: Added 5-second proximity threshold to deduplicate overlapping detections.
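One way to picture the proximity rule: pool the start times from both detectors, sort them, and drop any detection that starts within 5 seconds of the last one kept. The sketch below assumes that behaviour. The real `_merge_plays()` is not shown in this report and may differ, for example in which duplicate it keeps, and the split of timestamps between the two detectors is illustrative.

```python
def merge_nearby(start_times, proximity=5.0):
    """Keep the earliest detection in each run of start times within `proximity` seconds."""
    merged = []
    for t in sorted(start_times):
        if not merged or t - merged[-1] > proximity:
            merged.append(t)
    return merged


# Illustrative inputs: two detectors reporting some of the same plays.
state_machine_plays = [121.9, 614.9, 3154.0]
clock_reset_plays = [123.0, 616.2]  # hypothetical near-duplicates
merged = merge_nearby(state_machine_plays + clock_reset_plays)
print(merged)  # [121.9, 614.9, 3154.0]
```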

	### 3. Duration Filter (1.0s threshold)

	Problem: Weird clock noise produces very short "plays" (< 1 second).

	Solution: Discard detections with duration < 1.0s. Note: Using 3.0s (the orchestrator default) would create false negatives because special plays have short durations in fixed coordinates mode.

	---

	## Known Limitations

	1. Timeout Detection: Class B (timeout) detection doesn't work in fixed coordinates mode because timeout indicators aren't tracked. Timeouts are classified as "special" plays instead.

	2. Special Play Durations: Without full timeout tracking, special plays have shorter durations than the baseline (we only capture the 40→25 transition).

	---

	## Timestamps for Video Inspection

	### Legitimate Plays (missed by v3 baseline)
	```
	2:01 - Opening kickoff
	10:14 - Second half kickoff
	52:34 - False start penalty
	```

	### Filtered by 1.0s threshold
	```
	69:12 - Weird clock (0.9s) ✅ Filtered
	```

	### Remaining questionable detections
	```
	2:31 - Weird clock (1.9s) - Optional
	140:14 - Weird clock (1.4s) - Optional
	```
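When seeking in a player or script that expects seconds, the `M:SS` stamps above can be converted with a small helper. This is a sketch; the function name is illustrative, and minutes are allowed to exceed 59 because the report writes long offsets as `140:14` rather than `2:20:14`.

```python
def timestamp_to_seconds(stamp: str) -> float:
    """Convert 'M:SS' or 'M:SS.ff' (minutes may exceed 59) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


for stamp in ("2:01.92", "52:34", "140:14.54"):
    print(stamp, "->", round(timestamp_to_seconds(stamp), 2))
# 2:01.92 -> 121.92
# 52:34 -> 3154.0
# 140:14.54 -> 8414.54
```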

	---

	## Timing Breakdown (Dynamic Template Mode)

	| Phase | Time | % of Total |
	|-------|------|------------|
	| Video I/O | 169.0s | 67.4% |
	| Template Building | 9.8s | 3.9% |
	| Template Matching | 71.3s | 28.4% |
	| Other (scorebug, state machine) | 0.5s | 0.2% |
	| TOTAL | 250.7s | 100% |

	The overhead of dynamic template capture (~10 seconds) is minimal compared to the total processing time. The majority of time is spent on video I/O (67%) and template matching (28%).
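The percentage column and the 2.3x speedup for dynamic mode can be rechecked from the reported times. A minimal sketch follows; it uses the reported total of 250.7s, while the rounded phase times sum to 250.6s.

```python
# Phase times (seconds) from the timing breakdown table.
phases = {
    "Video I/O": 169.0,
    "Template Building": 9.8,
    "Template Matching": 71.3,
    "Other": 0.5,
}
reported_total = 250.7       # seconds, as reported in the table
baseline_seconds = 9.6 * 60  # v3 baseline processing time

shares = {name: round(100 * secs / reported_total, 1) for name, secs in phases.items()}
speedup = round(baseline_seconds / reported_total, 1)

print(shares)   # {'Video I/O': 67.4, 'Template Building': 3.9, 'Template Matching': 28.4, 'Other': 0.2}
print(speedup)  # 2.3
```

Since video I/O accounts for roughly two thirds of the run, faster decoding would have more effect on total time than further tuning of the matcher.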

	---

	## Next Steps

	1. ✅ 1.0s duration filter - Implemented in test script
	2. ✅ Dynamic template capture - Now the default behavior
	3. Update baseline with the 3 legitimate plays found
	4. Integration with main.py: Enable template matching mode in orchestrator
	5. Timeout tracking: Add timeout indicator detection for proper Class B classification
