Timeout Ground Truth

Video: OSU vs Tenn 12.21.24.mkv

Timeout Events (Chronological)

| Timestamp | Seconds | Team | Notes |
|---|---|---|---|
| 4:25 | 265 | HOME | First timeout of the game |
| 1:07:30 | 4050 | AWAY | |
| 1:09:40 | 4180 | AWAY | |
| 1:14:07 | 4447 | HOME | |
| 1:16:06 | 4566 | HOME | |
| 1:17:32 | - | - | Halftime - All timeouts hidden |
| 1:20:53 | - | - | Scorebug reappears, timeouts reset to 3 each |
| 1:44:54 | 6294 | AWAY | |
| 2:22:30 | - | - | Game Over - All timeouts hidden |
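
The Seconds column is the timestamp expressed as an offset from the start of the video. A minimal sketch of that conversion follows; the helper name `timestamp_to_seconds` is hypothetical and not taken from the project code.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into whole seconds from the start of the video."""
    seconds = 0
    # Each colon-separated field is worth 60x the field to its right.
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


print(timestamp_to_seconds("4:25"))     # 265
print(timestamp_to_seconds("1:44:54"))  # 6294
```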

Summary by Half

First Half:

  • HOME: 3 timeouts used (4:25, 1:14:07, 1:16:06)
  • AWAY: 2 timeouts used (1:07:30, 1:09:40)

Second Half:

  • AWAY: 1 timeout used (1:44:54)

Total Timeouts to Detect: 6


v4 Baseline Timeout Tracker Performance

Detected Timeouts (17 total from v4_baseline.json)

| Play # | Timestamp | Seconds | Team | Ground Truth Match |
|---|---|---|---|---|
| 4 | 4:26 | 266 | HOME | ✓ Matches 4:25 |
| 10 | 9:19 | 559 | HOME | ✗ False positive |
| 15 | 12:58 | 778 | HOME | ✗ False positive |
| 21 | 17:37 | 1057 | HOME | ✗ False positive |
| 35 | 29:05 | 1745 | HOME | ✗ False positive |
| 60 | 48:21 | 2901 | HOME | ✗ False positive |
| 68 | 55:04 | 3304 | HOME | ✗ False positive |
| 79 | 1:02:24 | 3744 | HOME | ✗ False positive |
| 91 | 1:11:54 | 4314 | HOME | ✗ False positive |
| 102 | 1:23:48 | 5028 | AWAY | ✗ False positive |
| 104 | 1:25:40 | 5140 | HOME | ✗ False positive |
| 111 | 1:30:35 | 5435 | HOME | ✗ False positive |
| 131 | 1:44:48 | 6288 | AWAY | ✓ Matches 1:44:54 |
| 146 | 1:57:54 | 7074 | HOME | ✗ False positive |
| 151 | 2:01:50 | 7310 | HOME | ✗ False positive |
| 155 | 2:04:52 | 7492 | HOME | ✗ False positive |
| 175 | 2:18:51 | 8331 | AWAY | ✗ False positive |

Ground Truth Comparison

| Ground Truth | Detected? | Notes |
|---|---|---|
| 4:25 (HOME) | ✓ Play 4 at 4:26 | 1 second off |
| 1:07:30 (AWAY) | ✗ MISSED | No detection near this time |
| 1:09:40 (AWAY) | ✗ MISSED | No detection near this time |
| 1:14:07 (HOME) | ✗ MISSED | No detection near this time |
| 1:16:06 (HOME) | ✗ MISSED | No detection near this time |
| 1:44:54 (AWAY) | ✓ Play 131 at 1:44:48 | 6 seconds off |

Performance Summary

  • True Positives: 2 (detected correctly)
  • False Negatives: 4 (missed real timeouts)
  • False Positives: 15 (incorrectly flagged as timeout)
  • Recall: 2/6 = 33%
  • Precision: 2/17 = 12%
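
These counts can be reproduced by matching each ground-truth timeout to a detection for the same team within a small time window. The sketch below is one way to do that; the 10-second tolerance and the function name `score` are assumptions for illustration, since this document does not state the matching window that was used.

```python
# (seconds, team) pairs copied from the ground-truth and v4 baseline tables.
GROUND_TRUTH = [(265, "HOME"), (4050, "AWAY"), (4180, "AWAY"),
                (4447, "HOME"), (4566, "HOME"), (6294, "AWAY")]

DETECTED = [(266, "HOME"), (559, "HOME"), (778, "HOME"), (1057, "HOME"),
            (1745, "HOME"), (2901, "HOME"), (3304, "HOME"), (3744, "HOME"),
            (4314, "HOME"), (5028, "AWAY"), (5140, "HOME"), (5435, "HOME"),
            (6288, "AWAY"), (7074, "HOME"), (7310, "HOME"), (7492, "HOME"),
            (8331, "AWAY")]

TOLERANCE_S = 10  # assumed matching window; not specified in this document


def score(ground_truth, detected, tolerance_s=TOLERANCE_S):
    """Return (true_positives, false_positives, false_negatives)."""
    unmatched = list(detected)
    true_positives = 0
    for gt_time, gt_team in ground_truth:
        # Take the first unused detection for the same team inside the window.
        match = next(
            (d for d in unmatched
             if d[1] == gt_team and abs(d[0] - gt_time) <= tolerance_s),
            None,
        )
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    false_negatives = len(ground_truth) - true_positives
    false_positives = len(unmatched)
    return true_positives, false_positives, false_negatives


tp, fp, fn = score(GROUND_TRUTH, DETECTED)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(f"TP={tp} FP={fp} FN={fn} recall={recall:.0%} precision={precision:.0%}")
# TP=2 FP=15 FN=4 recall=33% precision=12%
```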

Key Observations

  1. Most first-half timeouts missed: 4 out of 5 first-half timeouts not detected
  2. High false positive rate: 15 false positives, mostly flagged as HOME
  3. Timeout indicator region may be misconfigured: The detector appears to flag timeouts on 40→25 play clock transitions that aren't actual timeouts
  4. Second half better: The one second-half timeout (1:44:54) was detected

Root Cause Analysis

How Timeout Detection Works

  1. Trigger: Timeout detection runs only when a 40→25 play clock transition is detected
  2. Classification: When a 40→25 transition occurs, classify_40_to_25_reset() is called, which:
    • Compares current timeout counts with last known values via check_timeout_change()
    • If timeout_info.home_timeouts < last_home_timeouts → HOME timeout
    • If timeout_info.away_timeouts < last_away_timeouts → AWAY timeout
    • Otherwise, classified as "special play" (punt/FG/XP)
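
The classification step above can be sketched as follows. The function and field names come from this document, but the signatures, the `TimeoutInfo` dataclass, and the returned labels are assumptions made for illustration; the real code in the repository may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutInfo:
    """Hypothetical stand-in for a timeout reading from the scorebug."""
    home_timeouts: int
    away_timeouts: int


def check_timeout_change(timeout_info: TimeoutInfo,
                         last_home_timeouts: int,
                         last_away_timeouts: int) -> Optional[str]:
    """Return "HOME" or "AWAY" if that team's count dropped, else None."""
    if timeout_info.home_timeouts < last_home_timeouts:
        return "HOME"
    if timeout_info.away_timeouts < last_away_timeouts:
        return "AWAY"
    return None


def classify_40_to_25_reset(timeout_info: TimeoutInfo,
                            last_home_timeouts: int,
                            last_away_timeouts: int) -> str:
    """Label a 40→25 reset as a timeout for one team or as a special play."""
    team = check_timeout_change(timeout_info, last_home_timeouts, last_away_timeouts)
    if team is not None:
        return f"timeout ({team})"
    return "special"  # punt / FG / XP


print(classify_40_to_25_reset(TimeoutInfo(2, 3), 3, 3))  # timeout (HOME)
print(classify_40_to_25_reset(TimeoutInfo(3, 3), 3, 3))  # special
```

Note that this version accepts any decrease, including a misread that drops a count by two or three, which is consistent with the false positives recorded in the baseline.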

Why False Positives Occur

The DetectTimeouts class reads timeout indicators via bright pixel analysis in configured regions:

  • Config file: data/config/timeout_tracker_region.json
  • Home region: (1231, 972, 30, 49) - 30x49 pixel box
  • Away region: (661, 973, 31, 47) - 31x47 pixel box
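
A minimal sketch of bright pixel analysis on one indicator patch is shown below. It uses the brightness threshold (200) and ratio threshold (10%) mentioned under "Likely causes"; the function names, the (x, y, width, height) reading of the region tuples, and the strict `>` comparison are assumptions, and the frame is modeled as plain rows of grayscale integers rather than the image type the project actually uses.

```python
BRIGHTNESS_THRESHOLD = 200  # grayscale value above which a pixel counts as bright
RATIO_THRESHOLD = 0.10      # fraction of bright pixels needed to call an indicator lit


def crop(gray_frame, region):
    """Cut an (x, y, width, height) region out of a frame stored as rows of pixels."""
    x, y, w, h = region
    return [row[x:x + w] for row in gray_frame[y:y + h]]


def bright_ratio(gray_patch):
    """Fraction of pixels in a 2D grayscale patch (values 0-255) that are bright."""
    total = sum(len(row) for row in gray_patch)
    if total == 0:
        return 0.0
    bright = sum(1 for row in gray_patch for value in row
                 if value > BRIGHTNESS_THRESHOLD)
    return bright / total


def indicator_is_lit(gray_patch):
    """True when enough of the patch is bright to count as a remaining timeout."""
    return bright_ratio(gray_patch) >= RATIO_THRESHOLD


dark = [[30] * 5 for _ in range(4)]                            # 0 of 20 pixels bright
lit = [[30] * 5 for _ in range(3)] + [[255, 255, 255, 30, 30]]  # 3 of 20 pixels bright
print(indicator_is_lit(dark), indicator_is_lit(lit))  # False True
```

With a rule like this, a region that is a few pixels off can fall below the ratio threshold even when the indicator is lit, which is one way misaligned coordinates would produce the inconsistent readings described next.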

Problems observed in isolation testing:

  • At 0:30 (start): Reads [False, False, False] for both teams (0 timeouts) - should be 3 each
  • At 4:30: Reads [True, True, True] for both teams (3 timeouts) - inconsistent with the 0:30 reading
  • Many "resets" and spurious transitions detected

Likely causes:

  1. Region coordinates may be slightly off or need adjustment for this video
  2. Brightness threshold (200) or ratio threshold (10%) may not be optimal
  3. The timeout indicator ovals may have a different appearance than expected

Updated Analysis (2026-01-09)

Test Methodology Improvement

The initial v4 baseline test read the timeout indicators immediately at the 40→25 transition. However, the timeout indicator on the scorebug updates 4-6 seconds after the play clock resets, so an immediate reading still shows the old count.

Updated test methodology:

  1. Cached all 17,555 play clock readings at 2 fps
  2. Identified 55 total 40→25 transitions
  3. Compared timeout readings from 2s BEFORE to 2-6s AFTER each transition
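
The methodology above can be sketched over a list of cached readings. This is a simplified illustration, not the test script itself: the function names are hypothetical, readings are assumed to be integers (or None when unreadable), and a transition is only recognized when 40 and 25 appear in consecutive samples.

```python
FPS = 2  # sampling rate of the cached play clock readings


def find_40_to_25_transitions(readings):
    """Return indices where the play clock jumps from 40 to 25 between samples."""
    return [i for i in range(1, len(readings))
            if readings[i - 1] == 40 and readings[i] == 25]


def comparison_window(index, fps=FPS):
    """Sample indices to compare around a transition.

    Returns the index 2 s before the transition (clamped at 0) and the
    indices covering 2-6 s after it. Callers still need to drop indices
    that run past the end of the cached readings.
    """
    before = max(0, index - 2 * fps)
    after = list(range(index + 2 * fps, index + 6 * fps + 1))
    return before, after


readings = [12, 11, 40, 40, 25, 25, 24]
print(find_40_to_25_transitions(readings))  # [4]
```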

Validation Rules Implemented

A valid timeout must satisfy:

  • Exactly one team decreases by exactly 1
  • Other team stays the same
  • Confidence threshold: Both before and after readings must have confidence >= 0.5

These rules filter out readings corrupted by scorebug visibility issues, such as both teams changing at once, one team dropping by more than 1, or a low-confidence reading:

  • #5 at 9:19: (2,3)→(1,1) - both changed → REJECTED
  • #11 at 29:04: (2,3)→(0,0) - both changed → REJECTED
  • #35 at 1:30:35: (3,3)→(3,1) - away changed by 2 → REJECTED
  • #45 at 1:57:55: (3,2)→(0,2) - home changed by 3 → REJECTED
  • #32 at 1:23:48: (3,3)→(3,2) with low confidence (0.487) → REJECTED
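
The validation rules can be expressed as a single predicate over the before and after readings. The sketch below is an illustration with a hypothetical function name; counts are (home, away) tuples as in the examples above. Confidences for the first four examples are not given in this document and are assumed to be 1.0, and the 0.487 value is assumed to belong to the after reading.

```python
MIN_CONFIDENCE = 0.5


def validate_timeout(before, after, conf_before, conf_after):
    """Return "HOME", "AWAY", or None for a pair of (home, away) timeout counts."""
    # Both readings must be trustworthy.
    if conf_before < MIN_CONFIDENCE or conf_after < MIN_CONFIDENCE:
        return None
    home_drop = before[0] - after[0]
    away_drop = before[1] - after[1]
    # Exactly one team decreases by exactly 1; the other stays the same.
    if home_drop == 1 and away_drop == 0:
        return "HOME"
    if away_drop == 1 and home_drop == 0:
        return "AWAY"
    return None


print(validate_timeout((2, 3), (1, 1), 1.0, 1.0))    # None (both changed)
print(validate_timeout((3, 3), (3, 1), 1.0, 1.0))    # None (away changed by 2)
print(validate_timeout((3, 3), (3, 2), 1.0, 0.487))  # None (low confidence)
print(validate_timeout((3, 3), (3, 2), 1.0, 1.0))    # AWAY
```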

✅ IMPLEMENTED FIX (2026-01-09)

Final Results (Pipeline Implementation)

| Metric | Value |
|---|---|
| True Positives | 6 |
| False Positives | 0 |
| False Negatives | 0 |
| Recall | 100% |
| Precision | 100% |

Detected Timeouts (All Correct)

| Timestamp (MM:SS) | Seconds | Team | Ground Truth |
|---|---|---|---|
| 4:24 | 265 | HOME | ✓ Matches 4:25 |
| 67:23 | 4044 | AWAY | ✓ Matches 67:30 |
| 69:38 | 4178 | AWAY | ✓ Matches 69:40 |
| 74:04 | 4444 | HOME | ✓ Matches 74:07 |
| 76:03 | 4563 | HOME | ✓ Matches 76:06 |
| 104:45 | 6285 | AWAY | ✓ Matches 104:54 |

Implementation Details

Key Changes Made:

  1. Delayed timeout check (TIMEOUT_CHECK_DELAY = 5.5s):

    • Store timeout reading when 40→25 transition detected
    • Schedule check 5.5 seconds later
    • Compare readings to detect change
  2. Validation logic:

    • Require exactly ONE team to decrease by exactly 1
    • Other team's count must stay the same
    • Both readings must have confidence >= 0.5
  3. Multiple entry points:

    • classify_40_to_25_reset() - handles 40→25 in PRE_SNAP state
    • check_possession_change() - handles 40→25 during PLAY_IN_PROGRESS state
    • Both now schedule delayed timeout checks
  4. Merger priority fix:

    • Changed from normal > special > timeout to normal > timeout > special
    • Timeout plays now have higher priority than special plays
  5. Quiet time filter fix:

    • Only filter "special" plays in quiet time after normal plays
    • Timeout plays can occur immediately after normal plays end
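
The delayed check in item 1, combined with the validation in item 2, might look roughly like the sketch below. Only `TIMEOUT_CHECK_DELAY` and its 5.5 s value come from this document; the class names, method names, and the polling design are assumptions for illustration and do not describe the actual code in the modified files.

```python
from dataclasses import dataclass, field

TIMEOUT_CHECK_DELAY = 5.5  # seconds to wait after a 40→25 transition
MIN_CONFIDENCE = 0.5


@dataclass
class PendingCheck:
    due_at: float          # video time at which the follow-up reading is taken
    baseline: tuple        # (home, away) counts stored at the transition
    baseline_conf: float   # confidence of the stored reading


@dataclass
class DelayedTimeoutChecker:
    pending: list = field(default_factory=list)

    def schedule(self, now, counts, confidence):
        """Store the reading at the transition and schedule the later check."""
        self.pending.append(
            PendingCheck(now + TIMEOUT_CHECK_DELAY, counts, confidence))

    def poll(self, now, counts, confidence):
        """Resolve every check that is due; return the teams that used a timeout."""
        due = [p for p in self.pending if p.due_at <= now]
        self.pending = [p for p in self.pending if p.due_at > now]
        results = []
        for check in due:
            if min(check.baseline_conf, confidence) < MIN_CONFIDENCE:
                continue
            home_drop = check.baseline[0] - counts[0]
            away_drop = check.baseline[1] - counts[1]
            if (home_drop, away_drop) == (1, 0):
                results.append("HOME")
            elif (home_drop, away_drop) == (0, 1):
                results.append("AWAY")
        return results


checker = DelayedTimeoutChecker()
checker.schedule(now=100.0, counts=(3, 3), confidence=0.9)
print(checker.poll(now=103.0, counts=(3, 2), confidence=0.9))  # [] (not due yet)
print(checker.poll(now=105.5, counts=(3, 2), confidence=0.9))  # ['AWAY']
```

Because both entry points only need to call something like `schedule()`, the comparison logic lives in one place regardless of which state saw the 40→25 transition.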

Files Modified:

  • src/tracking/play_identification_checks.py
  • src/tracking/state_handlers.py
  • src/tracking/play_state.py
  • src/tracking/models.py
  • src/tracking/play_merger.py
  • src/tracking/clock_reset_identifier.py

Key Learnings

  1. Scorebug timeout indicator delay: The indicator updates 4-6 seconds AFTER the play clock resets from 40→25, not immediately
  2. Confidence thresholds matter: Low-confidence readings can cause false positives
  3. Validation rules essential: Must verify exactly one team changes by exactly 1
  4. Multiple code paths: 40β†’25 transitions can occur in both PRE_SNAP and PLAY_IN_PROGRESS states