import os
import google.generativeai as genai
import time


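def extract_frame_timestamp(video_path):
    """Ask Gemini for the timestamp at which the race in `video_path` is won.

    Returns the model's text response and the uploaded File object.
    """
    # Configure the client with the API key stored in the GOOGLE_API_KEY environment variable
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

    # Load a multimodal Gemini model that accepts video input
    model = genai.GenerativeModel(model_name="models/gemini-2.5-flash")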

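    # Upload the video through the File API so it can be referenced in the prompt
    upload_response = genai.upload_file(path=video_path, mime_type="video/mp4")

    # Videos are processed asynchronously: poll until the file leaves PROCESSING,
    # then fail loudly instead of looping forever if it ended up FAILED
    file_id = upload_response.name
    status = genai.get_file(file_id)
    while status.state.name == "PROCESSING":
        print(f"Waiting for file to become ACTIVE... Current state: {status.state.name}")
        time.sleep(1)
        status = genai.get_file(file_id)
    if status.state.name != "ACTIVE":
        raise RuntimeError(f"File processing did not succeed: {status.state.name}")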

    prompt = '''
**Role:** You are an expert sports analyst and video frame identification specialist. You are highly skilled at correlating audio commentary with visual events in sports footage.

**Task:**
Your primary goal is to analyze the provided race video and its accompanying audio commentary to pinpoint the *exact frame number (or timestamp, if the frame number is not directly accessible)* at which the race is effectively won and the winner crosses the finish line.

**Inputs:**
1.  **Race Video:** A video will be provided along with the prompt.
2.  **Audio Commentary:** The commentary is embedded in the audio track of the video.

**Process & Reasoning (Think Step-by-Step):**

1.  **Synchronize & Analyze:** Carefully watch the video and listen to/read the commentary, paying close attention to the synchronization between visual events and spoken words.
2.  **Identify "Winning Cues" in Commentary:** Look for phrases in the commentary that indicate a decisive moment, such as:
    *   Exclamations of victory (e.g., "He's done it!", "That's the win!", "She takes it!").
    *   Confirmation of the winner by name or team.
    *   Analysis of a specific maneuver or moment that secured the lead irretrievably.
    *   Statements indicating other competitors can no longer catch the leader.
3.  **Identify "Winning Cues" in Visuals:** Simultaneously, observe the video for visual indicators of a win:
    *   A racer crossing a clearly defined finish line significantly ahead of others.
    *   A competitor making a critical, race-ending mistake (e.g., a crash that takes them out of contention when they were leading or close).
    *   A racer achieving an insurmountable lead where pursuit is visibly futile.
    *   Celebratory gestures from the winning racer or their team that are clearly in response to securing the win.
4.  **Correlate Cues:** The most reliable identification will come from a strong correlation between commentary and visual evidence.
    *   If commentary explicitly calls the win and the visual matches, this is a strong candidate.
    *   If the visual win is unambiguous (e.g., crossing the finish line well ahead) but the commentary lags slightly, the visual moment of crossing the line is likely the winning frame.
5.  **Determine the *Decisive* Moment:** The "winning frame" is the *earliest frame* where the outcome of the race becomes virtually certain and is acknowledged either visually or by commentary (ideally both). It's the frame where the race *officially* ends.
6.  **Disambiguate:** If there are multiple potential moments, explain your reasoning for selecting the specific frame.

**Output Requirements:**

1.  **Winning Timestamp:** Provide the precise timestamp (e.g., "Timestamp: 00:02:35.120") where the race was decisively won.
2.  **Justification (1-3 sentences):** Briefly explain *why* you selected this specific frame, referencing key visual and/or commentary cues that led to your decision.
3.  **Short Description of the Winner:** Briefly describe what the winner looks like: how they are dressed, what they are riding (if applicable), what makes them stand out in that frame, and any other details that help identify them in that frame.

**Example of how you might reason (for internal thought process, not necessarily full output):**
*   "At timestamp X, the commentator shouts 'And she wins it!' and the racer clearly crosses the finish line with a commanding lead over the other racers." Description: The winner is dressed in white and is riding a horse. In this frame, she appears on the right side of the image.

**Important Considerations:**
*   If the commentary is unclear or the visual evidence is ambiguous, state this and make your best judgment based on the available information.
*   Prioritize the earliest definitive moment of the win.
'''
    
    # Use the uploaded video in a prompt
    prompt_parts = [
        prompt,
        upload_response  # This is a File object
    ]

    response = model.generate_content(prompt_parts, generation_config={
        "temperature": 0
    })

    return response.text, upload_response
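

# Hypothetical helpers, not part of the original script: a minimal sketch that
# assumes the model follows the "Timestamp: HH:MM:SS.mmm" format requested in the
# prompt (possibly wrapped in markdown bold, e.g. "**Winning Timestamp:** ...").
# Model output is not guaranteed to follow that format, so the parser returns
# None when no timestamp is found instead of raising. The frame conversion
# assumes a constant frame rate; variable-frame-rate videos need the container's
# per-frame timestamps instead.
import re

# "Timestamp", then any punctuation/spaces (":", "**", " "), then H:MM:SS[.mmm]
_TIMESTAMP_RE = re.compile(r"Timestamp\W*?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?")


def parse_timestamp_seconds(response_text):
    """Return the first timestamp found in the model's reply as seconds, or None."""
    match = _TIMESTAMP_RE.search(response_text)
    if match is None:
        return None
    hours, minutes, seconds, millis = match.groups()
    # Right-pad the fractional part so ".1" means 100 ms rather than 1 ms
    millis_value = int((millis or "0").ljust(3, "0"))
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + millis_value / 1000.0


def timestamp_to_frame_index(seconds, fps):
    """Approximate 0-based frame index for a timestamp, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(round(seconds * fps))


# Example usage (requires a real video and API key):
#   text, uploaded = extract_frame_timestamp("race.mp4")
#   seconds = parse_timestamp_seconds(text)
#   if seconds is not None:
#       frame = timestamp_to_frame_index(seconds, fps=25)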