import os
import time

import google.generativeai as genai


def extract_frame_timestamp(video_path):
    # Read the API key from the GOOGLE_API_KEY environment variable
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

    # Load the multimodal model (Gemini 2.5 Flash)
    model = genai.GenerativeModel(model_name="models/gemini-2.5-flash")

    # Upload the video so it can be passed to the model alongside the prompt
    upload_response = genai.upload_file(path=video_path, mime_type="video/mp4")
    file_id = upload_response.name

    # Poll until the uploaded file has been processed and is ACTIVE.
    # Stop early if processing fails, so the loop cannot spin forever.
    while True:
        status = genai.get_file(file_id)
        if status.state.name == "ACTIVE":
            break
        if status.state.name == "FAILED":
            raise RuntimeError(f"Video processing failed for {file_id}")
        print(f"Waiting for file to become ACTIVE... Current state: {status.state.name}")
        time.sleep(1)

    prompt = '''
**Role:** You are an expert sports analyst and video frame identification specialist. You are highly skilled at correlating audio commentary with visual events in sports footage.

**Task:** Your primary goal is to analyze the provided race video and its accompanying audio commentary to pinpoint the *exact frame number (or timestamp if frame number is not directly accessible)* where the race is effectively won and the racer crosses the finish line.

**Inputs:**
1. **Race Video:** A video will be provided along with the prompt.
2. **Audio Commentary:** The commentary is embedded in the audio track of the video.

**Process & Reasoning (Think Step-by-Step):**
1. **Synchronize & Analyze:** Carefully watch the video and listen to the commentary, paying close attention to the synchronization between visual events and spoken words.
2. **Identify "Winning Cues" in Commentary:** Look for phrases in the commentary that indicate a decisive moment, such as:
    * Exclamations of victory (e.g., "He's done it!", "That's the win!", "She takes it!").
    * Confirmation of the winner by name or team.
    * Analysis of a specific maneuver or moment that secured the lead irretrievably.
    * Statements indicating other competitors can no longer catch the leader.
3. **Identify "Winning Cues" in Visuals:** Simultaneously, observe the video for visual indicators of a win:
    * A racer crossing a clearly defined finish line significantly ahead of others.
    * A competitor making a critical, race-ending mistake (e.g., a crash that takes them out of contention when they were leading or close).
    * A racer achieving an insurmountable lead where pursuit is visibly futile.
    * Celebratory gestures from the winning racer or their team that are clearly in response to securing the win.
4. **Correlate Cues:** The most reliable identification will come from a strong correlation between commentary and visual evidence.
    * If commentary explicitly calls the win and the visual matches, this is a strong candidate.
    * If the visual win is unambiguous (e.g., crossing the finish line well ahead) but the commentary lags slightly, the visual moment of crossing the line is likely the winning frame.
5. **Determine the *Decisive* Moment:** The "winning frame" is the *earliest frame* where the outcome of the race becomes virtually certain and is acknowledged either visually or by commentary (ideally both). It's the frame where the race *officially* ends.
6. **Disambiguate:** If there are multiple potential moments, explain your reasoning for selecting the specific frame.

**Output Requirements:**
1. **Winning Timestamp:** Provide the precise timestamp (e.g., "Timestamp: 00:02:35.120") where the race was decisively won.
2. **Justification (1-3 sentences):** Briefly explain *why* you selected this specific frame, referencing key visual and/or commentary cues that led to your decision.
3. **Short description of the winner:** Briefly describe what the winner looks like, what they are wearing, what they are riding (if applicable), what makes them stand out in that frame, and any other details that are useful for identifying the winner in that frame.

**Example of how you might reason (for internal thought process, not necessarily full output):**
* "At timestamp X, the commentator shouts 'And she wins it.' and the racer clearly passes the finish line with a commanding lead ahead of the other racers." Description: The winner is dressed in white and is riding a horse. In this frame, the winner appears on the right side of the image.

**Important Considerations:**
* If the commentary is unclear or the visual evidence is ambiguous, state this and make your best judgment based on the available information.
* Prioritize the earliest definitive moment of the win.
'''

    # Send the prompt text together with the uploaded video
    prompt_parts = [
        prompt,
        upload_response,  # This is a File object
    ]

    response = model.generate_content(
        prompt_parts,
        generation_config={"temperature": 0},
    )

    # Return the model's answer and the File object for later reuse or cleanup
    return response.text, upload_response