```python
import os
import time

import google.generativeai as genai


def extract_frame_timestamp(video_path):
    # Configure the client with the API key from the GOOGLE_API_KEY environment variable
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

    # Load a multimodal Gemini model that accepts video input
    model = genai.GenerativeModel(model_name="models/gemini-2.5-flash")

    # Upload the video through the File API so it can be referenced in the prompt
    upload_response = genai.upload_file(path=video_path, mime_type="video/mp4")
    file_id = upload_response.name

    # Poll until the uploaded file has finished processing
    while True:
        status = genai.get_file(file_id)
        if status.state.name == "ACTIVE":
            break
        if status.state.name == "FAILED":
            raise RuntimeError(f"Video processing failed for {file_id}")
        print(f"Waiting for file to become ACTIVE... Current state: {status.state.name}")
        time.sleep(1)

    prompt = '''
**Role:** You are an expert sports analyst and video frame identification specialist. You are highly skilled at correlating audio commentary with visual events in sports footage.
**Task:**
Your primary goal is to analyze the provided race video and its accompanying audio commentary to pinpoint the *exact frame number (or timestamp if frame number is not directly accessible)* where the race is effectively won and the winning racer crosses the finish line.
**Inputs:**
1. **Race Video:** A video will be provided along with the prompt.
2. **Audio Commentary:** The commentary is embedded in the audio track of the video.
**Process & Reasoning (Think Step-by-Step):**
1. **Synchronize & Analyze:** Carefully watch the video and listen to the commentary, paying close attention to the synchronization between visual events and spoken words.
2. **Identify "Winning Cues" in Commentary:** Look for phrases in the commentary that indicate a decisive moment, such as:
    * Exclamations of victory (e.g., "He's done it!", "That's the win!", "She takes it!").
    * Confirmation of the winner by name or team.
    * Analysis of a specific maneuver or moment that secured the lead irretrievably.
    * Statements indicating other competitors can no longer catch the leader.
3. **Identify "Winning Cues" in Visuals:** Simultaneously, observe the video for visual indicators of a win:
    * A racer crossing a clearly defined finish line significantly ahead of others.
    * A competitor making a critical, race-ending mistake (e.g., a crash that takes them out of contention when they were leading or close).
    * A racer achieving an insurmountable lead where pursuit is visibly futile.
    * Celebratory gestures from the winning racer or their team that are clearly in response to securing the win.
4. **Correlate Cues:** The most reliable identification will come from a strong correlation between commentary and visual evidence.
    * If commentary explicitly calls the win and the visual matches, this is a strong candidate.
    * If the visual win is unambiguous (e.g., crossing the finish line well ahead) but the commentary lags slightly, the visual moment of crossing the line is likely the winning frame.
5. **Determine the *Decisive* Moment:** The "winning frame" is the *earliest frame* where the outcome of the race becomes virtually certain and is acknowledged either visually or by commentary (ideally both). It's the frame where the race *officially* ends.
6. **Disambiguate:** If there are multiple potential moments, explain your reasoning for selecting the specific frame.
**Output Requirements:**
1. **Winning Timestamp:** Provide the precise timestamp (e.g., "Timestamp: 00:02:35.120") where the race was decisively won.
2. **Justification (1-3 sentences):** Briefly explain *why* you selected this specific frame, referencing key visual and/or commentary cues that led to your decision.
3. **Short description of the winner:** Briefly describe what the winner looks like: what they are wearing, what they are riding (if applicable), what makes them stand out in that frame, and anything else that helps identify the winner in that frame.
**Example of how you might reason (for internal thought process, not necessarily full output):**
* "At timestamp X, the commentator shouts 'And she wins it!' and the racer clearly crosses the finish line with a commanding lead over the other racers." Description: "The winner is dressed in white and is riding a horse. In this frame, she appears on the right side of the image."
**Important Considerations:**
* If the commentary is unclear or the visual evidence is ambiguous, state this and make your best judgment based on the available information.
* Prioritize the earliest definitive moment of the win.
'''

    # Send the prompt text together with the uploaded video (a File object)
    prompt_parts = [
        prompt,
        upload_response,
    ]
    response = model.generate_content(prompt_parts, generation_config={"temperature": 0})
    return response.text, upload_response
```
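The `while True` polling loop in `extract_frame_timestamp` has no upper bound, so it can wait indefinitely if processing stalls. Below is a minimal sketch of a bounded variant. `wait_for_active` is a hypothetical helper, not part of the `google.generativeai` SDK. It takes any zero-argument callable that returns a state name, which lets the timing logic be tested without calling the API. The 300-second default timeout is an arbitrary assumption.

```python
import time
from typing import Callable


def wait_for_active(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
) -> None:
    """Poll get_state() until it returns "ACTIVE" (hypothetical helper).

    Raises RuntimeError if the state becomes "FAILED", and TimeoutError
    if "ACTIVE" is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("File processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"File still {state} after {timeout} s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulate a file that reports PROCESSING twice before becoming ACTIVE
    states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
    wait_for_active(lambda: next(states), poll_interval=0.01)
    print("file is ACTIVE")
```

Inside `extract_frame_timestamp`, the loop could then be replaced by `wait_for_active(lambda: genai.get_file(file_id).state.name)`. Injecting the state getter keeps the helper independent of the SDK.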
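The prompt asks for a timestamp such as `Timestamp: 00:02:35.120` and treats a frame number as the ideal answer. Below is a minimal sketch for turning the returned text into milliseconds and a frame index. It assumes the reply contains a timestamp in `HH:MM:SS[.mmm]` form and that the video has a constant frame rate. You supply `fps` yourself, for example from the file's metadata. `parse_timestamp_ms` and `timestamp_to_frame` are hypothetical names. Google's Gemini documentation describes video input as sampled at roughly one frame per second by default (at the time of writing), so a frame index derived this way is an estimate, not an exact frame.

```python
import math
import re
from typing import Optional

# Matches H:MM:SS, HH:MM:SS, or HH:MM:SS.mmm (fraction of 1-3 digits)
_TIMESTAMP_RE = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?")


def parse_timestamp_ms(text: str) -> Optional[int]:
    """Return the first HH:MM:SS[.mmm] timestamp in `text` as milliseconds, or None."""
    match = _TIMESTAMP_RE.search(text)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    # Right-pad the fraction so ".1" means 100 ms and ".12" means 120 ms
    millis = int((fraction or "0").ljust(3, "0"))
    return (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000 + millis


def timestamp_to_frame(ms: int, fps: float) -> int:
    """Index of the frame on screen at `ms`, assuming constant fps and frame 0 at t=0."""
    return math.floor(ms * fps / 1000)


if __name__ == "__main__":
    reply = "Timestamp: 00:02:35.120\nJustification: ..."
    ms = parse_timestamp_ms(reply)
    print(ms, timestamp_to_frame(ms, fps=30))  # 155120 4653
```

Integer milliseconds avoid float rounding in the parsed value. Flooring picks the frame that is already displayed at that instant, rather than the next one.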