# RacingDemo/src/models/frame_extractor.py
# Vlad Bastina, commit 10877f8 ("merge")
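import os
import time

import google.generativeai as genai


def extract_frame_timestamp(video_path):
    """Ask Gemini for the timestamp at which the race in `video_path` is won.

    Returns a tuple of (model response text, uploaded File object).
    """
    # Read the API key from the GOOGLE_API_KEY environment variable
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
    # Load the multimodal model (Gemini 2.5 Flash)
    model = genai.GenerativeModel(model_name="models/gemini-2.5-flash")
    # Upload the video through the File API so the prompt can reference it
    upload_response = genai.upload_file(path=video_path, mime_type="video/mp4")
    file_id = upload_response.name
    # Poll until the uploaded file has been processed and is ACTIVE
    while True:
        status = genai.get_file(file_id)
        if status.state.name == "ACTIVE":
            break
        if status.state.name == "FAILED":
            # Stop instead of polling forever when processing fails
            raise RuntimeError(f"Video processing failed for {video_path}")
        print(f"Waiting for file to become ACTIVE... Current state: {status.state.name}")
        time.sleep(1)
    prompt = '''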
**Role:** You are an expert sports analyst and video frame identification specialist. You are highly skilled at correlating audio commentary with visual events in sports footage.
**Task:**
Your primary goal is to analyze the provided race video and its accompanying audio commentary to pinpoint the *exact frame number (or timestamp if frame number is not directly accessible)* where the race is effectively won and the racer crosses the finish line.
**Inputs:**
1. **Race Video:** A video will be provided along with the prompt.
2. **Audio Commentary:** The commentary is embedded in the audio track of the video.
**Process & Reasoning (Think Step-by-Step):**
1. **Synchronize & Analyze:** Carefully watch the video and listen to/read the commentary, paying close attention to the synchronization between visual events and spoken words.
2. **Identify "Winning Cues" in Commentary:** Look for phrases in the commentary that indicate a decisive moment, such as:
* Exclamations of victory (e.g., "He's done it!", "That's the win!", "She takes it!").
* Confirmation of the winner by name or team.
* Analysis of a specific maneuver or moment that secured the lead irretrievably.
* Statements indicating other competitors can no longer catch the leader.
3. **Identify "Winning Cues" in Visuals:** Simultaneously, observe the video for visual indicators of a win:
* A racer crossing a clearly defined finish line significantly ahead of others.
* A competitor making a critical, race-ending mistake (e.g., a crash that takes them out of contention when they were leading or close).
* A racer achieving an insurmountable lead where pursuit is visibly futile.
* Celebratory gestures from the winning racer or their team that are clearly in response to securing the win.
4. **Correlate Cues:** The most reliable identification will come from a strong correlation between commentary and visual evidence.
* If commentary explicitly calls the win and the visual matches, this is a strong candidate.
* If the visual win is unambiguous (e.g., crossing the finish line well ahead) but the commentary lags slightly, the visual moment of crossing the line is likely the winning frame.
5. **Determine the *Decisive* Moment:** The "winning frame" is the *earliest frame* where the outcome of the race becomes virtually certain and is acknowledged either visually or by commentary (ideally both). It's the frame where the race *officially* ends.
6. **Disambiguate:** If there are multiple potential moments, explain your reasoning for selecting the specific frame.
**Output Requirements:**
1. **Winning Timestamp:** Provide the precise timestamp (e.g., "Timestamp: 00:02:35.120") where the race was decisively won.
2. **Justification (1-3 sentences):** Briefly explain *why* you selected this specific frame, referencing key visual and/or commentary cues that led to your decision.
3. **Short description of the winner:** Briefly describe what the winner looks like, what they are wearing, what they are riding (if applicable), what makes them stand out in that frame, and any other details useful for identifying the winner in that frame.
**Example of how you might reason (for internal thought process, not necessarily full output):**
* "At timestamp X, the commentator shouts 'And she wins it!' and the racer clearly passes the finish line with a commanding lead over the other racers. Description: The winner is dressed in white and is riding a horse. In this frame, she appears on the right side of the image."
**Important Considerations:**
* If the commentary is unclear or the visual evidence is ambiguous, state this and make your best judgment based on the available information.
* Prioritize the earliest definitive moment of the win.
'''
    # Send the prompt text together with the uploaded video
    prompt_parts = [
        prompt,
        upload_response,  # This is a File object
    ]
    # Temperature 0 keeps the output as deterministic as possible
    response = model.generate_content(
        prompt_parts,
        generation_config={"temperature": 0},
    )
    return response.text, upload_response
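

# The prompt above asks the model to report the winning moment in the form
# "Timestamp: HH:MM:SS.mmm". The helper below is a minimal sketch of how a
# caller might convert that text into seconds. `parse_timestamp_seconds` is a
# hypothetical name that is not part of the original module, and the sketch
# assumes the model actually follows the requested format.
import re

# First HH:MM:SS or HH:MM:SS.mmm pattern found anywhere in the response text
_TIMESTAMP_RE = re.compile(r"(\d{1,2}):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_timestamp_seconds(response_text):
    """Return the first HH:MM:SS(.mmm) timestamp in seconds, or None if absent."""
    match = _TIMESTAMP_RE.search(response_text)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Possible usage (the video path is illustrative only):
#     text, _ = extract_frame_timestamp("race.mp4")
#     seconds = parse_timestamp_seconds(text)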