import mimetypes
import os
import time

import google.generativeai as genai


def extract_bounding_box(image_path, description, timeout=60):
    # Read the API key from the GOOGLE_API_KEY environment variable.
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

    # Load the model (this script uses Gemini 2.5 Pro, which accepts image input).
    model = genai.GenerativeModel(model_name="models/gemini-2.5-pro")

    # Upload the image through the File API so it can be referenced in the prompt.
    # Guess the MIME type from the file extension instead of always assuming PNG.
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    uploaded_file = genai.upload_file(path=image_path, mime_type=mime_type)

    # Poll until the file is ACTIVE. Compare the state enum by name instead of
    # its raw integer value (2), and stop on FAILED or after `timeout` seconds
    # rather than looping forever.
    deadline = time.monotonic() + timeout
    uploaded_file = genai.get_file(uploaded_file.name)
    while uploaded_file.state.name != "ACTIVE":
        if uploaded_file.state.name == "FAILED":
            raise RuntimeError(f"Upload of {image_path} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"{image_path} was not ready after {timeout} s")
        time.sleep(1)
        uploaded_file = genai.get_file(uploaded_file.name)

    prompt = f'''
**Role:** You are an expert in precise object detection within static images.

**Task:**
Your goal is to identify the **winning contestant** in the provided image and output the precise coordinates of its bounding box in the format specified below, with all coordinates normalized to the range 0-1000.

**Input:**
1. **Image:** The image is provided in the prompt.
2. **Identification of Winner:** {description}

**Instructions for Identifying the Winner (if not explicitly provided by the user):**
* Assume the "winner" is the horse that is:
    * Visibly in the most advanced position relative to other competitors (if any are visible).
    * Closest to or clearly crossing a discernible finish line (if one is present in the image).
    * Appearing most dominant or distinctly ahead if other cues are absent.
* If the image is a close-up of a single horse, that horse is the winner by default.
* Ensure that the bounding box you select belongs to the winner that best matches the description provided.

**Bounding Box Requirements:**
* The bounding box must encompass the **entire visible portion** of the identified winning horse, including its head, body, all visible legs, and tail.
* The bounding box should be the **tightest possible rectangle** around the horse.
* Avoid including significant background elements or other distinct entities (like other horses or jockeys) unless they are directly occluding a part of the winning horse.
* The coordinates should be normalized to 0-1000.

**Output Format:**
Provide the bounding box coordinates in the following format:
`(y_min, x_min, y_max, x_max)`
Where:
* `(x_min, y_min)` are the normalized coordinates (0-1000) of the top-left corner of the bounding box.
* `(x_max, y_max)` are the normalized coordinates (0-1000) of the bottom-right corner of the bounding box.
* Assume the coordinate system origin (0, 0) is at the top-left corner of the provided image and (1000, 1000) is at its bottom-right corner.

**Example Output:**
(y_min, x_min, y_max, x_max)
e.g., (480, 598, 608, 720)

**Important Considerations for the AI:**
* **Occlusion:** If the winning horse is partially occluded by other objects or racers, provide the bounding box for the visible parts of the *winning horse only*. Briefly note if significant occlusion might affect the bounding box's accuracy.
* **Ambiguity:** If, even with the provided image, identifying the *single* clear winner is highly ambiguous (e.g., a very tight photo finish with multiple horses equally positioned), state this ambiguity. If possible, provide bounding boxes for all equally plausible winners, labeling them distinctly if you can.
'''

    # Send the uploaded image together with the text prompt.
    prompt_parts = [uploaded_file, prompt]
    response = model.generate_content(
        prompt_parts,
        generation_config={"temperature": 0.5},
    )
    return response.text
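

# The prompt asks Gemini to answer with "(y_min, x_min, y_max, x_max)", every
# value normalized to 0-1000, possibly followed by notes on occlusion or
# several boxes when the winner is ambiguous. The helpers below are a minimal
# sketch, assuming the reply really follows that format. `parse_boxes` and
# `box_to_pixels` are hypothetical names introduced here; they are not part of
# the google.generativeai API.
import re

# Matches a parenthesized tuple of four non-negative integers, e.g. "(480, 598, 608, 720)".
_BOX_PATTERN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_boxes(text):
    """Return every valid (y_min, x_min, y_max, x_max) box found in `text`."""
    boxes = []
    for match in _BOX_PATTERN.finditer(text):
        y_min, x_min, y_max, x_max = (int(v) for v in match.groups())
        in_range = all(0 <= v <= 1000 for v in (y_min, x_min, y_max, x_max))
        # Skip tuples outside 0-1000 or with inverted corners.
        if in_range and y_min < y_max and x_min < x_max:
            boxes.append((y_min, x_min, y_max, x_max))
    return boxes


def box_to_pixels(box, width, height):
    """Scale a 0-1000 (y_min, x_min, y_max, x_max) box to pixel
    (left, top, right, bottom) for an image of `width` x `height` pixels."""
    y_min, x_min, y_max, x_max = box
    return (
        round(x_min * width / 1000),
        round(y_min * height / 1000),
        round(x_max * width / 1000),
        round(y_max * height / 1000),
    )


# Example, using the sample reply from the prompt and an assumed 1280x720 image:
# parse_boxes("(480, 598, 608, 720)")                    # -> [(480, 598, 608, 720)]
# box_to_pixels((480, 598, 608, 720), 1280, 720)         # -> (765, 346, 922, 438)
# The (left, top, right, bottom) order is what Pillow's Image.crop expects.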