RacingDemo/src/models/bounding_box_extractor.py
Vlad Bastina, merge, 10877f8, 8 months ago, 3.47 kB

import os
import time

import google.generativeai as genai


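def extract_bounding_box(image_path, description):
    """Ask Gemini for the winning horse's bounding box in the image at image_path.

    Returns the model's raw text reply; coordinates are requested as
    (y_min, x_min, y_max, x_max), normalized to 0-1000.
    """
    # Configure the client with the API key from the environment
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

    # Load the multimodal model
    model = genai.GenerativeModel(model_name="models/gemini-2.5-pro")

    # Upload the image so it can be referenced in the prompt (PNG is assumed)
    upload_response = genai.upload_file(path=image_path, mime_type="image/png")

    # Poll until the uploaded file has finished processing
    status = genai.get_file(upload_response.name)
    while status.state.name == "PROCESSING":
        time.sleep(1)
        status = genai.get_file(upload_response.name)
    if status.state.name != "ACTIVE":
        raise RuntimeError(f"File upload ended in state {status.state.name}")

    prompt = f'''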
	Role: You are an expert in precise object detection within static images.

	Task:
	Your goal is to identify the winning contestant in the provided image and output the coordinates of its bounding box in the format specified below, normalized to 0-1000.

	Input:
	1. Image: The image is provided in the prompt
	2. Identification of Winner: {description}

	Instructions for Identifying the Winner (if not explicitly provided by user):
	* Assume the "winner" is the horse that is:
	* Visibly in the most advanced position relative to other competitors (if any are visible).
	* Closest to or clearly crossing a discernible finish line (if one is present in the image).
	* Appearing most dominant or distinctly ahead if other cues are absent.
	* If the image is a close-up of a single horse, that horse is the winner by default.
	* Ensure you select the bounding box of the winner that best matches the description provided.

	Bounding Box Requirements:
	* The bounding box must encompass the entire visible portion of the identified winning horse, including its head, body, all visible legs, and tail.
	* The bounding box should be the tightest possible rectangle around the horse.
	* Avoid including significant background elements or other distinct entities (like other horses or jockeys) unless they are directly occluding a part of the winning horse.
	* The coordinates should be normalized to 0-1000.

	Output Format:
	Provide the bounding box coordinates in the following format:
	`(y_min, x_min, y_max, x_max)`
	Where:
	* `(x_min, y_min)` are the normalized coordinates of the top-left corner of the bounding box.
	* `(x_max, y_max)` are the normalized coordinates of the bottom-right corner of the bounding box.
	* Assume the coordinate system origin (0,0) is at the top-left corner of the provided image.

	Example Output:
	(y_min, x_min, y_max, x_max)
	e.g., (480, 598, 608, 720)

	Important Considerations for the AI:
	* Occlusion: If the winning horse is partially occluded by other objects or racers, provide the bounding box for the visible parts of the winning horse only. Briefly note if significant occlusion might affect the bounding box's accuracy.
	* Ambiguity: If, even with the provided image, identifying the single clear winner is highly ambiguous (e.g., a very tight photo finish with multiple horses equally positioned), state this ambiguity. If possible, provide bounding boxes for all equally plausible winners, labeling them distinctly if you can.
    '''

    # Send the uploaded image together with the text prompt
    prompt_parts = [upload_response, prompt]

    response = model.generate_content(
        prompt_parts,
        generation_config={"temperature": 0.5},
    )

    return response.text
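
Because `extract_bounding_box` returns the model's raw text, a caller still has to pull the numbers out and map them back to pixels. The following is a minimal sketch of one way to do that, assuming the reply contains tuples in the `(y_min, x_min, y_max, x_max)` format requested by the prompt. The helper names `parse_bounding_boxes` and `to_pixel_box` are hypothetical and not part of the original module.

```python
import re

# Matches four integers in parentheses, e.g. "(480, 598, 608, 720)"
_BOX_PATTERN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_bounding_boxes(text):
    """Return every (y_min, x_min, y_max, x_max) tuple found in text.

    A list is returned because the prompt allows the model to report
    several boxes when the winner is ambiguous.
    """
    return [tuple(int(v) for v in m.groups()) for m in _BOX_PATTERN.finditer(text)]


def to_pixel_box(box, width, height):
    """Convert a 0-1000 normalized box to pixel (left, top, right, bottom)."""
    y_min, x_min, y_max, x_max = box
    return (
        round(x_min * width / 1000),
        round(y_min * height / 1000),
        round(x_max * width / 1000),
        round(y_max * height / 1000),
    )
```

For example, `parse_bounding_boxes("Winner: (480, 598, 608, 720)")` yields one box, and `to_pixel_box` scales it to the dimensions of the source image. An empty list signals that the reply contained no box in the expected format, which the caller should handle explicitly.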