from smolagents import CodeAgent, LogLevel

from tools.bbox_annotator import BBoxAnnotatorTool
from tools.cropping_tool import CroppingTool
from tools.image_classification_tool import ImageClassificationTool
from tools.image_resize_tool import ImageResizeTool
from tools.image_segmentation_tool import ImageSegmentationTool
from tools.inference_converter import TaskInferenceOutputConverterTool
from tools.label_annotator import LabelAnnotatorTool
from tools.mask_annotator import MaskAnnotatorTool
from tools.object_detection_tool import ObjectDetectionTool
from tools.segment_anything_tool import SegmentAnythingTool
from tools.task_model_retriever import TaskModelRetrieverTool
def get_master_agent(llm):
    description = """
    You are an agent that can perform tasks on an image.
    Whatever action you perform on an image, make sure to resize it to a smaller size before using the tools.
    You can always crop the image and keep that part of the image as is, if the crop is not too large.
    You can use the following tools to perform tasks on an image:
    - task_model_retriever: retrieves models that can perform a task; you must provide the type of task a model can perform. The supported tasks are:
        - object-detection
        - image-segmentation
        - image-classification
    - object_detection: performs the object-detection task; provide an image and a class of objects to detect.
    - image_segmentation: performs the image-segmentation task; provide an image and a class of objects to segment.
    - image_classification: performs the image-classification task; provide an image and a class of objects to classify.
    - segment_anything: performs the segment-anything task; provide an image and a list of bounding boxes to segment.
    Once you have the detections, you will most likely need to use the task_inference_output_converter tool to convert the detections to the proper format.
    Then you can use the following tools to annotate the image:
    - bbox_annotator: annotates the image with bounding boxes; provide an image and a list of detections.
    - mask_annotator: annotates the image with masks; provide an image and a list of detections.
    - label_annotator: annotates the image with labels; provide an image and a list of detections.
    - cropping: crops the image; provide an image and a bounding box.
    - image_resize: resizes the image; provide an image, a width, and a height.
    # - upscaler: upscales the image; provide an image to get it upscaled.
    Make sure to differentiate between the images you annotate, which you will show the user, and the images you use to perform
    tasks, which you will not show the user. For example, don't annotate an image and then use that image to perform
    image classification, as this will alter the inference. Also, don't crop an annotated image.
    The user may ask questions about concepts present in the image. Some may be general concepts that an object detection model is sufficient
    to detect and determine; some may be specific concepts that require you to first detect, then use a specific model (e.g. image classification)
    to add more precision.
    Example: the user provides an image with multiple dogs and asks you not only to detect the dogs, but also to classify them, depending
    on the type of dog, the breed, the color, the age, etc. In this case, you would first use the object_detection tool to detect the bounding boxes around the dogs,
    then crop the bounding boxes and use the image_classification tool to classify the cropped images of the dogs.
    If you don't know which model to use, use the task_model_retriever tool to retrieve one.
    Never assume an invented model name; always use a model name provided by the task_model_retriever tool.
    Use batching to perform tasks on multiple images at once when a tool supports it.
    You have access to the variable "image", which is the image to perform tasks on; no need to load it, it is already loaded.
    Always use the variable "image" to draw the bounding boxes on the image.
    Whenever you need to use a tool, first write the tool call in the form of a code block.
    Then wait for the tool to return the result.
    Then use the result to perform the task, step by step.
    Be as exhaustive as possible in your reasoning and in the way you complete the task.
    If, for example, you need to detect a specific type of object, make sure to detect all the objects of that type;
    use multiple tools to do so, even multiple models.
    """
    master_agent = CodeAgent(
        name="master_agent",
        description=description,
        model=llm,
        tools=[
            TaskModelRetrieverTool(),
            ObjectDetectionTool(),
            ImageClassificationTool(),
            ImageSegmentationTool(),
            SegmentAnythingTool(),
            TaskInferenceOutputConverterTool(),
            BBoxAnnotatorTool(),
            MaskAnnotatorTool(),
            LabelAnnotatorTool(),
            CroppingTool(),
            ImageResizeTool(),
            # UpscalerTool(),
        ],
        verbosity_level=LogLevel.DEBUG,
    )
    print("Loaded master agent")
    return master_agent
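

if __name__ == "__main__":
    # Minimal usage sketch, guarded so importing this module has no side effects.
    # Assumptions (not guaranteed by this file): smolagents provides an
    # InferenceClientModel LLM wrapper, agent.run() accepts extra variables via
    # additional_args, and a local image file "dogs.jpg" exists -- these names
    # are illustrative; adjust to your smolagents version and data.
    from PIL import Image
    from smolagents import InferenceClientModel

    llm = InferenceClientModel()
    agent = get_master_agent(llm)
    img = Image.open("dogs.jpg")
    # The prompt above exposes the picture to the agent as the variable "image".
    agent.run(
        "Detect the dogs in the image and annotate them with bounding boxes.",
        additional_args={"image": img},
    )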