---
library_name: transformers
tags:
- object-detection
- owlv2
- zero-shot
- visual-prompting
license: apache-2.0
---

# OWLv2 Inference Endpoint

Custom handler for OWLv2 (Open-World Localization v2) supporting both **image-conditioned** and **text-conditioned** object detection.

## Features

- **Image-conditioned detection**: Find objects similar to a reference image
- **Text-conditioned detection**: Find objects matching text descriptions
- **Multiple query images**: Search for several different objects at once

## Usage

### Image-Conditioned Detection

Find all instances of an icon/object in a target image:

```python
import requests
import base64

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

# Load images as base64
with open("screenshot.png", "rb") as f:
    target_b64 = base64.b64encode(f.read()).decode()

with open("icon.png", "rb") as f:
    query_b64 = base64.b64encode(f.read()).decode()

response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_image": query_b64,
        "threshold": 0.5
    }
})

print(response.json())
# {"detections": [{"box": [100, 200, 150, 250], "confidence": 0.92}]}
```

### Text-Conditioned Detection

Find objects by description:

```python
# Reuses API_URL, headers, and target_b64 from the previous example.
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "queries": ["a play button", "a settings icon"],
        "threshold": 0.1
    }
})
```

### Multiple Query Images

Find several different objects:

```python
# icon1_b64, icon2_b64, and icon3_b64 are base64 strings,
# encoded the same way as query_b64 in the first example.
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_images": [icon1_b64, icon2_b64, icon3_b64],
        "threshold": 0.5
    }
})
# Each detection includes a "label": "query_0", "query_1", etc.
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `target_image` | string | required | Base64-encoded target image |
| `query_image` | string | - | Base64-encoded reference image |
| `query_images` | array | - | Multiple base64-encoded reference images |
| `queries` | array | - | Text descriptions to search for |
| `threshold` | float | 0.5 | Confidence threshold (0-1) |
| `nms_threshold` | float | 0.3 | Non-max suppression threshold |

## Response Format

```json
{
  "detections": [
    {
      "box": [x1, y1, x2, y2],
      "confidence": 0.95,
      "label": "query_0"
    }
  ]
}
```

## Model

Uses `google/owlv2-large-patch14-ensemble` for best accuracy.
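
## Client Helper Sketch

The parameters and response format above can be wrapped in a couple of small client-side helpers. The following is a minimal sketch, not part of the handler: `encode_image`, `build_payload`, and `group_by_label` are hypothetical names introduced here, and the rule that each request uses exactly one of `query_image`, `query_images`, or `queries` is an assumption of this sketch rather than something the handler is known to enforce. The sample response at the bottom is made-up data in the documented shape.

```python
import base64
from collections import defaultdict


def encode_image(path):
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()


def build_payload(target_b64, *, query_image=None, query_images=None,
                  queries=None, threshold=0.5, nms_threshold=0.3):
    """Assemble a request body from the documented parameter names.

    Assumption of this sketch: exactly one query mode per request.
    Defaults mirror the Parameters table.
    """
    modes = {
        "query_image": query_image,
        "query_images": query_images,
        "queries": queries,
    }
    chosen = {name: value for name, value in modes.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError(
            "pass exactly one of query_image, query_images, or queries"
        )
    inputs = {
        "target_image": target_b64,
        "threshold": threshold,
        "nms_threshold": nms_threshold,
    }
    inputs.update(chosen)
    return {"inputs": inputs}


def group_by_label(response_json, min_confidence=0.0):
    """Group detections by their "label", highest confidence first.

    Detections without a "label" key (as in the single-query example
    response) are collected under "unlabeled".
    """
    grouped = defaultdict(list)
    for det in response_json.get("detections", []):
        if det["confidence"] >= min_confidence:
            grouped[det.get("label", "unlabeled")].append(det)
    for dets in grouped.values():
        dets.sort(key=lambda d: d["confidence"], reverse=True)
    return dict(grouped)


# Build a text-conditioned request body (no network call is made here).
payload = build_payload("<base64 target image>",
                        queries=["a play button"], threshold=0.1)

# Made-up response in the documented format, for illustration only.
sample_response = {
    "detections": [
        {"box": [100, 200, 150, 250], "confidence": 0.92, "label": "query_0"},
        {"box": [10, 20, 40, 60], "confidence": 0.55, "label": "query_1"},
        {"box": [300, 80, 350, 130], "confidence": 0.97, "label": "query_0"},
    ]
}
grouped = group_by_label(sample_response, min_confidence=0.6)
```

The resulting `payload` can be passed as `json=payload` to `requests.post`, as in the usage examples. Filtering by `min_confidence` on the client is optional, since the endpoint already applies `threshold`; it is only useful for tightening results without sending a new request.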