---
library_name: transformers
tags:
- object-detection
- owlv2
- zero-shot
- visual-prompting
license: apache-2.0
---

# OWLv2 Inference Endpoint

Custom handler for OWLv2 (Open-World Localization v2) supporting both **image-conditioned** and **text-conditioned** object detection.

## Features

- **Image-conditioned detection**: Find objects similar to a reference image
- **Text-conditioned detection**: Find objects matching text descriptions
- **Multiple query images**: Search for several different objects at once

## Usage

### Image-Conditioned Detection

Find all instances of an icon/object in a target image:

```python
import requests
import base64

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

# Load images as base64
with open("screenshot.png", "rb") as f:
    target_b64 = base64.b64encode(f.read()).decode()
with open("icon.png", "rb") as f:
    query_b64 = base64.b64encode(f.read()).decode()

response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_image": query_b64,
        "threshold": 0.5
    }
})

print(response.json())
# {"detections": [{"box": [100, 200, 150, 250], "confidence": 0.92}]}
```
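
Since every request needs base64-encoded images, the loading step can be wrapped in a small helper. The following is a minimal sketch; `encode_image` is a hypothetical name introduced here for illustration, not part of the handler.

```python
import base64
from pathlib import Path


def encode_image(path):
    """Read an image file and return its contents as a base64 string."""
    # Hypothetical helper: mirrors the open()/b64encode() pattern used above.
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

With this helper, `target_b64 = encode_image("screenshot.png")` would replace the two-line `with open(...)` pattern.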

### Text-Conditioned Detection

Find objects by description:

```python
# Reuses API_URL, headers, and target_b64 from the previous example.
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "queries": ["a play button", "a settings icon"],
        "threshold": 0.1
    }
})
```
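
The text example uses a low `threshold` of 0.1. One possible workflow, sketched below, is to request once at a low threshold and then apply stricter cutoffs locally instead of re-querying the endpoint. `filter_by_confidence` is a hypothetical helper, and the sample response is made up to match the documented format.

```python
def filter_by_confidence(detections, min_confidence):
    """Keep detections at or above min_confidence, highest confidence first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


# Hypothetical response body in the documented format (not real endpoint output).
result = {
    "detections": [
        {"box": [100, 200, 150, 250], "confidence": 0.92},
        {"box": [300, 40, 340, 80], "confidence": 0.18},
        {"box": [10, 10, 60, 60], "confidence": 0.41},
    ]
}

strict = filter_by_confidence(result["detections"], 0.4)
print([d["confidence"] for d in strict])  # [0.92, 0.41]
```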

### Multiple Query Images

Find several different objects:

```python
# icon1_b64, icon2_b64, and icon3_b64 are base64 strings,
# loaded the same way as query_b64 in the first example.
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_images": [icon1_b64, icon2_b64, icon3_b64],
        "threshold": 0.5
    }
})
# Each detection includes a "label" such as "query_0" or "query_1".
```
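
To work with multi-query results, the detections can be grouped by the query that produced them. The sketch below assumes that the number in a `query_<i>` label is the position of the corresponding image in `query_images`; this README does not state that explicitly. `group_by_query` and the file names are hypothetical.

```python
from collections import defaultdict


def group_by_query(detections, query_names):
    """Group detections by query, mapping "query_<i>" labels to query_names[i]."""
    grouped = defaultdict(list)
    for det in detections:
        # Assumption: labels look like "query_<index>" and the index is the
        # position of the image in the "query_images" list.
        index = int(det["label"].rsplit("_", 1)[1])
        grouped[query_names[index]].append(det)
    return dict(grouped)


# Hypothetical detections in the documented response format.
detections = [
    {"box": [100, 200, 150, 250], "confidence": 0.95, "label": "query_0"},
    {"box": [400, 200, 450, 250], "confidence": 0.81, "label": "query_2"},
    {"box": [120, 500, 170, 550], "confidence": 0.77, "label": "query_0"},
]
names = ["play.png", "pause.png", "settings.png"]
grouped = group_by_query(detections, names)
print({name: len(dets) for name, dets in grouped.items()})
# {'play.png': 2, 'settings.png': 1}
```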

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `target_image` | string | required | Base64-encoded target image |
| `query_image` | string | - | Base64-encoded reference image |
| `query_images` | array | - | Multiple base64-encoded reference images |
| `queries` | array | - | Text descriptions to search for |
| `threshold` | float | 0.5 | Confidence threshold (0-1) |
| `nms_threshold` | float | 0.3 | Non-max suppression threshold |
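
The parameters above can be assembled and checked on the client before sending a request. The sketch below is one way to do that, not the handler's own validation: `build_inputs` is a hypothetical helper, and it assumes that at least one of the three query fields must be present and that `nms_threshold`, like `threshold`, lies between 0 and 1.

```python
def build_inputs(target_image, query_image=None, query_images=None,
                 queries=None, threshold=0.5, nms_threshold=0.3):
    """Assemble the "inputs" object for a request, with basic client-side checks."""
    if not target_image:
        raise ValueError("target_image is required")
    # Assumption: a request needs at least one of the three query fields.
    sources = {"query_image": query_image, "query_images": query_images,
               "queries": queries}
    provided = {k: v for k, v in sources.items() if v}
    if not provided:
        raise ValueError("provide query_image, query_images, or queries")
    # Assumption: both thresholds are expected to lie in [0, 1].
    for name, value in (("threshold", threshold), ("nms_threshold", nms_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {"target_image": target_image, **provided,
            "threshold": threshold, "nms_threshold": nms_threshold}
```

The result can be passed as `json={"inputs": build_inputs(target_b64, queries=["a play button"], threshold=0.1)}` in the `requests.post` calls shown earlier.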

## Response Format

```json
{
  "detections": [
    {
      "box": [x1, y1, x2, y2],
      "confidence": 0.95,
      "label": "query_0"
    }
  ]
}
```
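
Downstream code often needs a box's center or size, for example to click a detected icon in a screenshot. The sketch below assumes `box` holds corner coordinates in target-image pixels, with `(x1, y1)` the top-left and `(x2, y2)` the bottom-right corner; the README does not confirm the units. `box_center` and `box_size` are hypothetical helpers.

```python
def box_center(box):
    """Return the (x, y) center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def box_size(box):
    """Return the (width, height) of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return (x2 - x1, y2 - y1)


# Sample detection taken from the example output earlier in this README.
detection = {"box": [100, 200, 150, 250], "confidence": 0.92}
print(box_center(detection["box"]))  # (125.0, 225.0)
print(box_size(detection["box"]))    # (50, 50)
```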

## Model

The handler uses the `google/owlv2-large-patch14-ensemble` checkpoint for best accuracy.