---
library_name: transformers
tags:
- object-detection
- owlv2
- zero-shot
- visual-prompting
license: apache-2.0
---

# OWLv2 Inference Endpoint

Custom handler for OWLv2 (Open-World Localization v2) supporting both **image-conditioned** and **text-conditioned** object detection.

## Features

- **Image-conditioned detection**: Find objects similar to a reference image
- **Text-conditioned detection**: Find objects matching text descriptions
- **Multiple query images**: Search for several different objects at once

## Usage

### Image-Conditioned Detection

Find all instances of an icon/object in a target image:

```python
import requests
import base64

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

# Load images as base64
with open("screenshot.png", "rb") as f:
    target_b64 = base64.b64encode(f.read()).decode()
with open("icon.png", "rb") as f:
    query_b64 = base64.b64encode(f.read()).decode()

response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_image": query_b64,
        "threshold": 0.5
    }
})

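# Fail early on HTTP errors (for example, a bad token) instead of parsing an error body
response.raise_for_status()
print(response.json())
# Example output:
# {"detections": [{"box": [100, 200, 150, 250], "confidence": 0.92}]}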
```
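
The request body above can be wrapped in a small helper so the same code serves all three detection modes. This is a minimal sketch, not part of the handler: `encode_image_bytes` and `build_inputs` are hypothetical names, and the only field names used are the ones listed under Parameters. Fields that are not set are left out of the payload so the endpoint can apply its own defaults.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def build_inputs(target_b64, query_image=None, query_images=None,
                 queries=None, threshold=None, nms_threshold=None):
    """Assemble the request body, leaving out any field that was not set.

    Hypothetical helper: it does not check which combinations of query
    fields the handler accepts.
    """
    optional = {
        "query_image": query_image,
        "query_images": query_images,
        "queries": queries,
        "threshold": threshold,
        "nms_threshold": nms_threshold,
    }
    inputs = {"target_image": target_b64}
    inputs.update({k: v for k, v in optional.items() if v is not None})
    return {"inputs": inputs}
```

With this in place, the call above could be written as `requests.post(API_URL, headers=headers, json=build_inputs(target_b64, query_image=query_b64, threshold=0.5))`.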

### Text-Conditioned Detection

Find objects by description:

```python
# Reuses API_URL, headers, and target_b64 from the image-conditioned example
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "queries": ["a play button", "a settings icon"],
        "threshold": 0.1
    }
})
```

### Multiple Query Images

Find several different objects:

```python
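# icon1_b64, icon2_b64, icon3_b64 are base64 strings, encoded the same way as query_b64
response = requests.post(API_URL, headers=headers, json={
    "inputs": {
        "target_image": target_b64,
        "query_images": [icon1_b64, icon2_b64, icon3_b64],
        "threshold": 0.5
    }
})
# Each detection's "label" names the query image it matched: "query_0", "query_1", etc.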
```
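
When several query images are sent, it is often convenient to split the results per query. The sketch below assumes the response has the shape shown under Response Format; `group_by_label` is a hypothetical helper name, and the sample values in the usage note are made up for illustration.

```python
from collections import defaultdict


def group_by_label(result: dict) -> dict:
    """Map each label to its detections, sorted by descending confidence."""
    groups = defaultdict(list)
    for det in result.get("detections", []):
        # Detections without a "label" key are collected under None.
        groups[det.get("label")].append(det)
    return {
        label: sorted(dets, key=lambda d: d["confidence"], reverse=True)
        for label, dets in groups.items()
    }
```

For example, `group_by_label(response.json())["query_1"]` would give the matches for the second query image, best match first, provided at least one detection carries that label.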

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `target_image` | string | required | Base64-encoded target image |
| `query_image` | string | - | Base64-encoded reference image |
| `query_images` | array | - | Multiple base64-encoded reference images |
| `queries` | array | - | Text descriptions to search for |
| `threshold` | float | 0.5 | Confidence threshold (0-1) |
| `nms_threshold` | float | 0.3 | Non-max suppression threshold |

## Response Format

```json
{
  "detections": [
    {
      "box": [x1, y1, x2, y2],
      "confidence": 0.95,
      "label": "query_0"
    }
  ]
}
```
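
To illustrate what a non-max suppression threshold typically controls, here is a client-side sketch over detections of the shape shown above. It assumes greedy, IoU-based suppression and boxes given as `[x1, y1, x2, y2]` corner coordinates; whether the handler applies `nms_threshold` in exactly this way is an assumption, and `iou` and `nms` are illustrative names rather than handler functions.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, nms_threshold=0.3):
    """Greedy NMS: visit boxes by descending confidence and keep a box
    only if its IoU with every already-kept box is at or below the threshold."""
    kept = []
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        if all(iou(det["box"], k["box"]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept
```

Under this scheme, a lower threshold removes more overlapping boxes, while a value close to 1 keeps nearly all of them.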

## Model

Uses `google/owlv2-large-patch14-ensemble` for best accuracy.