	---
	library_name: transformers
	tags:
	- object-detection
	- owlv2
	- zero-shot
	- visual-prompting
	license: apache-2.0
	---

	# OWLv2 Inference Endpoint

	Custom handler for OWLv2 (Open-World Localization v2) supporting both image-conditioned and text-conditioned object detection.

	## Features

	- Image-conditioned detection: Find objects similar to a reference image
	- Text-conditioned detection: Find objects matching text descriptions
	- Multiple query images: Search for several different objects at once

	## Usage

	### Image-Conditioned Detection

	Find all instances of an icon/object in a target image:

	```python
	import requests
	import base64

	API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
	headers = {"Authorization": "Bearer YOUR_TOKEN"}

	# Load images as base64
	with open("screenshot.png", "rb") as f:
	    target_b64 = base64.b64encode(f.read()).decode()
	with open("icon.png", "rb") as f:
	    query_b64 = base64.b64encode(f.read()).decode()

	response = requests.post(API_URL, headers=headers, json={
	    "inputs": {
	        "target_image": target_b64,
	        "query_image": query_b64,
	        "threshold": 0.5
	    }
	})

	print(response.json())
	# {"detections": [{"box": [100, 200, 150, 250], "confidence": 0.92}]}
	```

	### Text-Conditioned Detection

	Find objects by description:

	```python
	# Reuses API_URL, headers, and target_b64 from the image-conditioned example
	response = requests.post(API_URL, headers=headers, json={
	    "inputs": {
	        "target_image": target_b64,
	        "queries": ["a play button", "a settings icon"],
	        "threshold": 0.1
	    }
	})
	```

	### Multiple Query Images

	Find several different objects:

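	```python
	# icon1_b64, icon2_b64, icon3_b64 are base64 strings, encoded the same way as query_b64
	response = requests.post(API_URL, headers=headers, json={
	    "inputs": {
	        "target_image": target_b64,
	        "query_images": [icon1_b64, icon2_b64, icon3_b64],
	        "threshold": 0.5
	    }
	})
	# Each detection carries a "label" naming the query it matched: "query_0", "query_1", etc.
	```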
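
The multi-query request above assumes the base64 strings already exist. A minimal sketch of building that payload from file paths is shown here; `encode_image` and `build_payload` are hypothetical helper names, not part of the handler, and the payload layout simply mirrors the example above.

```python
import base64
from pathlib import Path


def encode_image(path):
    """Read an image file and return its bytes as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_payload(target_path, query_paths, threshold=0.5):
    """Build the JSON body for a multi-query request (hypothetical helper)."""
    return {
        "inputs": {
            "target_image": encode_image(target_path),
            # Order matters: the position in this list presumably maps
            # to the "query_0", "query_1", ... labels in the response.
            "query_images": [encode_image(p) for p in query_paths],
            "threshold": threshold,
        }
    }
```

The result can be passed directly as the `json=` argument of `requests.post`, for example `build_payload("screenshot.png", ["icon1.png", "icon2.png"])`.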

	## Parameters

	| Parameter | Type | Default | Description |
	|-----------|------|---------|-------------|
	| `target_image` | string | required | Base64-encoded target image |
	| `query_image` | string | - | Base64-encoded reference image |
	| `query_images` | array | - | Multiple base64-encoded reference images |
	| `queries` | array | - | Text descriptions to search for |
	| `threshold` | float | 0.5 | Confidence threshold (0-1) |
	| `nms_threshold` | float | 0.3 | Non-max suppression threshold |
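
To illustrate how `threshold` and `nms_threshold` interact, here is a rough sketch of confidence filtering followed by greedy IoU-based non-max suppression. It assumes the handler follows this common scheme; the actual post-processing may differ, and `iou` and `filter_detections` are hypothetical names used only for illustration.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, threshold=0.5, nms_threshold=0.3):
    """Drop low-confidence boxes, then suppress boxes that overlap a better one."""
    candidates = sorted(
        (d for d in detections if d["confidence"] >= threshold),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it overlaps every already-kept box by at most
        # nms_threshold (assumed semantics of the parameter).
        if all(iou(det["box"], k["box"]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept
```

Under this reading, lowering `nms_threshold` removes more near-duplicate boxes, while lowering `threshold` admits weaker matches.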

	## Response Format

	```json
	{
	  "detections": [
	    {
	      "box": [x1, y1, x2, y2],
	      "confidence": 0.95,
	      "label": "query_0"
	    }
	  ]
	}
	```
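
A response in this shape can be post-processed on the client. The sketch below groups detections by label and computes box centers, which is handy when locating icons in a screenshot. It assumes box coordinates are pixels in the target image and that single-query responses may omit `label`, as in the first usage example; `group_by_label` and `box_center` are hypothetical helpers.

```python
from collections import defaultdict


def group_by_label(response_json):
    """Group detections by label, highest confidence first within each group."""
    groups = defaultdict(list)
    for det in response_json.get("detections", []):
        # Assumption: "label" may be missing for single-query requests.
        groups[det.get("label", "unlabeled")].append(det)
    for dets in groups.values():
        dets.sort(key=lambda d: d["confidence"], reverse=True)
    return dict(groups)


def box_center(box):
    """Return the (x, y) center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

For example, `box_center(group_by_label(response.json())["query_0"][0]["box"])` would give the center of the best match for the first query image, provided that label is present.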

	## Model

	Uses `google/owlv2-large-patch14-ensemble` for best accuracy.