# `InferenceHTTPClient`
`InferenceHTTPClient` was created to make it easy to consume the HTTP API exposed by an `inference` server. You
can think of it as a friendly wrapper over `requests` that you can use instead of implementing the calling logic on
your own.
## 🔥 quickstart
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## What are the client capabilities?
* Executing inference for models hosted on the Roboflow platform (use client version `v0`)
* Executing inference for models hosted in local (or on-prem) Docker images exposing the `inference` HTTP API
* Working against a single image (given as a local path, URL, `np.ndarray` or `PIL.Image`)
* Minimalistic batch inference (you can pass multiple images)
* Inference from a video file or a directory of images
## Why does the client have two modes - `v0` and `v1`?
We are constantly improving our `inference` package. The initial version (`v0`) is compatible with
models deployed on the Roboflow platform (task types: `classification`, `object-detection`, `instance-segmentation` and
`keypoints-detection`). Version `v1` is available in locally hosted Docker images with the HTTP API.
A locally hosted `inference` server exposes endpoints for model manipulation, but those endpoints are not available
at the moment for models deployed on the Roboflow platform.
The `api_url` parameter passed to `InferenceHTTPClient` decides the default client mode - URLs matching `*.roboflow.com`
default to version `v0`.
Using model registry control methods with a `v0` client will raise `WrongClientModeError`.
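The defaulting logic can be sketched as follows (an illustrative sketch, not the SDK's actual implementation):

```python
from urllib.parse import urlparse

def default_client_mode(api_url: str) -> str:
    # Hosted Roboflow endpoints (*.roboflow.com) default to v0;
    # any other server (e.g. a local Docker container) defaults to v1.
    host = urlparse(api_url).netloc
    return "v0" if host.endswith("roboflow.com") else "v1"
```

For example, `default_client_mode("https://detect.roboflow.com")` yields `"v0"`, while `default_client_mode("http://localhost:9001")` yields `"v1"`.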
## How can I adjust `InferenceHTTPClient` to work in my use case?
There are a few ways the configuration can be altered:
### configuring with context managers
The methods `use_configuration(...)`, `use_api_v0(...)`, `use_api_v1(...)` and `use_model(...)` are designed to
work as context managers. **Once the context manager is left, the old configuration values are restored.**
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)

# after leaving the context manager - changes are reverted and `model_id` is still required
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
```
As you can see, `model_id` needs to be given to the prediction method only when no default model is configured.
### Setting the configuration once and using it until the next change
The methods `configure(...)`, `select_api_v0(...)`, `select_api_v1(...)` and `select_model(...)` alter the client
state, and the changes are preserved until the next change.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 still holds
CLIENT.configure(custom_configuration)
CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 and custom configuration still holds
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)
# API v0, custom configuration and selected model - still holds
_ = CLIENT.infer(image_url)
```
One may also initialise the client in `chain` mode:
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
```
### Overriding `model_id` for a specific call
`model_id` can be overridden for a specific call:
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
```
## Batch inference
You may want to predict against multiple images in a single call. This is possible, but so far client-side
batching is implemented in a naive way (sequential requests to the API) - stay tuned for future improvements.
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
```
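Conceptually, the current client-side batching is equivalent to the sketch below (with a hypothetical `infer_single` callable standing in for a single-image request - this is not the SDK's internal code):

```python
from typing import Any, Callable, List

def naive_batch_infer(
    images: List[Any],
    infer_single: Callable[[Any], dict],
) -> List[dict]:
    # Issue one request per image sequentially and collect the responses;
    # no server-side batching happens in this sketch.
    return [infer_single(image) for image in images]

# Usage with a stub in place of a real request function:
results = naive_batch_infer(["a.jpg", "b.jpg"], lambda image: {"source": image})
```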
## Inference against a stream
One may want to infer against a video file or a directory of images - both modes are supported in `inference-client`.
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - sequential number of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass

for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image
    # prediction - prediction from the model
    pass
```
## What is actually returned as a prediction?
`inference-client` returns the plain Python dictionaries that the model-serving API responds with. The only
modifications are made to the `visualization` key, which holds the server-generated prediction visualisation (it
can be transcoded to the format of your choice), and to predictions re-scaled on the client side.
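Since the response is a plain dictionary, it can be post-processed with ordinary Python. The sketch below assumes the typical object-detection response shape (a `predictions` list whose entries carry `class` and `confidence` keys) - verify the keys against your server's actual output:

```python
def extract_high_confidence_classes(prediction: dict, threshold: float = 0.5) -> list:
    # Collect class names of detections at or above the given confidence.
    # Key names assume the usual object-detection response shape.
    return [
        detection["class"]
        for detection in prediction.get("predictions", [])
        if detection.get("confidence", 0.0) >= threshold
    ]

# Usage with a hand-crafted sample response:
sample = {
    "predictions": [
        {"class": "player", "confidence": 0.92},
        {"class": "ball", "confidence": 0.31},
    ]
}
classes = extract_high_confidence_classes(sample)
```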
## Methods to control `inference` server (in `v1` mode only)
### Getting server info
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
```
### Listing loaded models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
```
### Getting specific model description
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
```
If `allow_loading` is set to `True`, the model will be loaded as a side effect if it is not already loaded.
Default: `True`.
### Loading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
```
The pointed model will be loaded. If `set_as_default` is set to `True`, after a successful load the model
will be used as the default model for the client. Default value: `False`.
### Unloading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
```
Sometimes (e.g. to avoid OOM on the server side) unloading a model will be required.
### Unloading all models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
```
## Details about client configuration
`inference-client` provides the `InferenceConfiguration` dataclass to hold the whole configuration.
```python
from inference_sdk import InferenceConfiguration
```
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific fields
are used in specific contexts. In particular:
### Inference in `v0` mode
The following fields are passed to the API:
* `confidence_threshold` (as `confidence`) - to alter model thresholding
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `format` - to visualise on the server side - use `image` (but then you lose prediction details from the response)
* `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes
* `mask_decode_mode`
* `tradeoff_factor`
* `max_detections` - max detections to return from model
* `iou_threshold` (as `overlap`) - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `count_inference` as `countinference`
* `service_secret`
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
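The renames noted above (`confidence_threshold` → `confidence`, `iou_threshold` → `overlap`, and so on) can be sketched as a simple mapping applied before building the request - an illustration based on the list above, not the SDK's internal implementation:

```python
# Configuration fields that the v0 API expects under a different name,
# per the list above (sketch only).
V0_PARAM_NAMES = {
    "confidence_threshold": "confidence",
    "keypoint_confidence_threshold": "keypoint_confidence",
    "iou_threshold": "overlap",
    "visualize_labels": "labels",
    "count_inference": "countinference",
}

def to_v0_query_params(config: dict) -> dict:
    # Rename the fields that differ; pass everything else through unchanged.
    return {V0_PARAM_NAMES.get(key, key): value for key, value in config.items()}
```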
### Classification model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `confidence_threshold` as `confidence`
* `stroke_width` - width of stroke in visualisation
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Object detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` to filter out list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Keypoints detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `class_filter` to filter out list of object classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Instance segmentation model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` to filter out list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
* `mask_decode_mode`
* `tradeoff_factor`
### Configuration of client
* `output_visualisation_format` - one of `VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`,
`VisualisationResponseFormat.PILLOW` - given that server-side visualisation is enabled, one may choose the
format used in the output
* `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with a local directory,
this parameter controls the types of files (extensions) allowed to be processed -
default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]`
* `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default: `False`.
Client-side scaling is only meant to down-scale the input for inference (keeping the aspect ratio) -
to utilise the internet connection more efficiently (at the price of image manipulation / transcoding).
If the model registry endpoint is available (mode `v1`), the model input size information will be used; if not,
`default_max_input_size` will be used.
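The aspect-ratio-preserving downscale described above can be sketched as follows (an illustration of the idea, not the SDK's exact code; `max_size` stands in for the model input size or `default_max_input_size`):

```python
def downscaled_size(width: int, height: int, max_size: int) -> tuple:
    # Keep the aspect ratio and only ever shrink - never enlarge -
    # so the longest edge fits within max_size.
    longest_edge = max(width, height)
    if longest_edge <= max_size:
        return width, height
    scale = max_size / longest_edge
    return round(width * scale), round(height * scale)
```

For instance, a 4000x3000 image with `max_size=1000` would be sent as 1000x750, while a 640x480 image would be left untouched.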