# `InferenceHTTPClient`
`InferenceHTTPClient` was created to make it easy to consume the HTTP API exposed by an `inference` server. You
can think of it as a friendly wrapper over `requests` that you can use instead of writing the calling logic on
your own.
## 🔥 quickstart
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## What are the client capabilities?
* Executing inference for models hosted on the Roboflow platform (use client mode `v0`)
* Executing inference for models hosted in local (or on-prem) docker images exposing the `inference` HTTP API
* Works with a single image given as a local path, URL, `np.ndarray` or `PIL.Image` (see the sketch after this list)
* Minimalistic batch inference (you can pass multiple images in a single call)
* Inference from a video file or a directory of images
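A minimal sketch of the different input types - the local `image.jpg` path is only a placeholder, and `cv2` / `PIL` are used here just to produce the `np.ndarray` and `PIL.Image` inputs:
```python
import cv2
from PIL import Image

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# URL
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# local path (placeholder file name)
_ = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")
# np.ndarray, e.g. an image loaded with OpenCV
_ = CLIENT.infer(cv2.imread("image.jpg"), model_id="soccer-players-5fuqs/1")
# PIL.Image
_ = CLIENT.infer(Image.open("image.jpg"), model_id="soccer-players-5fuqs/1")
```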
## Why does the client have two modes - `v0` and `v1`?
We are constantly improving our `inference` package. The initial version (`v0`) is compatible with models deployed
on the Roboflow platform (supported task types: `classification`, `object-detection`, `instance-segmentation` and
`keypoints-detection`). Version `v1` is available in locally hosted Docker images with the HTTP API.
A locally hosted `inference` server exposes endpoints for model manipulation, but those endpoints are not available
at the moment for models deployed on the Roboflow platform.
The `api_url` parameter passed to `InferenceHTTPClient` decides the default client mode - URLs matching `*.roboflow.com`
default to version `v0`.
Using model registry control methods with a `v0` client will raise `WrongClientModeError`.
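A short sketch of how the default mode follows from `api_url` - the hosted `https://detect.roboflow.com` URL is used here as an example of a `*.roboflow.com` address, and the default can still be switched explicitly:
```python
from inference_sdk import InferenceHTTPClient

# a `*.roboflow.com` URL defaults the client to mode `v0`
HOSTED_CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="ROBOFLOW_API_KEY"
)

# any other URL (e.g. a locally hosted docker image) defaults to mode `v1`
LOCAL_CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

# the default can always be overridden explicitly
LOCAL_CLIENT.select_api_v0()
```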
## How can I adjust `InferenceHTTPClient` to work in my use case?
There are a few ways the configuration can be altered:
### configuring with context managers
The methods `use_configuration(...)`, `use_api_v0(...)`, `use_api_v1(...)`, `use_model(...)` are designed to
work as context managers. **Once the context manager is left, the old configuration values are restored.**
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

custom_configuration = InferenceConfiguration(confidence_threshold=0.8)

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)

# after leaving the context manager, changes are reverted and `model_id` is required again
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
```
As you can see, `model_id` needs to be given to the prediction method only when no default model is configured.
### Setting the configuration once and using it until the next change
The methods `configure(...)`, `select_api_v0(...)`, `select_api_v1(...)`, `select_model(...)` are designed to alter the
client state, and the change is preserved until the next one.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 still holds
CLIENT.configure(custom_configuration)
CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 and custom configuration still holds
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)
# API v0, custom configuration and selected model - still holds
_ = CLIENT.infer(image_url)
```
The client can also be initialised in a chained fashion:
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
```
### Overriding `model_id` for a specific call
`model_id` can be overridden for a specific call:
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
```
## Batch inference
You may want to run predictions against multiple images in a single call. This is possible, but so far client-side
batching is implemented naively (sequential requests to the API) - stay tuned for future improvements.
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## Inference against a stream
One may want to infer against a video file or a directory of images - both modes are supported by `inference-client`:
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - sequential number of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass

for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image content
    # prediction - prediction from the model
    pass
```
## What is actually returned as a prediction?
`inference_client` returns plain Python dictionaries that mirror the responses from the model-serving API. The only
modifications are made in the context of the `visualization` key, which holds the server-generated prediction
visualisation (it can be transcoded to the format of your choice), and in terms of client-side re-scaling.
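The exact keys depend on the task type and server version; the snippet below is only an illustrative sketch of inspecting the raw dictionary, and the `predictions`, `class` and `confidence` keys are assumptions about a typical object-detection response rather than a guarantee:
```python
from inference_sdk import InferenceHTTPClient

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

result = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(type(result))   # plain dict
print(result.keys())  # inspect the raw response structure
# assumed shape of a detection entry - verify against your server's actual response
for detection in result.get("predictions", []):
    print(detection["class"], detection["confidence"])
```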
## Methods to control `inference` server (in `v1` mode only)
### Getting server info
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
```
### Listing loaded models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
```
### Getting specific model description
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
```
If `allow_loading` is set to `True`, the model will be loaded as a side effect if it is not already loaded.
Default: `True`.
### Loading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
```
The indicated model will be loaded. If `set_as_default` is set to `True`, after a successful load the model
will be used as the client's default model. Default value: `False`.
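For instance, after loading a model with `set_as_default=True`, subsequent calls no longer need an explicit `model_id` (a short sketch; `some/1` and the image URL are placeholders):
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
# the loaded model is now the client's default, so `model_id` can be omitted
_ = CLIENT.infer("https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg")
```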
### Unloading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
```
Sometimes (to avoid OOM on the server side) unloading a model will be required.
### Unloading all models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
```
## Details about client configuration
`inference-client` provides the `InferenceConfiguration` dataclass to hold the whole configuration.
```python
from inference_sdk import InferenceConfiguration
```
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific
fields are used in specific contexts, as described in the sections below.
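As a starting point, here is a minimal sketch of building a configuration from a few of the fields described below and applying it to a client (fields not set explicitly keep their defaults):
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

custom_configuration = InferenceConfiguration(
    confidence_threshold=0.5,
    iou_threshold=0.5,
    max_detections=100,
    class_agnostic_nms=True,
)

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
# apply until the next change...
CLIENT.configure(custom_configuration)
# ...or only within a context manager
with CLIENT.use_configuration(custom_configuration):
    pass
```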
### Inference in `v0` mode
The following fields are passed to the API:
* `confidence_threshold` (as `confidence`) - to alter model thresholding
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `format` - to visualise on the server side - use `image` (but then you lose prediction details from the response)
* `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes
* `mask_decode_mode`
* `tradeoff_factor`
* `max_detections` - max detections to return from model
* `iou_threshold` (as `overlap`) - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `count_inference` as `countinference`
* `service_secret`
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Classification model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `confidence_threshold` as `confidence`
* `stroke_width` - width of stroke in visualisation
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Object detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` - to filter predictions by a list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Keypoints detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `keypoint_confidence_threshold` as `keypoint_confidence` - to filter out detected keypoints
based on model confidence
* `class_filter` - to filter predictions by a list of object classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Instance segmentation model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` - to filter predictions by a list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
* `mask_decode_mode`
* `tradeoff_factor`
### Configuration of the client
* `output_visualisation_format` - one of `VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`,
`VisualisationResponseFormat.PILLOW` - given that server-side visualisation is enabled, one may choose the
format used in the output (see the sketch after this list)
* `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with a local directory,
this parameter controls which file types (extensions) are allowed to be processed -
default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]`
* `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default `False`.
Client-side scaling is only meant to down-scale the input for inference (keeping the aspect ratio) -
to use the network connection more efficiently (at the price of image manipulation / transcoding).
If the model registry endpoint is available (mode `v1`), the model input size information will be used; if not,
`default_max_input_size` will be in use.
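A hedged sketch of requesting server-side visualisation and receiving it as an `np.ndarray` - importing `VisualisationResponseFormat` from `inference_sdk` is an assumption, and the comment about the `visualization` key describes the expected outcome rather than a guarantee:
```python
from inference_sdk import (
    InferenceConfiguration,
    InferenceHTTPClient,
    VisualisationResponseFormat,
)

configuration = InferenceConfiguration(
    visualize_predictions=True,
    output_visualisation_format=VisualisationResponseFormat.NUMPY,
    client_downsizing_disabled=True,
)

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.configure(configuration)

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
result = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# with the settings above, `result["visualization"]` should hold the visualised frame as an np.ndarray
```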