# `InferenceHTTPClient`
`InferenceHTTPClient` was created to make it easy to consume the HTTP API exposed by an `inference` server. You
can think of it as a friendly wrapper over `requests` that you can use instead of implementing the calling logic on
your own.
## 🔥 quickstart
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## What are the client capabilities?
* Executing inference for models hosted on the Roboflow platform (use client version `v0`)
* Executing inference for models hosted in local (or on-prem) Docker images exposing the `inference` HTTP API
* Working against a single image (given as a local path, URL, `np.ndarray` or `PIL.Image`)
* Minimalistic batch inference (you can pass multiple images)
* Inference from a video file or a directory of images
## Why does the client have two modes - `v0` and `v1`?
We are constantly improving our `inference` package. The initial version (`v0`) is compatible with
models deployed on the Roboflow platform (task types: `classification`, `object-detection`, `instance-segmentation` and
`keypoints-detection`). Version `v1` is available in locally hosted Docker images with the HTTP API.
A locally hosted `inference` server exposes endpoints for model manipulation, but those endpoints are not available
at the moment for models deployed on the Roboflow platform.
The `api_url` parameter passed to `InferenceHTTPClient` decides the default client mode - URLs matching `*.roboflow.com`
default to version `v0`.
Using model registry control methods with a `v0` client will raise `WrongClientModeError`.
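The defaulting logic can be sketched as follows (an illustrative sketch, not the SDK's actual implementation):

```python
from urllib.parse import urlparse

def default_client_mode(api_url: str) -> str:
    # Hosted Roboflow endpoints (*.roboflow.com) default to v0;
    # any other server (e.g. a local Docker container) defaults to v1.
    host = urlparse(api_url).netloc
    return "v0" if host.endswith("roboflow.com") else "v1"
```

For example, `default_client_mode("https://detect.roboflow.com")` yields `"v0"`, while `default_client_mode("http://localhost:9001")` yields `"v1"`.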
## How can I adjust `InferenceHTTPClient` to work in my use case?
There are a few ways the configuration can be altered:
### configuring with context managers
The methods `use_configuration(...)`, `use_api_v0(...)`, `use_api_v1(...)` and `use_model(...)` are designed to
work as context managers. **Once the context manager is left, the old configuration values are restored.**
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)

# after leaving the context manager - changes are reverted and `model_id` is still required
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
```
As you can see, `model_id` needs to be given to the prediction method only when no default model is configured.
### Setting the configuration once and using it until the next change
The methods `configure(...)`, `select_api_v0(...)`, `select_api_v1(...)` and `select_model(...)` alter the client
state, and the changes are preserved until the next change.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 still holds
CLIENT.configure(custom_configuration)
CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 and custom configuration still holds
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)
# API v0, custom configuration and selected model - still holds
_ = CLIENT.infer(image_url)
```
One may also initialise the client in `chain` mode:
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
```
### Overriding `model_id` for a specific call
`model_id` can be overridden for a specific call:
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
```
## Batch inference
You may want to predict against multiple images in a single call. This is possible, but so far client-side
batching is implemented in a naive way (sequential requests to the API) - stay tuned for future improvements.
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
```
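Conceptually, the current client-side batching is equivalent to the sketch below (with a hypothetical `infer_single` callable standing in for a single-image request - this is not the SDK's internal code):

```python
from typing import Any, Callable, List

def naive_batch_infer(
    images: List[Any],
    infer_single: Callable[[Any], dict],
) -> List[dict]:
    # Issue one request per image sequentially and collect the responses;
    # no server-side batching happens in this sketch.
    return [infer_single(image) for image in images]

# Usage with a stub in place of a real request function:
results = naive_batch_infer(["a.jpg", "b.jpg"], lambda image: {"source": image})
```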
## Inference against a stream
One may want to infer against a video file or a directory of images - both modes are supported in `inference-client`.
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - sequential number of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass

for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image
    # prediction - prediction from the model
    pass
```
## What is actually returned as a prediction?
`inference-client` returns the plain Python dictionaries that the model-serving API responds with. The only
modifications are made to the `visualization` key, which holds the server-generated prediction visualisation (it
can be transcoded to the format of your choice), and to predictions re-scaled on the client side.
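Since the response is a plain dictionary, it can be post-processed with ordinary Python. The sketch below assumes the typical object-detection response shape (a `predictions` list whose entries carry `class` and `confidence` keys) - verify the keys against your server's actual output:

```python
def extract_high_confidence_classes(prediction: dict, threshold: float = 0.5) -> list:
    # Collect class names of detections at or above the given confidence.
    # Key names assume the usual object-detection response shape.
    return [
        detection["class"]
        for detection in prediction.get("predictions", [])
        if detection.get("confidence", 0.0) >= threshold
    ]

# Usage with a hand-crafted sample response:
sample = {
    "predictions": [
        {"class": "player", "confidence": 0.92},
        {"class": "ball", "confidence": 0.31},
    ]
}
classes = extract_high_confidence_classes(sample)
```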
## Methods to control `inference` server (in `v1` mode only)
### Getting server info
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
```
### Listing loaded models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
```
### Getting specific model description
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
```
If `allow_loading` is set to `True`, the model will be loaded as a side effect if it is not already loaded.
Default: `True`.
### Loading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
```
The pointed model will be loaded. If `set_as_default` is set to `True`, after a successful load the model
will be used as the default model for the client. Default value: `False`.
### Unloading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
```
Sometimes (e.g. to avoid OOM on the server side) unloading a model will be required.
### Unloading all models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
```
## Details about client configuration
`inference-client` provides the `InferenceConfiguration` dataclass to hold the whole configuration.
```python
from inference_sdk import InferenceConfiguration
```
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific fields
are used in specific contexts. In particular:
### Inference in `v0` mode
The following fields are passed to the API:
* `confidence_threshold` (as `confidence`) - to alter model thresholding
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `format` - to visualise on the server side - use `image` (but then you lose prediction details from the response)
* `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes
* `mask_decode_mode`
* `tradeoff_factor`
* `max_detections` - max detections to return from model
* `iou_threshold` (as `overlap`) - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `count_inference` as `countinference`
* `service_secret`
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
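The renames noted above (`confidence_threshold` → `confidence`, `iou_threshold` → `overlap`, and so on) can be sketched as a simple mapping applied before building the request - an illustration based on the list above, not the SDK's internal implementation:

```python
# Configuration fields that the v0 API expects under a different name,
# per the list above (sketch only).
V0_PARAM_NAMES = {
    "confidence_threshold": "confidence",
    "keypoint_confidence_threshold": "keypoint_confidence",
    "iou_threshold": "overlap",
    "visualize_labels": "labels",
    "count_inference": "countinference",
}

def to_v0_query_params(config: dict) -> dict:
    # Rename the fields that differ; pass everything else through unchanged.
    return {V0_PARAM_NAMES.get(key, key): value for key, value in config.items()}
```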
### Classification model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `confidence_threshold` as `confidence`
* `stroke_width` - width of stroke in visualisation
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Object detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` to filter out list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Keypoints detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `class_filter` to filter out list of object classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
### Instance segmentation model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
* `confidence_threshold` as `confidence`
* `class_filter` to filter out list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate NMS IoU threshold
* `stroke_width` - width of stroke in visualisation
* `max_detections` - max detections to return from model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` to alter server-side pre-processing
* `mask_decode_mode`
* `tradeoff_factor`
### Configuration of client
* `output_visualisation_format` - one of `VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`,
`VisualisationResponseFormat.PILLOW` - given that server-side visualisation is enabled, one may choose the
format used in the output
* `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with a local directory,
this parameter controls the types of files (extensions) allowed to be processed -
default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]`
* `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default: `False`.
Client-side scaling is only meant to down-scale the input for inference (keeping the aspect ratio) -
to utilise the internet connection more efficiently (at the price of image manipulation / transcoding).
If the model registry endpoint is available (mode `v1`), the model input size information will be used; if not,
`default_max_input_size` will be used.
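The aspect-ratio-preserving downscale described above can be sketched as follows (an illustration of the idea, not the SDK's exact code; `max_size` stands in for the model input size or `default_max_input_size`):

```python
def downscaled_size(width: int, height: int, max_size: int) -> tuple:
    # Keep the aspect ratio and only ever shrink - never enlarge -
    # so the longest edge fits within max_size.
    longest_edge = max(width, height)
    if longest_edge <= max_size:
        return width, height
    scale = max_size / longest_edge
    return round(width * scale), round(height * scale)
```

For instance, a 4000x3000 image with `max_size=1000` would be sent as 1000x750, while a 640x480 image would be left untouched.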