InferenceHTTPClient
InferenceHTTPClient was created to make it easy for users to consume the HTTP API exposed by the inference server. You can think of it as a friendly wrapper around requests that you can use instead of writing the calling logic on your own.
🔥 quickstart
```python
from inference_sdk import InferenceHTTPClient

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
```
What are the client capabilities?
- Executing inference for models hosted on the Roboflow platform (use client version v0)
- Executing inference for models hosted in local (or on-prem) Docker images with the `inference` HTTP API
- Works against a single image given as a local path, URL, `np.ndarray` or `PIL.Image` (see the sketch below)
- Minimalistic batch inference implemented (you can pass multiple images)
- Inference from a video file and a directory of images implemented
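The same `infer(...)` call works regardless of how the image is supplied. Below is a minimal sketch of inferring on an `np.ndarray` loaded with OpenCV; the local file path is a placeholder and OpenCV is assumed to be installed:

```python
import cv2
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

# the same infer(...) call also accepts a local path, URL or PIL.Image
frame = cv2.imread("local/image.jpg")  # np.ndarray; the path is a placeholder
predictions = CLIENT.infer(frame, model_id="soccer-players-5fuqs/1")
```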
Why does the client have two modes - v0 and v1?
We are constantly improving our inference package. The initial version (v0) is compatible with models deployed on the Roboflow platform (task types: classification, object-detection, instance-segmentation and keypoints-detection are supported). Version v1 is available in locally hosted Docker images with the HTTP API.
A locally hosted inference server exposes endpoints for model manipulation, but those endpoints are not currently available for models deployed on the Roboflow platform.
The api_url parameter passed to InferenceHTTPClient decides the default client mode - URLs with *.roboflow.com default to version v0.
Using model registry control methods with a v0 client will raise WrongClientModeError.
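As an illustration, here is a sketch of that failure mode; the import path for WrongClientModeError is an assumption - adjust it to your inference_sdk version:

```python
from inference_sdk import InferenceHTTPClient
# import path is an assumption - adjust to your inference_sdk version
from inference_sdk.http.errors import WrongClientModeError

# *.roboflow.com URL - client defaults to v0 mode
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="ROBOFLOW_API_KEY"
)

try:
    CLIENT.list_loaded_models()  # model registry control - v1 only
except WrongClientModeError:
    print("model registry control methods are not available in v0 mode")
```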
How can I adjust InferenceHTTPClient to work in my use case?
There are a few ways the configuration can be altered:
configuring with context managers
The methods use_configuration(...), use_api_v0(...), use_api_v1(...) and use_model(...) are designed to work as context managers. Once the context manager is left, the old config values are restored.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)

# after leaving context manager - changes are reverted and `model_id` is still required
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
```
As you can see, model_id needs to be given to the prediction method only when a default model is not configured.
Setting the configuration once and using it until the next change
The methods configure(...), select_api_v0(...), select_api_v1(...) and select_model(...) alter the client state, and the changes are preserved until the next change.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

# API v0 still holds
CLIENT.configure(custom_configuration)
CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

# API v0 and custom configuration still hold
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)

# API v0, custom configuration and selected model still hold
_ = CLIENT.infer(image_url)
```
One may also initialise the client in chained mode:
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
```
Overriding model_id for a specific call
model_id can be overridden for a specific call:
```python
from inference_sdk import InferenceHTTPClient

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
```
Batch inference
You may want to predict against multiple images in a single call. This is possible, but so far client-side batching is implemented in a naive way (sequential requests to the API) - stay tuned for future improvements.
```python
from inference_sdk import InferenceHTTPClient

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
```
Inference against stream
One may want to infer against a video file or a directory of images - both modes are supported in the inference client:
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - the frame number
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass

for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the loaded image
    # prediction - prediction from the model
    pass
```
What is actually returned as prediction?
inference_client returns plain Python dictionaries that are the responses from the model-serving API. Modifications are made only in the context of the visualization key, which keeps the server-generated prediction visualisation (it can be transcoded to the format of choice), and in terms of client-side re-scaling.
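As an illustration, here is a sketch of inspecting an object-detection response; the exact keys depend on the task type, so treat the ones below as typical rather than guaranteed:

```python
from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(
    "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg",
    model_id="soccer-players-5fuqs/1",
)
print(type(predictions))  # plain dict

# typical object-detection keys - not guaranteed for every task type
for detection in predictions.get("predictions", []):
    print(detection["class"], detection["confidence"])
```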
Methods to control inference server (in v1 mode only)
Getting server info
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
```
Listing loaded models
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
```
Getting specific model description
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
```
If allow_loading is set to True, the model will be loaded as a side effect if it is not already loaded. Default: True.
Loading model
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
```
The pointed model will be loaded. If set_as_default is set to True, after a successful load the model will be used as the default model for the client. Default value: False.
Unloading model
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
```
Sometimes (to avoid OOM on the server side) unloading a model will be required.
Unloading all models
```python
from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
```
Details about client configuration
inference-client provides the InferenceConfiguration dataclass to hold the whole configuration.
```python
from inference_sdk import InferenceConfiguration
```
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific fields are used in specific contexts. In particular:
Inference in v0 mode
The following fields are passed to the API (see the sketch after this list):
- `confidence_threshold` (as `confidence`) - to alter model thresholding
- `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints based on model confidence
- `format` - to visualise on server side - use `image` (but then you lose prediction details from the response)
- `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes
- `mask_decode_mode`
- `tradeoff_factor`
- `max_detections` - max detections to return from the model
- `iou_threshold` (as `overlap`) - to dictate the NMS IoU threshold
- `stroke_width` - width of stroke in visualisation
- `count_inference` as `countinference`
- `service_secret`
- `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing
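A short sketch of sending some of these options with a v0 request; the field values and the image path below are arbitrary examples, not recommendations:

```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

# arbitrary example values - not recommendations
config = InferenceConfiguration(
    confidence_threshold=0.5,  # sent as `confidence`
    iou_threshold=0.4,         # sent as `overlap`
    max_detections=100,
)

CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # *.roboflow.com -> v0 mode
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_configuration(config):
    predictions = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")
```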
Classification model in v1 mode:
- `visualize_predictions` - flag to enable / disable visualisation
- `confidence_threshold` as `confidence`
- `stroke_width` - width of stroke in visualisation
- `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing
Object detection model in v1 mode:
- `visualize_predictions` - flag to enable / disable visualisation
- `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
- `confidence_threshold` as `confidence`
- `class_filter` to filter out list of classes
- `class_agnostic_nms` - flag to control whether NMS is class-agnostic
- `fix_batch_size`
- `iou_threshold` - to dictate NMS IoU threshold
- `stroke_width` - width of stroke in visualisation
- `max_detections` - max detections to return from model
- `max_candidates` - max candidates to post-processing from model
- `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing
Keypoints detection model in v1 mode:
- `visualize_predictions` - flag to enable / disable visualisation
- `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
- `confidence_threshold` as `confidence`
- `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints based on model confidence
- `class_filter` to filter out list of object classes
- `class_agnostic_nms` - flag to control whether NMS is class-agnostic
- `fix_batch_size`
- `iou_threshold` - to dictate NMS IoU threshold
- `stroke_width` - width of stroke in visualisation
- `max_detections` - max detections to return from model
- `max_candidates` - max candidates to post-processing from model
- `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing
Instance segmentation model in v1 mode:
- `visualize_predictions` - flag to enable / disable visualisation
- `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled
- `confidence_threshold` as `confidence`
- `class_filter` to filter out list of classes
- `class_agnostic_nms` - flag to control whether NMS is class-agnostic
- `fix_batch_size`
- `iou_threshold` - to dictate NMS IoU threshold
- `stroke_width` - width of stroke in visualisation
- `max_detections` - max detections to return from model
- `max_candidates` - max candidates to post-processing from model
- `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing
- `mask_decode_mode`
- `tradeoff_factor`
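A sketch combining a few of the v1-specific fields listed above; the class names are hypothetical and the `mask_decode_mode` value is an assumption - verify the values supported by your server version:

```python
from inference_sdk import InferenceConfiguration

config = InferenceConfiguration(
    class_filter=["player", "ball"],  # hypothetical class names
    class_agnostic_nms=True,
    mask_decode_mode="accurate",      # value is an assumption - verify against your server
    tradeoff_factor=0.5,
)
```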
Configuration of client
- `output_visualisation_format` - one of `VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`, `VisualisationResponseFormat.PILLOW` - given that server-side visualisation is enabled, one may choose which format should be used in the output
- `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with a local directory, this parameter controls the type of files (extensions) allowed to be processed - default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]`
- `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default `False`. Client-side scaling is only supposed to down-scale (keeping aspect ratio) the input for inference - to utilise the internet connection more efficiently (but at the price of image manipulation / transcoding). If the model registry endpoint is available (mode `v1`), model input size information will be used; if not, `default_max_input_size` will be in use.
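For example, here is a sketch of requesting server-side visualisation and receiving it as a NumPy array; the import path for `VisualisationResponseFormat` is an assumption - adjust it to your inference_sdk version:

```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
# import path is an assumption - adjust to your inference_sdk version
from inference_sdk.http.entities import VisualisationResponseFormat

config = InferenceConfiguration(
    visualize_predictions=True,
    output_visualisation_format=VisualisationResponseFormat.NUMPY,
)

CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_configuration(config):
    prediction = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")

# the `visualization` key now holds an np.ndarray instead of a base64 string
```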