# `InferenceHTTPClient` `InferenceHTTPClient` was created to make it easy for users to consume HTTP API exposed by `inference` server. You can think of it, as a friendly wrapper over `requests` that you can use, instead of creating calling logic on your own. ## 🔥 quickstart ```python from inference_sdk import InferenceHTTPClient image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg" # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") print(predictions) ``` ## What are the client capabilities? * Executing inference for models hosted at Roboflow platform (use client version `v0`) * Executing inference for models hosted in local (or on-prem) docker images with `inference` HTTP API * Works against single image (given as a local path, URL, `np.ndarray` or `PIL.Image`) * Minimalistic batch inference implemented (you can pass multiple images) * Implemented inference from video file and directory with images ## Why client has two modes - `v0` and `v1`? We are constantly improving our `infrence` package - initial version (`v0`) is compatible with models deployed at Roboflow platform (task types: `classification`, `object-detection`, `instance-segmentation` and `keypoints-detection`) are supported. Version `v1` is available in locally hosted Docker images with HTTP API. Locally hosted `inference` server exposes endpoints for model manipulations, but those endpoints are not available at the moment for models deployed at Roboflow platform. `api_url` parameter passed to `InferenceHTTPClient` will decide on default client mode - URLs with `*.roboflow.com` will be defaulted to version `v0`. Usage of model registry control methods with `v0` clients will raise `WrongClientModeError`. ## How I can adjust `InferenceHTTPClient` to work in my use-case? There are few ways on how configuration can be altered: ### configuring with context managers Methods `use_configuration(...)`, `use_api_v0(...)`, `use_api_v1(...)`, `use_model(...)` are designed to work in context managers. **Once context manager is left - old config values are restored.** ```python from inference_sdk import InferenceHTTPClient, InferenceConfiguration image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg" custom_configuration = InferenceConfiguration(confidence_threshold=0.8) # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) with CLIENT.use_api_v0(): _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") with CLIENT.use_configuration(custom_configuration): _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") with CLIENT.use_model("soccer-players-5fuqs/1"): _ = CLIENT.infer(image_url) # after leaving context manager - changes are reverted and `model_id` is still required _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") ``` As you can see - `model_id` is required to be given for prediction method only when default model is not configured. ### Setting the configuration once and using till next change Methods `configure(...)`, `select_api_v0(...)`, `select_api_v1(...)`, `select_model(...)` are designed alter the client state and will be preserved until next change. ```python from inference_sdk import InferenceHTTPClient, InferenceConfiguration image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg" custom_configuration = InferenceConfiguration(confidence_threshold=0.8) # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.select_api_v0() _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") # API v0 still holds CLIENT.configure(custom_configuration) CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1") # API v0 and custom configuration still holds CLIENT.select_model(model_id="soccer-players-5fuqs/1") _ = CLIENT.infer(image_url) # API v0, custom configuration and selected model - still holds _ = CLIENT.infer(image_url) ``` One may also initialise in `chain` mode: ```python from inference_sdk import InferenceHTTPClient, InferenceConfiguration # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \ .select_api_v0() \ .select_model("soccer-players-5fuqs/1") ``` ### Overriding `model_id` for specific call `model_id` can be overriden for specific call ```python from inference_sdk import InferenceHTTPClient image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg" # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \ .select_model("soccer-players-5fuqs/1") _ = CLIENT.infer(image_url, model_id="another-model/1") ``` ## Batch inference You may want to predict against multiple images at single call. It is possible, but so far - client-side batching is implemented in naive way (sequential requests to API) - stay tuned for future improvements. ```python from inference_sdk import InferenceHTTPClient image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg" # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1") print(predictions) ``` ## Inference against stream One may want to infer against video or directory of images - and that modes are supported in `inference-client` ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"): # frame_id is the number of frame # frame - np.ndarray with video frame # prediction - prediction from the model pass for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"): # file_path - path to the image # frame - np.ndarray with video frame # prediction - prediction from the model pass ``` ## What is actually returned as prediction? `inference_client` returns plain Python dictionaries that are responses from model serving API. Modification is done only in context of `visualization` key that keep server-generated prediction visualisation (it can be transcoded to the format of choice) and in terms of client-side re-scaling. ## Methods to control `inference` server (in `v1` mode only) ### Getting server info ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.get_server_info() ``` ### Listing loaded models ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.list_loaded_models() ``` ### Getting specific model description ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.get_model_description(model_id="some/1", allow_loading=True) ``` If `allow_loading` is set to `True` - model will be loaded as side-effect if it is not already loaded. Default: `True`. ### Loading model ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.load_model(model_id="some/1", set_as_default=True) ``` The pointed model will be loaded. If `set_as_default` is set to `True` - after successful load, model will be used as default model for the client. Default value: `False`. ### Unloading model ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.unload_model(model_id="some/1") ``` Sometimes (to avoid OOM at server side) - unloading model will be required. [test_postprocessing.py](..%2F..%2Ftests%2Finference_client%2Funit_tests%2Fhttp%2Futils%2Ftest_postprocessing.py) ### Unloading all models ```python from inference_sdk import InferenceHTTPClient # Replace ROBOFLOW_API_KEY with your Roboflow API Key CLIENT = InferenceHTTPClient( api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY" ) CLIENT.unload_all_models() ``` ## Details about client configuration `inference-client` provides `InferenceConfiguration` dataclass to hold whole configuration. ```python from inference_sdk import InferenceConfiguration ``` Overriding fields in this config changes the behaviour of client (and API serving model). Specific fields are used in specific contexts. In particular: ### Inference in `v0` mode The following fields are passed to API * `confidence_threshold` (as `confidence`) - to alter model thresholding * `keypoint_confidence_threshold` as (`keypoint_confidence`) - to filter out detected keypoints based on model confidence * `format` - to visualise on server side - use `image` (but then you loose prediction details from response) * `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes * `mask_decode_mode` * `tradeoff_factor` * `max_detections` - max detections to return from model * `iou_threshold` (as `overlap`) - to dictate NMS IoU threshold * `stroke_width` - width of stroke in visualisation * `count_inference` as `countinference` * `service_secret` * `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing ### Classification model in `v1` mode: * `visualize_predictions` - flag to enable / disable visualisation * `confidence_threshold` as `confidence` * `stroke_width` - width of stroke in visualisation * `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing ### Object detection model in `v1` mode: * `visualize_predictions` - flag to enable / disable visualisation * `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled * `confidence_threshold` as `confidence` * `class_filter` to filter out list of classes * `class_agnostic_nms` - flag to control whether NMS is class-agnostic * `fix_batch_size` * `iou_threshold` - to dictate NMS IoU threshold * `stroke_width` - width of stroke in visualisation * `max_detections` - max detections to return from model * `max_candidates` - max candidates to post-processing from model * `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing ### Keypoints detection model in `v1` mode: * `visualize_predictions` - flag to enable / disable visualisation * `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled * `confidence_threshold` as `confidence` * `keypoint_confidence_threshold` as (`keypoint_confidence`) - to filter out detected keypoints based on model confidence * `class_filter` to filter out list of object classes * `class_agnostic_nms` - flag to control whether NMS is class-agnostic * `fix_batch_size` * `iou_threshold` - to dictate NMS IoU threshold * `stroke_width` - width of stroke in visualisation * `max_detections` - max detections to return from model * `max_candidates` - max candidates to post-processing from model * `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing ### Instance segmentation model in `v1` mode: * `visualize_predictions` - flag to enable / disable visualisation * `visualize_labels` - flag to enable / disable labels visualisation if visualisation is enabled * `confidence_threshold` as `confidence` * `class_filter` to filter out list of classes * `class_agnostic_nms` - flag to control whether NMS is class-agnostic * `fix_batch_size` * `iou_threshold` - to dictate NMS IoU threshold * `stroke_width` - width of stroke in visualisation * `max_detections` - max detections to return from model * `max_candidates` - max candidates to post-processing from model * `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`, `disable_preproc_static_crop` to alter server-side pre-processing * `mask_decode_mode` * `tradeoff_factor` ### Configuration of client * `output_visualisation_format` - one of (`VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`, `VisualisationResponseFormat.PILLOW`) - given that server-side visualisation is enabled - one may choose what format should be used in output * `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with local directory this parameter controls type of files (extensions) allowed to be processed - default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]` * `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default `False`. Client-side scaling is only supposed to down-scale (keeping aspect-ratio) the input for inference - to utilise internet connection more efficiently (but for the price of images manipulation / transcoding). If model registry endpoint is available (mode `v1`) - model input size information will be used, if not: `default_max_input_size` will be in use.