# `InferenceHTTPClient`
`InferenceHTTPClient` was created to make it easy to consume the HTTP API exposed by the `inference` server. You
can think of it as a friendly wrapper over `requests` that you can use instead of writing the calling logic on
your own.
## 🔥 quickstart
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## What are the client capabilities?
* Executing inference for models hosted on the Roboflow platform (use client version `v0`)
* Executing inference for models hosted in local (or on-premises) Docker images with the `inference` HTTP API
* Works against a single image (given as a local path, URL, `np.ndarray` or `PIL.Image`)
* Minimalistic batch inference (you can pass multiple images)
* Inference from a video file and from a directory of images
## Why does the client have two modes - `v0` and `v1`?
We are constantly improving our `inference` package. The initial version (`v0`) is compatible with
models deployed on the Roboflow platform (task types `classification`, `object-detection`, `instance-segmentation`
and `keypoints-detection` are supported). Version `v1` is available in locally hosted Docker images with the HTTP API.
A locally hosted `inference` server exposes endpoints for model manipulation, but those endpoints are not available
at the moment for models deployed on the Roboflow platform.
The `api_url` parameter passed to `InferenceHTTPClient` decides the default client mode - URLs with `*.roboflow.com`
will default to version `v0`.
Usage of model registry control methods with `v0` clients will raise `WrongClientModeError`.
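A minimal sketch of how that plays out - the hosted URL below is Roboflow's hosted inference endpoint, and the
sketch assumes `WrongClientModeError` is importable from `inference_sdk.http.errors`:
```python
from inference_sdk import InferenceHTTPClient
from inference_sdk.http.errors import WrongClientModeError

# a `*.roboflow.com` URL makes the client default to `v0` mode
hosted_client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="ROBOFLOW_API_KEY",
)
try:
    hosted_client.list_loaded_models()  # model registry methods require `v1`
except WrongClientModeError:
    print("model registry control is not available in v0 mode")
```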
## How can I adjust `InferenceHTTPClient` to work in my use case?
There are a few ways in which the configuration can be altered:
### configuring with context managers
Methods `use_configuration(...)`, `use_api_v0(...)`, `use_api_v1(...)`, `use_model(...)` are designed to
work as context managers. **Once the context manager is left - the old config values are restored.**
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)
# after leaving the context manager - changes are reverted and `model_id` is required again
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
```
As you can see, `model_id` needs to be given to the prediction method only when a default model is not configured.
### Setting the configuration once and using it until the next change
Methods `configure(...)`, `select_api_v0(...)`, `select_api_v1(...)`, `select_model(...)` are designed to alter the
client state, and the change will be preserved until the next one.
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 still holds
CLIENT.configure(custom_configuration)
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 and custom configuration still hold
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)
# API v0, custom configuration and selected model - still hold
_ = CLIENT.infer(image_url)
```
One may also initialise the client in `chain` mode:
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
```
### Overriding `model_id` for a specific call
`model_id` can be overridden for a specific call:
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
```
## Batch inference
You may want to predict against multiple images in a single call. This is possible, but so far client-side
batching is implemented in a naive way (sequential requests to the API) - stay tuned for future improvements.
```python
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
```
## Inference against stream
One may want to infer against a video file or a directory of images - both modes are supported by the client:
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - the number of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass
for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image loaded from file
    # prediction - prediction from the model
    pass
```
## What is actually returned as prediction?
`inference-sdk` returns plain Python dictionaries that are the responses from the model-serving API. Modifications
are made only in the context of the `visualization` key, which keeps the server-generated prediction visualisation (it
can be transcoded to the format of choice), and in terms of client-side re-scaling.
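For illustration, a single object-detection response has roughly the shape sketched below (the field values are
made up, the image path is a placeholder, and the exact set of keys depends on the task type and server version):
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY")
result = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")
# illustrative shape only:
# {
#     "image": {"width": 1280, "height": 720},
#     "predictions": [
#         {"x": 320.5, "y": 180.0, "width": 50.0, "height": 120.0,
#          "confidence": 0.91, "class": "player"},
#     ],
# }
for detection in result["predictions"]:
    print(detection["class"], detection["confidence"])
```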
## Methods to control the `inference` server (in `v1` mode only)
### Getting server info
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
```
### Listing loaded models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
```
### Getting specific model description
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
```
If `allow_loading` is set to `True` - the model will be loaded as a side effect if it is not already loaded.
Default: `True`.
### Loading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
```
The pointed model will be loaded. If `set_as_default` is set to `True` - after a successful load, the model
will be used as the default model for the client. Default value: `False`.
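A short sketch of the default-model behaviour this enables (the model ID and image path below are placeholders):
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY")
CLIENT.load_model(model_id="soccer-players-5fuqs/1", set_as_default=True)
# the loaded model became the client default, so `model_id` can be omitted
_ = CLIENT.infer("image.jpg")
```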
### Unloading model
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
```
Sometimes (to avoid OOM on the server side) unloading a model will be required.
### Unloading all models
```python
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
```
## Details about client configuration
`inference-sdk` provides the `InferenceConfiguration` dataclass to hold the whole configuration.
```python
from inference_sdk import InferenceConfiguration
```
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific
fields are used in specific contexts (a short usage sketch follows the `v0` field list below). In particular:
### Inference in `v0` mode
The following fields are passed to the API:
* `confidence_threshold` (as `confidence`) - to alter model thresholding
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `format` - to visualise on the server side - use `image` (but then you lose prediction details from the response)
* `visualize_labels` (as `labels`) - used in visualisation to show / hide labels for classes
* `mask_decode_mode`
* `tradeoff_factor`
* `max_detections` - max detections to return from the model
* `iou_threshold` (as `overlap`) - to dictate the NMS IoU threshold
* `stroke_width` - width of the stroke in visualisation
* `count_inference` (as `countinference`)
* `service_secret`
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` - to alter server-side pre-processing
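A minimal sketch of tuning a few of those fields (the threshold values below are arbitrary examples):
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

config = InferenceConfiguration(
    confidence_threshold=0.5,  # sent to the API as `confidence`
    iou_threshold=0.5,         # sent to the API as `overlap`
    max_detections=50,
)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="https://detect.roboflow.com", api_key="ROBOFLOW_API_KEY")
CLIENT.configure(config)
_ = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")
```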
### Classification model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `confidence_threshold` (as `confidence`)
* `stroke_width` - width of the stroke in visualisation
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` - to alter server-side pre-processing
### Object detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation, if visualisation is enabled
* `confidence_threshold` (as `confidence`)
* `class_filter` - to filter predictions down to a list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate the NMS IoU threshold
* `stroke_width` - width of the stroke in visualisation
* `max_detections` - max detections to return from the model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` - to alter server-side pre-processing
### Keypoints detection model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation, if visualisation is enabled
* `confidence_threshold` (as `confidence`)
* `keypoint_confidence_threshold` (as `keypoint_confidence`) - to filter out detected keypoints
based on model confidence
* `class_filter` - to filter predictions down to a list of object classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate the NMS IoU threshold
* `stroke_width` - width of the stroke in visualisation
* `max_detections` - max detections to return from the model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` - to alter server-side pre-processing
### Instance segmentation model in `v1` mode:
* `visualize_predictions` - flag to enable / disable visualisation
* `visualize_labels` - flag to enable / disable labels visualisation, if visualisation is enabled
* `confidence_threshold` (as `confidence`)
* `class_filter` - to filter predictions down to a list of classes
* `class_agnostic_nms` - flag to control whether NMS is class-agnostic
* `fix_batch_size`
* `iou_threshold` - to dictate the NMS IoU threshold
* `stroke_width` - width of the stroke in visualisation
* `max_detections` - max detections to return from the model
* `max_candidates` - max candidates passed to post-processing from the model
* `disable_preproc_auto_orientation`, `disable_preproc_contrast`, `disable_preproc_grayscale`,
`disable_preproc_static_crop` - to alter server-side pre-processing
* `mask_decode_mode` (see the sketch after this list)
* `tradeoff_factor`
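A hedged sketch of the segmentation-specific fields - the assumption here is that `mask_decode_mode` accepts values
such as `"accurate"`, `"fast"` and `"tradeoff"`, with `tradeoff_factor` balancing the two extremes; check the server
version you run against:
```python
from inference_sdk import InferenceConfiguration

# assumption: "tradeoff" blends mask quality vs. speed, weighted by `tradeoff_factor` (0.0-1.0)
segmentation_config = InferenceConfiguration(
    confidence_threshold=0.5,
    mask_decode_mode="tradeoff",
    tradeoff_factor=0.5,
)
```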
### Configuration of client
* `output_visualisation_format` - one of (`VisualisationResponseFormat.BASE64`, `VisualisationResponseFormat.NUMPY`,
`VisualisationResponseFormat.PILLOW`) - given that server-side visualisation is enabled, one may choose which
format should be used in the output (see the sketch after this list)
* `image_extensions_for_directory_scan` - while using `CLIENT.infer_on_stream(...)` with a local directory,
this parameter controls the type of files (extensions) allowed to be processed -
default: `["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]`
* `client_downsizing_disabled` - set to `True` if you want to avoid client-side downsizing - default: `False`.
Client-side scaling is only supposed to down-scale the input for inference (keeping the aspect ratio) -
to utilise the internet connection more efficiently (at the price of image manipulation / transcoding).
If the model registry endpoint is available (mode `v1`), the model input size information will be used; if not,
`default_max_input_size` will be in use.
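A short sketch of requesting server-side visualisation and reading it back as a NumPy array - assuming
`VisualisationResponseFormat` is exported from `inference_sdk` and the chosen model supports visualisation:
```python
from inference_sdk import InferenceHTTPClient, InferenceConfiguration, VisualisationResponseFormat

config = InferenceConfiguration(
    visualize_predictions=True,
    output_visualisation_format=VisualisationResponseFormat.NUMPY,
)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY")
CLIENT.configure(config)
result = CLIENT.infer("image.jpg", model_id="soccer-players-5fuqs/1")
visualisation = result["visualization"]  # np.ndarray in this configuration
```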