	<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Advanced Setup (Instance Types, Auto Scaling, Versioning)","local":"advanced-setup-instance-types-auto-scaling-versioning","sections":[],"depth":1}">
	<link href="/docs/inference-endpoints/pr_113/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/entry/start.d1c14968.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/scheduler.389d799c.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/singletons.16c9b508.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/paths.58d119e0.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/entry/app.18050d92.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/index.8f81d18f.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/nodes/0.ce016c16.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/nodes/6.4cb4ba25.js">
	<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/getInferenceSnippets.8efa8e08.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Advanced Setup (Instance Types, Auto Scaling, Versioning)","local":"advanced-setup-instance-types-auto-scaling-versioning","sections":[],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="advanced-setup-instance-types-auto-scaling-versioning" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#advanced-setup-instance-types-auto-scaling-versioning"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Advanced Setup (Instance Types, Auto Scaling, Versioning)</span></h1> <p data-svelte-h="svelte-p2szpz">We have seen how fast and easy it is to deploy an Endpoint in <a href="/docs/inference-endpoints/guides/create_endpoint">Create your first Endpoint</a>, but you can configure much more than that. 
During the creation process, after selecting your Cloud Provider and Region, click the [Advanced configuration] button to reveal further configuration options for your Endpoint.</p> <p data-svelte-h="svelte-a9jto8"><strong>Instance type</strong></p> <p data-svelte-h="svelte-17hkkpg">🤗 Inference Endpoints offers a selection of curated CPU and GPU instances.</p> <p data-svelte-h="svelte-99pd09"><em>Note: Your Hugging Face account comes with a capacity quota for CPU and GPU instances. To increase your quota or request new instance types, please contact us.</em></p> <p data-svelte-h="svelte-2lxh28"><em>Default: CPU-medium</em></p> <img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/instance_types.png" alt="Instance type selection"> <p data-svelte-h="svelte-16c4xnh"><strong>Replica autoscaling</strong></p> <p data-svelte-h="svelte-1aqi2y9">Set the range of replicas (a minimum of at least 1 and a maximum) within which your Endpoint automatically scales based on utilization.</p> <p data-svelte-h="svelte-1a02b6e"><em>Default: min 1; max 2</em></p> <p data-svelte-h="svelte-1fw5nk2"><strong>Task</strong></p> <p data-svelte-h="svelte-l57l0h">Select a <a href="/docs/inference-endpoints/supported_tasks">supported Machine Learning Task</a>, or set it to <a href="/docs/inference-endpoints/guides/custom_handler">Custom</a>. Use <a href="/docs/inference-endpoints/guides/custom_handler">Custom</a> when you are not using a Transformers-based model or when you want to customize the inference pipeline; see <a href="/docs/inference-endpoints/guides/custom_handler">Create your own Inference handler</a>.</p> <p data-svelte-h="svelte-1s9e1ul"><em>Default: derived from the model repository.</em></p> <p data-svelte-h="svelte-1dgalg9"><strong>Framework</strong></p> <p data-svelte-h="svelte-1wbcidm">For Transformers models, if both PyTorch and TensorFlow weights are available, you can select which model weights to use. 
Deploying only the selected weights reduces the image artifact size and speeds up startup and scaling of your Endpoint.</p> <p data-svelte-h="svelte-1gscxsr"><em>Default: PyTorch if available.</em></p> <p data-svelte-h="svelte-8f8b5m"><strong>Revision</strong></p> <p data-svelte-h="svelte-1tnz7cr">Pin your Endpoint to a specific revision (commit) of its source Hugging Face Model Repository. This lets you version your Endpoint and ensures it always uses the same weights, even if you update the Model Repository.</p> <p data-svelte-h="svelte-j7ufko"><em>Default: The most recent commit.</em></p> <p data-svelte-h="svelte-1ismnd4"><strong>Image</strong></p> <p data-svelte-h="svelte-j6w94w">Provide a custom container image to deploy into your Endpoint. It can be a public image, e.g. <em>tensorflow/serving:2.7.3</em>, or a private image hosted on <a href="https://hub.docker.com/" rel="nofollow">Docker Hub</a>, <a href="https://aws.amazon.com/ecr/?nc1=h_ls" rel="nofollow">AWS ECR</a>, <a href="https://azure.microsoft.com/de-de/services/container-registry/" rel="nofollow">Azure ACR</a>, or <a href="https://cloud.google.com/container-registry?hl=de" rel="nofollow">Google GCR</a>.</p> <p data-svelte-h="svelte-1biq916">Learn more in <a href="/docs/inference-endpoints/guides/custom_handler">“Use your own custom container”</a>.</p> <p></p>
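<p>The same options can also be set programmatically. The sketch below is a hypothetical helper, <code>build_endpoint_kwargs</code>, that gathers the Advanced configuration values into keyword arguments named after parameters of <code>create_inference_endpoint</code> in the <code>huggingface_hub</code> Python library, and enforces this guide's autoscaling rule (a minimum of at least 1, and a maximum no lower than the minimum). The helper itself, the repository name, the upper-case placeholder values, and the assumption that a custom image is passed as a dict with a <code>"url"</code> key are illustrative only; check the <code>huggingface_hub</code> reference for the exact accepted values of <code>vendor</code>, <code>region</code>, <code>accelerator</code>, <code>instance_type</code>, and <code>instance_size</code>.</p>

```python
from typing import Optional


def build_endpoint_kwargs(
    repository: str,
    vendor: str,
    region: str,
    accelerator: str,
    instance_type: str,
    instance_size: str,
    min_replica: int = 1,
    max_replica: int = 2,
    framework: str = "pytorch",
    task: Optional[str] = None,
    revision: Optional[str] = None,
    custom_image_url: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate and collect Advanced configuration options.

    Keys are named after parameters of huggingface_hub.create_inference_endpoint;
    this function only builds a dict and makes no network calls.
    """
    # Replica autoscaling: this guide requires a minimum of at least 1 replica,
    # and the maximum cannot be lower than the minimum.
    if min_replica < 1:
        raise ValueError("min_replica must be >= 1")
    if max_replica < min_replica:
        raise ValueError("max_replica must be >= min_replica")

    kwargs = {
        "repository": repository,
        "framework": framework,  # "pytorch" matches the guide's default when PyTorch weights exist
        "vendor": vendor,
        "region": region,
        "accelerator": accelerator,
        "instance_type": instance_type,
        "instance_size": instance_size,
        "min_replica": min_replica,
        "max_replica": max_replica,
    }
    # Task: omit it to let the task be derived from the model repository.
    if task is not None:
        kwargs["task"] = task
    # Revision: pin a commit SHA so the Endpoint keeps serving the same weights
    # even after new commits are pushed to the repository.
    if revision is not None:
        kwargs["revision"] = revision
    # Image: assumed to be passed as a dict with a "url" key.
    if custom_image_url is not None:
        kwargs["custom_image"] = {"url": custom_image_url}
    return kwargs


if __name__ == "__main__":
    # Placeholder values; INSTANCE_TYPE / INSTANCE_SIZE / COMMIT_SHA are not real names.
    kwargs = build_endpoint_kwargs(
        repository="my-org/my-model",
        vendor="aws",
        region="us-east-1",
        accelerator="cpu",
        instance_type="INSTANCE_TYPE",
        instance_size="INSTANCE_SIZE",
        revision="COMMIT_SHA",
    )
    print(kwargs)
    # With a Hugging Face token configured and network access, you could then run:
    # from huggingface_hub import create_inference_endpoint
    # endpoint = create_inference_endpoint("my-endpoint", **kwargs)
```

<p>Keeping validation separate from the API call means an invalid replica range fails fast, before any request is sent; the actual <code>create_inference_endpoint</code> call is left commented out because it needs a token and network access.</p>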

	<script>
	{
	__sveltekit_87vzq7 = {
	assets: "/docs/inference-endpoints/pr_113/en",
	base: "/docs/inference-endpoints/pr_113/en",
	env: {}
	};

	const element = document.currentScript.parentElement;

	const data = [null,null];

	Promise.all([
	import("/docs/inference-endpoints/pr_113/en/_app/immutable/entry/start.d1c14968.js"),
	import("/docs/inference-endpoints/pr_113/en/_app/immutable/entry/app.18050d92.js")
	]).then(([kit, app]) => {
	kit.start(app, element, {
	node_ids: [0, 6],
	data,
	form: null,
	error: null
	});
	});
	}
	</script>