<meta charset="utf-8" /><meta name="hf:doc:metadata" content='{"title":"Configuration","local":"configuration","sections":[{"title":"Endpoint name, model and organization","local":"endpoint-name-model-and-organization","sections":[],"depth":2},{"title":"Hardware Configuration","local":"hardware-configuration","sections":[],"depth":2},{"title":"Security Level","local":"security-level","sections":[],"depth":2},{"title":"Autoscaling","local":"autoscaling","sections":[],"depth":2},{"title":"Container Configuration","local":"container-configuration","sections":[],"depth":2},{"title":"Environment Variables","local":"environment-variables","sections":[],"depth":2},{"title":"Endpoint Tags","local":"endpoint-tags","sections":[],"depth":2},{"title":"Advanced Settings","local":"advanced-settings","sections":[],"depth":2}],"depth":1}'>
<link href="/docs/inference-endpoints/pr_136/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/entry/start.fb9ab4d6.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/scheduler.f6b352c8.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/singletons.ceca4163.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/index.26cf6c5a.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/paths.142cd5df.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/entry/app.6247727a.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/index.b90df637.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/nodes/0.2fcde12d.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/nodes/13.a65733c0.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_136/en/_app/immutable/chunks/getInferenceSnippets.1e3ae0bf.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content='{"title":"Configuration","local":"configuration","sections":[{"title":"Endpoint name, model and organization","local":"endpoint-name-model-and-organization","sections":[],"depth":2},{"title":"Hardware Configuration","local":"hardware-configuration","sections":[],"depth":2},{"title":"Security Level","local":"security-level","sections":[],"depth":2},{"title":"Autoscaling","local":"autoscaling","sections":[],"depth":2},{"title":"Container Configuration","local":"container-configuration","sections":[],"depth":2},{"title":"Environment Variables","local":"environment-variables","sections":[],"depth":2},{"title":"Endpoint Tags","local":"endpoint-tags","sections":[],"depth":2},{"title":"Advanced Settings","local":"advanced-settings","sections":[],"depth":2}],"depth":1}'><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="configuration" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuration"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"
fill="currentColor"></path></svg></span></a> <span>Configuration</span></h1> <p data-svelte-h="svelte-1iggp98">This page describes the configuration options available when creating a new inference endpoint. Each section of
the interface allows fine-grained control over how the model is deployed, accessed, and scaled.</p> <h2 class="relative group"><a id="endpoint-name-model-and-organization" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#endpoint-name-model-and-organization"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Endpoint name, model and organization</span></h2> <p data-svelte-h="svelte-1sd4j0z">In the top-left corner, you can:</p> <ul data-svelte-h="svelte-1qy89hk"><li>change the name of the inference endpoint</li> <li>verify which organization you are deploying the model to</li> <li>verify which model you are deploying</li> <li>verify which Hugging Face Hub repository the model is deployed from</li></ul> <p data-svelte-h="svelte-8t663c"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/1-name-org-model.png" alt="name-org-model"></p> <h2 class="relative group"><a id="hardware-configuration" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full"
href="#hardware-configuration"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Hardware Configuration</span></h2> <p data-svelte-h="svelte-1o9wtbi">The Hardware Configuration section allows you to choose the compute backend used to host the model.
You can select from three major cloud providers:</p> <ul data-svelte-h="svelte-1nit8px"><li>Amazon Web Services (AWS)</li> <li>Microsoft Azure</li> <li>Google Cloud Platform (GCP)</li></ul> <p data-svelte-h="svelte-171jvsn"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/2-hardware.png" alt="hardware"></p> <p data-svelte-h="svelte-a5xv7a">You must also choose an accelerator type:</p> <ul data-svelte-h="svelte-1c8pe9h"><li>CPU</li> <li>GPU</li> <li>INF2 (AWS Inferentia2)</li></ul> <p data-svelte-h="svelte-14px1ty">You can then select the deployment region (e.g., East US) from the dropdown menu. Once the
provider, accelerator, and region are chosen, a list of available instance types is displayed. Each instance tile shows:</p> <ul data-svelte-h="svelte-8x219j"><li>GPU type and count</li> <li>GPU memory (e.g., 48 GB)</li> <li>vCPUs and RAM</li> <li>Hourly price (e.g., $1.80/h)</li></ul> <p data-svelte-h="svelte-tqi8q5">Click a tile to select that instance type for your deployment. Instances that are incompatible or unavailable in the
selected region are grayed out and cannot be selected.</p> <h2 class="relative group"><a id="security-level" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#security-level"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Security Level</span></h2> <p data-svelte-h="svelte-1j8fq6y">This section determines who can access your deployed endpoint. The available options are:</p> <ul data-svelte-h="svelte-1eziv6l"><li><strong>Protected (default)</strong>: Accessible only to members of your Hugging Face organization, who authenticate with their personal access tokens.
The endpoint is secured with TLS/SSL.</li> <li><strong>Public</strong>: Anyone on the internet can access the endpoint, without authentication.</li> <li><strong>HF Restricted</strong>: Anyone with a Hugging Face account can access the endpoint by authenticating with a personal access token generated from their account.</li> <li><strong>AWS Private</strong>: The endpoint is only reachable through a secured AWS PrivateLink connection from within the same region.</li></ul> <p data-svelte-h="svelte-1bsri9e"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/3-security.png" alt="security"></p> <h2 class="relative group"><a id="autoscaling" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#autoscaling"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Autoscaling</span></h2> <p data-svelte-h="svelte-t69vok">The Autoscaling section configures how many replicas of your model run and whether the system scales down to zero during periods of inactivity. For more
information, see the <a href="./autoscaling">in-depth guide on autoscaling</a>.</p> <p data-svelte-h="svelte-11k9llv"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/4-autoscaling.png" alt="autoscaling"></p> <ul data-svelte-h="svelte-cis8v9"><li><strong>Automatic Scale-to-Zero</strong>: A dropdown lets you choose how long the system should wait after the last request before
scaling down to zero. The default is 1 hour of inactivity.</li> <li><strong>Number of Replicas</strong>:<ul><li>Min: Minimum number of replicas to keep running. Enabling automatic scale-to-zero requires setting this to 0.</li> <li>Max: Maximum number of replicas allowed (e.g., 1).</li></ul></li> <li><strong>Autoscaling strategy</strong>:<ul><li>Based on hardware usage: A scale-up is triggered when the average hardware utilization (%) exceeds the configured threshold for more than 20 seconds.</li> <li>Pending requests: A scale-up is triggered when the average number of pending requests exceeds the configured threshold for more than 20 seconds.</li></ul></li></ul> <h2 class="relative group"><a id="container-configuration" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#container-configuration"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Container Configuration</span></h2> <p data-svelte-h="svelte-mijh53">This section lets you specify how the container hosting your model behaves. The available settings depend on the selected inference engine.
For configuration details, see the <a href="https://not-here" rel="nofollow">Inference Engine</a> section.</p> <h2 class="relative group"><a id="environment-variables" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#environment-variables"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Environment Variables</span></h2> <p data-svelte-h="svelte-1pyc3um">You can provide environment variables to customize the container's behavior or to pass secrets to it.</p> <ul data-svelte-h="svelte-1352j3g"><li><strong>Default Env</strong>: Key-value pairs passed as plain environment variables.</li> <li><strong>Secret Env</strong>: Key-value pairs stored securely and injected at runtime.</li></ul> <p data-svelte-h="svelte-bqywcv">You can add multiple entries to each list using the Add button.</p> <p data-svelte-h="svelte-1ikgxv8"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/5-env-vars.png" alt="env-vars"></p> <h2 class="relative group"><a id="endpoint-tags" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0
with-hover:group-hover:opacity-100 with-hover:right-full" href="#endpoint-tags"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Endpoint Tags</span></h2> <p data-svelte-h="svelte-x7p47v">You can label endpoints with tags (e.g., for-testing) to help organize and manage deployments across environments or teams. In the dashboard,
you can filter and sort endpoints by these tags.
Tags are plain-text labels added via the Add button.</p> <p data-svelte-h="svelte-1gtrisl"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/6-tags.png" alt="tags"></p> <h2 class="relative group"><a id="advanced-settings" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#advanced-settings"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Advanced Settings</span></h2> <p data-svelte-h="svelte-1u5ppiv">Advanced Settings offer finer-grained control over the deployment.</p> <p data-svelte-h="svelte-1khxjze"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/configuration/7-advanced.png" alt="advanced"></p> <ul data-svelte-h="svelte-1fwagwe"><li><strong>Commit Revision</strong>: Optionally specify a commit hash to pin the revision of the model repository on the Hugging Face Hub
from which the model artifacts are downloaded.</li> <li><strong>Task</strong>: The task the model performs. This is usually inferred from the model repository.</li> <li><strong>Container Arguments</strong>: CLI-style arguments passed to the container entrypoint.</li> <li><strong>Container Command</strong>: A command that overrides the container entrypoint entirely.</li> <li><strong>Download Pattern</strong>: Defines which model files are downloaded.</li></ul> <p></p>
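<p>The hourly price on each instance tile in the Hardware Configuration section makes rough budgeting a matter of multiplication. The following is a minimal sketch, not a billing formula: it assumes the given number of replicas runs for the whole period and ignores billing granularity, so with autoscaling, passing the Max replica count gives an upper bound. The <code>estimate_cost</code> helper is hypothetical, and <code>Decimal</code> is used to avoid floating-point rounding on currency amounts.</p>

```python
from decimal import Decimal


def estimate_cost(hourly_price: str, replicas: int, hours: int) -> Decimal:
    """Upper-bound cost (USD) if `replicas` instances run for `hours` at `hourly_price`.

    Hypothetical helper: assumes every replica runs for the whole period.
    """
    if replicas < 0 or hours < 0:
        raise ValueError("replicas and hours must be non-negative")
    # Decimal keeps "1.80" exact instead of a binary floating-point approximation.
    return Decimal(hourly_price) * replicas * hours


# The $1.80/h example tile, two replicas, running around the clock for 30 days.
monthly = estimate_cost("1.80", replicas=2, hours=24 * 30)
```

<p>For these inputs, <code>monthly</code> evaluates to 2592.00 (USD); scale-to-zero or fewer running replicas would only lower the actual figure.</p>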
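<p>For the Security Level options, clients of a Protected or HF Restricted endpoint must present a Hugging Face token with their requests. The sketch below assumes the token is sent as a standard <code>Authorization: Bearer</code> header; the <code>auth_headers</code> function and the level strings it accepts are hypothetical. AWS Private is left out of this sketch.</p>

```python
from typing import Dict, Optional


def auth_headers(security_level: str, token: Optional[str] = None) -> Dict[str, str]:
    """Build HTTP headers for calling an endpoint (hypothetical helper).

    Assumes Protected and HF Restricted endpoints accept a Hugging Face token
    as a bearer token, and that Public endpoints need no credentials.
    """
    if security_level == "public":
        return {}
    if security_level in ("protected", "hf_restricted"):
        if not token:
            raise ValueError(f"{security_level!r} endpoints require a Hugging Face token")
        return {"Authorization": f"Bearer {token}"}
    # "aws_private" and any other value are outside the scope of this sketch.
    raise ValueError(f"unsupported security level: {security_level!r}")
```

<p>The returned dictionary can be passed as the request headers of any HTTP client. In practice, read the token from an environment variable rather than hard-coding it.</p>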
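<p>The replica rules in the Autoscaling section can be captured as a small validation check. In this minimal sketch, the <code>ReplicaSettings</code> class and its field names are hypothetical stand-ins for the Min, Max, and scale-to-zero fields of the form. It enforces the documented rule that scale-to-zero requires Min to be 0, plus the common-sense constraints that Min is non-negative and does not exceed Max.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaSettings:
    """Hypothetical mirror of the Autoscaling form fields."""
    min_replicas: int
    max_replicas: int
    # Minutes of inactivity before scaling to zero; None means scale-to-zero is disabled.
    scale_to_zero_after_min: Optional[int] = None

    def validate(self) -> None:
        if self.min_replicas < 0:
            raise ValueError("min_replicas cannot be negative")
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas cannot exceed max_replicas")
        # Documented rule: automatic scale-to-zero requires Min to be 0.
        if self.scale_to_zero_after_min is not None and self.min_replicas != 0:
            raise ValueError("automatic scale-to-zero requires min_replicas == 0")
```

<p>For example, <code>ReplicaSettings(min_replicas=0, max_replicas=1, scale_to_zero_after_min=60)</code> passes validation and describes an endpoint with at most one replica that scales to zero after the default 1 hour, whereas setting <code>min_replicas=1</code> with scale-to-zero enabled raises a <code>ValueError</code>.</p>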
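<p>Both autoscaling strategies in the Autoscaling section follow the same rule: scale up once an averaged metric (hardware utilization in percent, or the number of pending requests) has stayed above its threshold for more than 20 seconds. The sketch below evaluates that rule over timestamped samples. It is an illustration, not the platform's implementation: the <code>Sample</code> type and <code>should_scale_up</code> function are hypothetical, and it assumes the metric is sampled periodically and that any sample at or below the threshold resets the timer.</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    t: float      # timestamp in seconds
    value: float  # averaged metric: utilization (%) or pending requests


def should_scale_up(samples: Sequence[Sample], threshold: float, window_s: float = 20.0) -> bool:
    """Return True if the metric has stayed above `threshold` for more than `window_s` seconds.

    `samples` must be sorted by time. The breach is measured from the first sample
    of the latest uninterrupted above-threshold run to the most recent sample.
    """
    breach_start: Optional[float] = None
    for s in samples:
        if s.value > threshold:
            if breach_start is None:
                breach_start = s.t  # a new breach begins here
        else:
            breach_start = None  # metric fell back; reset the timer
    if breach_start is None:
        return False  # no samples, or the latest sample is not above the threshold
    return samples[-1].t - breach_start > window_s
```

<p>With samples every 5 seconds at 85% and a threshold of 80%, the function returns <code>True</code> once the breach spans 25 seconds but <code>False</code> when it spans exactly 20 seconds, following the "more than 20 seconds" wording.</p>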
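<p>From inside the container, entries from both Default Env and Secret Env are assumed to appear as ordinary process environment variables; the difference lies in how they are stored, not in how the code reads them. The sketch below shows a hypothetical custom container loading one setting of each kind. The variable names <code>MAX_BATCH_SIZE</code> and <code>DOWNSTREAM_API_KEY</code> are illustrative and not defined by the platform.</p>

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    max_batch_size: int
    # repr=False keeps the secret out of logs that print the settings object.
    downstream_api_key: str = field(repr=False)


def load_settings() -> Settings:
    """Read container settings from the environment (hypothetical variable names)."""
    # Plain (Default Env) variable with a fallback value.
    max_batch_size = int(os.environ.get("MAX_BATCH_SIZE", "8"))
    # Secret (Secret Env) variable: fail fast if it is missing.
    api_key = os.environ.get("DOWNSTREAM_API_KEY")
    if not api_key:
        raise RuntimeError("DOWNSTREAM_API_KEY is not set")
    return Settings(max_batch_size=max_batch_size, downstream_api_key=api_key)
```

<p>Failing at startup when a required secret is missing surfaces a misconfigured endpoint immediately instead of at the first request.</p>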
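<p>Container Command and Container Arguments in the Advanced Settings mirror how container runtimes combine an image's default entrypoint and arguments with user overrides. The sketch below assumes the Kubernetes rules for a container's <code>command</code> and <code>args</code> fields: arguments alone are appended to the image's entrypoint in place of its default arguments, while a command replaces the entrypoint and also drops the default arguments unless new ones are given. Whether Inference Endpoints apply exactly these rules is an assumption, and <code>effective_argv</code> is a hypothetical helper.</p>

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Combine image defaults with overrides, following Kubernetes command/args rules."""
    if command is None and args is None:
        return image_entrypoint + image_cmd  # no overrides: image defaults
    if command is None:
        return image_entrypoint + args       # args only: keep the image entrypoint
    if args is None:
        return list(command)                 # command only: image default args are dropped too
    return command + args                    # both overridden
```

<p>For a hypothetical image with entrypoint <code>["/entrypoint.sh"]</code> and default arguments <code>["--port", "80"]</code>, passing only <code>args</code> keeps <code>/entrypoint.sh</code>, while passing a <code>command</code> replaces everything the image would otherwise run.</p>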
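<p>The Download Pattern setting is described only as defining which model files are downloaded. As a sketch, assume it takes shell-style glob patterns and that a file is fetched if it matches at least one of them, similar in spirit to the <code>allow_patterns</code> argument of <code>huggingface_hub.snapshot_download</code>. The pattern syntax, the <code>select_files</code> helper, and the file list are all assumptions for illustration.</p>

```python
from fnmatch import fnmatch
from typing import Iterable, List


def select_files(repo_files: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the files matching at least one glob pattern (illustrative sketch)."""
    pats = list(patterns)
    return [f for f in repo_files if any(fnmatch(f, p) for p in pats)]


# Hypothetical repository listing with weights stored in two formats.
files = [
    "config.json",
    "tokenizer.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "pytorch_model.bin",
    "README.md",
]
# Fetch safetensors weights plus JSON configs, skipping the duplicate .bin weights.
selected = select_files(files, ["*.safetensors", "*.json"])
```

<p>Here <code>selected</code> keeps the two <code>.json</code> files and both safetensors shards while skipping <code>pytorch_model.bin</code> and <code>README.md</code>, which avoids downloading the same weights twice.</p>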
<script>
  {
    __sveltekit_1q0n26o = {
      assets: "/docs/inference-endpoints/pr_136/en",
      base: "/docs/inference-endpoints/pr_136/en",
      env: {}
    };
    const element = document.currentScript.parentElement;
    const data = [null,null];
    Promise.all([
      import("/docs/inference-endpoints/pr_136/en/_app/immutable/entry/start.fb9ab4d6.js"),
      import("/docs/inference-endpoints/pr_136/en/_app/immutable/entry/app.6247727a.js")
    ]).then(([kit, app]) => {
      kit.start(app, element, {
        node_ids: [0, 13],
        data,
        form: null,
        error: null
      });
    });
  }
</script>