Buckets:

rtrm's picture
download
raw
19.4 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Inference Endpoints&quot;,&quot;local&quot;:&quot;inference-endpoints&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;What you get&quot;,&quot;local&quot;:&quot;what-you-get&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/101-course/pr_4/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/entry/start.b6742992.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/scheduler.1d51f4c0.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/singletons.023d1c68.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/index.fa8592cf.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/paths.daa2f795.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/entry/app.8b986792.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/index.fda43871.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/nodes/0.b5fb3b56.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/nodes/13.e73f160e.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/Tip.e808fe4c.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/CodeBlock.16130beb.js">
<link rel="modulepreload" href="/docs/101-course/pr_4/en/_app/immutable/chunks/getInferenceSnippets.58a43ad0.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Inference Endpoints&quot;,&quot;local&quot;:&quot;inference-endpoints&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;What you get&quot;,&quot;local&quot;:&quot;what-you-get&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="inference-endpoints" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference-endpoints"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Inference Endpoints</span></h1> <p data-svelte-h="svelte-17yjqbd">Inference Endpoints let you deploy a specific model from the Hub as a dedicated, production-grade API with autoscaling, security, and configurability.</p> <p data-svelte-h="svelte-2lzrh2">In this unit, you’ll deploy an endpoint via the UI, learn where to find its URL and token requirements, and see minimal client code to interact with it.</p> <h2 class="relative group"><a id="what-you-get" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#what-you-get"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>What you get</span></h2> <ul data-svelte-h="svelte-1f946aw"><li>Dedicated, isolated serving for a single model</li> <li>Choice of cloud vendor, region, and accelerator</li> <li>Instance size/type selection and autoscaling</li> <li>Protected endpoints requiring authentication</li></ul> <h1 class="relative group"><a id="quick-start" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#quick-start"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Quick Start</span></h1> <p data-svelte-h="svelte-vku3xj">In this guide you’ll deploy a production ready AI model using Inference Endpoints in only a few minutes.
Make sure you’ve been able to log into the <a href="https://endpoints.huggingface.co" rel="nofollow">Inference Endpoints UI</a> with your Hugging Face account, and that you have a payment
method setup. If not, it’s a quick add of valid payment method in your <a href="https://huggingface.co/settings/billing" rel="nofollow">billing settings</a>.</p> <h2 class="relative group"><a id="create-your-endpoint" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#create-your-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Create your endpoint</span></h2> <p data-svelte-h="svelte-1mtdv9n">Start by navigating to the Inference Endpoints UI, and once you’re logged in, you should see a button for creating a new Inference
Endpoint. Click the “New” button.</p> <p data-svelte-h="svelte-dnyg4"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/1-new-button.png" alt="new-button"></p> <p data-svelte-h="svelte-ugol2d">From there you’ll be directed to the catalog. The Model Catalog consists of popular models which have tuned configurations to work in one-click
deploys. You can filter by name, task, hardware price, and much more.</p> <p data-svelte-h="svelte-fxecmn"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/2-catalog.png" alt="catalog"></p> <p data-svelte-h="svelte-arciu6">In this example let’s deploy the <a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" rel="nofollow">meta-llama/Llama-3.2-3B-Instruct</a> model. You can find
it by searching for <code>llama-3.2-3b</code> in the search field and deploy it by clicking the card.</p> <p data-svelte-h="svelte-wb1fz6"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/3-llama.png" alt="llama"></p> <p data-svelte-h="svelte-1ekjdqo">Next we’ll choose which hardware and deployment settings we’ll go for. Since this is a catalog model, all of the pre-selected options are very good
defaults. So in this case we don’t need to change anything. In case you want a deeper dive on what the different settings mean you can check out
the <a href="./guides/configuration">configuration guide</a>.</p> <p data-svelte-h="svelte-1baa6m9">For this model the Nvidia L4 is the recommended choice. It will be perfect for our testing. Performant but still reasonably priced. Also note that by
default the endpoint will scale down to zero, meaning it will become idle after 1h of inactivity.</p> <p data-svelte-h="svelte-uhpblt">Now all you need to do is click click “Create Endpoint” 🚀</p> <p data-svelte-h="svelte-14nerpn"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/4-config.png" alt="config"></p> <p data-svelte-h="svelte-1ujocpz">Now our Inference Endpoint is initializing, which usually takes about 3-5 minutes. If you want to can allow browser notifications which will give you a
ping once the endpoint reaches a running state.</p> <p data-svelte-h="svelte-1fnidty"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/5-init.png" alt="init"></p> <h2 class="relative group"><a id="test-your-inference-endpoint" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#test-your-inference-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Test your Inference Endpoint</span></h2> <p data-svelte-h="svelte-1tao4jb">And then once everything is up and running you’ll be able to see the:</p> <ul data-svelte-h="svelte-1mzij4w"><li><strong>Endpoint URL</strong>: this is what you use to call your endpoint and send requests to it</li> <li><strong>Playground</strong>: a small visual way of quickly testing that the model works</li></ul> <p data-svelte-h="svelte-1qqv7r1"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/quick_start/6-done.png" alt="done"></p> <p data-svelte-h="svelte-cb7bxg">From the side of the playground you can also copy + paste a code snippet for calling the model. By clicking “App Tokens” you’ll be directed to Hugging Face
to configure an access token to be able to call the model. By default, all Inference Endpoints are created as private which require authentication and
all data is encryped in transit using TLS/SSL.</p> <p data-svelte-h="svelte-1e6r2vm">Congratulations, you just deployed a production ready AI model in Inference Endpoints 🔥</p> <p data-svelte-h="svelte-13b4e7a">Once you’re happy with the testing you can pause the Inference Endpoint, delete it. Or if you let it be, it will scale to zero after 1 hour.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> list_inference_endpoints, get_inference_endpoint
<span class="hljs-keyword">for</span> ep <span class="hljs-keyword">in</span> list_inference_endpoints():
<span class="hljs-built_in">print</span>(ep.name, ep.status)
ep = get_inference_endpoint(<span class="hljs-string">&quot;my-endpoint-name&quot;</span>)
<span class="hljs-built_in">print</span>(ep.url)<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="advanced-custom-images" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#advanced-custom-images"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Advanced: custom images</span></h2> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> create_inference_endpoint
endpoint = create_inference_endpoint(
<span class="hljs-string">&quot;custom-endpoint&quot;</span>,
repository=<span class="hljs-string">&quot;my-model&quot;</span>,
framework=<span class="hljs-string">&quot;pytorch&quot;</span>,
task=<span class="hljs-string">&quot;text-classification&quot;</span>,
accelerator=<span class="hljs-string">&quot;gpu&quot;</span>,
vendor=<span class="hljs-string">&quot;aws&quot;</span>,
region=<span class="hljs-string">&quot;us-west-2&quot;</span>,
<span class="hljs-built_in">type</span>=<span class="hljs-string">&quot;protected&quot;</span>,
instance_size=<span class="hljs-string">&quot;x1&quot;</span>,
instance_type=<span class="hljs-string">&quot;nvidia-a10g&quot;</span>,
custom_image={
<span class="hljs-string">&quot;health_route&quot;</span>: <span class="hljs-string">&quot;/health&quot;</span>,
<span class="hljs-string">&quot;env&quot;</span>: {<span class="hljs-string">&quot;MODEL_ID&quot;</span>: <span class="hljs-string">&quot;/repository&quot;</span>},
<span class="hljs-string">&quot;url&quot;</span>: <span class="hljs-string">&quot;ghcr.io/my-custom-image:latest&quot;</span>,
},
)<!-- HTML_TAG_END --></pre></div> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1xc1v36">Allow a few minutes for a new endpoint to become ready. Check status with <code>get_inference_endpoint(name)</code> and only send traffic once it’s <code>READY</code>.</p></div> <p data-svelte-h="svelte-18i7gog">Up next: build a small app that targets your endpoint and adds production tips.</p> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/101-course/blob/main/chapters/en/chapter2/4.mdx" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_kib1ob = {
assets: "/docs/101-course/pr_4/en",
base: "/docs/101-course/pr_4/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/101-course/pr_4/en/_app/immutable/entry/start.b6742992.js"),
import("/docs/101-course/pr_4/en/_app/immutable/entry/app.8b986792.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 13],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
19.4 kB
·
Xet hash:
a7ca8df9024f8022891056b2ca642b2269777bf816c6369721cf945465975c38

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.