Buckets:

rtrm's picture
download
raw
10.8 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Quick Start&quot;,&quot;local&quot;:&quot;quick-start&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Create your endpoint&quot;,&quot;local&quot;:&quot;create-your-endpoint&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Test your endpoint&quot;,&quot;local&quot;:&quot;test-your-endpoint&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/inference-endpoints/pr_121/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/entry/start.371ac37b.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/chunks/scheduler.389d799c.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/chunks/singletons.0bf2df26.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/chunks/paths.394949bb.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/entry/app.0a6d2342.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/chunks/index.8f81d18f.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/nodes/0.a7082449.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/nodes/20.649b2c77.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_121/en/_app/immutable/chunks/getInferenceSnippets.acfad222.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Quick Start&quot;,&quot;local&quot;:&quot;quick-start&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Create your endpoint&quot;,&quot;local&quot;:&quot;create-your-endpoint&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Test your endpoint&quot;,&quot;local&quot;:&quot;test-your-endpoint&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="quick-start" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#quick-start"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Quick Start</span></h1> <p data-svelte-h="svelte-9mu07z">In this guide you’ll deploy a production ready AI model using Inference Endpoints in only a few minutes.
Make sure you’ve been able to log into the <a href>Inference Endpoints UI</a> with your Hugging Face account, and that you have a payment
method setup. If not you can add a payment method <a href="https://huggingface.co/settings/billing" rel="nofollow">from this link</a>.</p> <h2 class="relative group"><a id="create-your-endpoint" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#create-your-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Create your endpoint</span></h2> <p data-svelte-h="svelte-1kt58pp">Start by navigating to the Inference Endpoints UI, and once you have logged in you should see a button for creating a new Inference
Endpoint, and a small greeting prompting you to create your first endpoint. Click the “New” button.</p> <p data-svelte-h="svelte-rr08y1"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/1-new-button.png" alt="new-button"></p> <p data-svelte-h="svelte-11utja6">From there you’ll be directed to the catalog. The Model Catalog consists of popular models which have tuned configurations to work just as one-click
deploys. You can filter by name, task, price of the hardware and much more.</p> <p data-svelte-h="svelte-1fpb2w2"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/2-catalog.png" alt="catalog"></p> <p data-svelte-h="svelte-arciu6">In this example let’s deploy the <a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" rel="nofollow">meta-llama/Llama-3.2-3B-Instruct</a> model. You can find
it by searching for <code>llama-3.2-3b</code> in the search field and deploy it by clicking the card.</p> <p data-svelte-h="svelte-1qt6gfv"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/3-llama.png" alt="llama"></p> <p data-svelte-h="svelte-1ekjdqo">Next we’ll choose which hardware and deployment settings we’ll go for. Since this is a catalog model, all of the pre-selected options are very good
defaults. So in this case we don’t need to change anything. In case you want a deeper dive on what the different settings mean you can check out
the <a href="./guides/configuration">configuration guide</a>.</p> <p data-svelte-h="svelte-1rjf5sk">For this model the Nvidia L4 is the recommended choice. It will be perfect for our testing. Performant but still reasonably priced. Also not that by
default the endpoint will scale down to zero, meaning it will become idle after 1h of inactivity.</p> <p data-svelte-h="svelte-uhpblt">Now all you need to do is click click “Create Endpoint” 🚀</p> <p data-svelte-h="svelte-l3aysm"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/4-config.png" alt="config"></p> <p data-svelte-h="svelte-96lco9">Now our Inference Endpoint is initializing, which usually takes about 3-5 minutes. If you want to can alow browser notifications which will give you a
ping once the endpoint reaches a running state.</p> <p data-svelte-h="svelte-1hlc9if"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/5-init.png" alt="init"></p> <h2 class="relative group"><a id="test-your-endpoint" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#test-your-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Test your endpoint</span></h2> <p data-svelte-h="svelte-1tao4jb">And then once everything is up and running you’ll be able to see the:</p> <ul data-svelte-h="svelte-1mzij4w"><li><strong>Endpoint URL</strong>: this is what you use to call your endpoint and send requests to it</li> <li><strong>Playground</strong>: a small visual way of quickly testing that the model works</li></ul> <p data-svelte-h="svelte-2ncdbo"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/update/large-rewrite/assets/quick_start/6-done.png" alt="done"></p> <p data-svelte-h="svelte-e53mnf">From the side of the playground you can also copy + paste a code snippet for calling the model. By clicking “App Tokens” you’ll be directed to Hugging Face
to configure an access token to be able to call the model. By default, all Inference Endpoints are created as private once which require authentication and
all data is encryped in transit using TLS/SSL.</p> <p data-svelte-h="svelte-15krezw">Congratulrations, you just deployed a production ready AI model 🔥</p> <p data-svelte-h="svelte-x8g1w1">Once you’re happy with the testing you can pause the model, delete it. Or if you let it be, it will become idle after 1 hour.</p> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/hf-endpoints-documentation/blob/main/docs/source/quick_start.mdx" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_1eslgpq = {
assets: "/docs/inference-endpoints/pr_121/en",
base: "/docs/inference-endpoints/pr_121/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/inference-endpoints/pr_121/en/_app/immutable/entry/start.371ac37b.js"),
import("/docs/inference-endpoints/pr_121/en/_app/immutable/entry/app.0a6d2342.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 20],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
10.8 kB
·
Xet hash:
639d4b41e49ada2373e5567e9f3356c927478fba7e7ac57942e30b37bcbe8476

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.