<meta charset="utf-8" /><meta name="hf:doc:metadata" content='{"title":"Pricing","local":"pricing","sections":[{"title":"CPU Instances","local":"cpu-instances","sections":[],"depth":2},{"title":"GPU Instances","local":"gpu-instances","sections":[],"depth":2},{"title":"Accelerator Instances","local":"accelerator-instances","sections":[],"depth":2},{"title":"Pricing examples","local":"pricing-examples","sections":[{"title":"Basic Endpoint","local":"basic-endpoint","sections":[],"depth":3},{"title":"Advanced Endpoint","local":"advanced-endpoint","sections":[],"depth":3}],"depth":2},{"title":"Quotas","local":"quotas","sections":[],"depth":2}],"depth":1}'>
<link href="/docs/inference-endpoints/pr_113/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/entry/start.d1c14968.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/scheduler.389d799c.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/singletons.16c9b508.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/paths.58d119e0.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/entry/app.18050d92.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/index.8f81d18f.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/nodes/0.ce016c16.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/nodes/24.74025787.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/CodeBlock.c0898180.js">
<link rel="modulepreload" href="/docs/inference-endpoints/pr_113/en/_app/immutable/chunks/getInferenceSnippets.8efa8e08.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content='{"title":"Pricing","local":"pricing","sections":[{"title":"CPU Instances","local":"cpu-instances","sections":[],"depth":2},{"title":"GPU Instances","local":"gpu-instances","sections":[],"depth":2},{"title":"Accelerator Instances","local":"accelerator-instances","sections":[],"depth":2},{"title":"Pricing examples","local":"pricing-examples","sections":[{"title":"Basic Endpoint","local":"basic-endpoint","sections":[],"depth":3},{"title":"Advanced Endpoint","local":"advanced-endpoint","sections":[],"depth":3}],"depth":2},{"title":"Quotas","local":"quotas","sections":[],"depth":2}],"depth":1}'><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="pricing" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#pricing"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Pricing</span></h1> <div class="flex md:justify-start mb-2 text-gray-400 items-center" data-svelte-h="svelte-iooz66"><a 
href="https://ui.endpoints.huggingface.co/new"><button class="shadow-sm bg-white bg-gradient-to-br from-gray-100/20 to-gray-200/60 hover:to-gray-100/70 text-gray-700 py-1.5 rounded-lg ring-1 ring-gray-300/60 hover:ring-gray-300/30 font-semibold active:shadow-inner px-5">Deploy a model</button></a> <span class="mx-4 ">Or</span> <a href="mailto:api-enterprise@huggingface.co" class="underline">Request a quote</a></div> <p data-svelte-h="svelte-1vcpi9g">Easily deploy machine learning models on dedicated infrastructure with 🤗 Inference Endpoints. When you create an Endpoint, you select the instance type on which to deploy and scale your model, and each instance type is billed at an hourly rate. 🤗 Inference Endpoints is available to Hugging Face accounts with an active subscription and a credit card on file. At the end of each subscription period, the user or organization account is charged for the compute resources used while Endpoints are <em>initializing</em> and while successfully deployed Endpoints (ready to serve) are <em>running</em>.</p> <p data-svelte-h="svelte-qusxpr">Below you can find the hourly pricing for all instances available for 🤗 Inference Endpoints, along with examples of how costs are calculated. 
While the prices are shown by the hour, the actual cost is calculated by the minute.</p> <h2 class="relative group"><a id="cpu-instances" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#cpu-instances"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>CPU Instances</span></h2> <p data-svelte-h="svelte-jy5i97">The table below shows currently available CPU instances and their hourly pricing. 
If the instance type cannot be selected in the application, you need to <a href="mailto:api-enterprise@huggingface.co?subject=Quota%20increase%20HF%20Endpoints&body=Hello,%0D%0A%0D%0AI%20would%20like%20to%20request%20access/quota%20increase%20for%20%5BINSTANCE%20TYPE%5D%20for%20the%20following%20account%20%5BHF%20ACCOUNT%5D.">request a quota</a> to use it.</p> <table data-svelte-h="svelte-x41c4r"><thead><tr><th>Provider</th> <th>Instance Type</th> <th>Instance Size</th> <th>Hourly rate</th> <th>vCPUs</th> <th>Memory</th> <th>Architecture</th></tr></thead> <tbody><tr><td>aws</td> <td>intel-icl</td> <td>x1</td> <td>$0.032</td> <td>1</td> <td>2 GB</td> <td>Intel Ice Lake <em>(soon to be fully deprecated)</em></td></tr> <tr><td>aws</td> <td>intel-icl</td> <td>x2</td> <td>$0.064</td> <td>2</td> <td>4 GB</td> <td>Intel Ice Lake <em>(soon to be fully deprecated)</em></td></tr> <tr><td>aws</td> <td>intel-icl</td> <td>x4</td> <td>$0.128</td> <td>4</td> <td>8 GB</td> <td>Intel Ice Lake <em>(soon to be fully deprecated)</em></td></tr> <tr><td>aws</td> <td>intel-icl</td> <td>x8</td> <td>$0.256</td> <td>8</td> <td>16 GB</td> <td>Intel Ice Lake <em>(soon to be fully deprecated)</em></td></tr> <tr><td>aws</td> <td>intel-spr</td> <td>x1</td> <td>$0.033</td> <td>1</td> <td>2 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>aws</td> <td>intel-spr</td> <td>x2</td> <td>$0.067</td> <td>2</td> <td>4 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>aws</td> <td>intel-spr</td> <td>x4</td> <td>$0.134</td> <td>4</td> <td>8 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>aws</td> <td>intel-spr</td> <td>x8</td> <td>$0.268</td> <td>8</td> <td>16 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>aws</td> <td>intel-spr</td> <td>x16</td> <td>$0.536</td> <td>16</td> <td>32 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>azure</td> <td>intel-xeon</td> <td>x1</td> <td>$0.060</td> <td>1</td> <td>2 GB</td> <td>Intel Xeon</td></tr> <tr><td>azure</td> <td>intel-xeon</td> <td>x2</td> 
<td>$0.120</td> <td>2</td> <td>4 GB</td> <td>Intel Xeon</td></tr> <tr><td>azure</td> <td>intel-xeon</td> <td>x4</td> <td>$0.240</td> <td>4</td> <td>8 GB</td> <td>Intel Xeon</td></tr> <tr><td>azure</td> <td>intel-xeon</td> <td>x8</td> <td>$0.480</td> <td>8</td> <td>16 GB</td> <td>Intel Xeon</td></tr> <tr><td>gcp</td> <td>intel-spr</td> <td>x1</td> <td>$0.050</td> <td>1</td> <td>2 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>gcp</td> <td>intel-spr</td> <td>x2</td> <td>$0.100</td> <td>2</td> <td>4 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>gcp</td> <td>intel-spr</td> <td>x4</td> <td>$0.200</td> <td>4</td> <td>8 GB</td> <td>Intel Sapphire Rapids</td></tr> <tr><td>gcp</td> <td>intel-spr</td> <td>x8</td> <td>$0.400</td> <td>8</td> <td>16 GB</td> <td>Intel Sapphire Rapids</td></tr></tbody></table> <h2 class="relative group"><a id="gpu-instances" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#gpu-instances"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>GPU Instances</span></h2> <p data-svelte-h="svelte-1lsrg9j">The table below shows currently available GPU instances and their hourly pricing. 
If the instance type cannot be selected in the application, you need to <a href="mailto:api-enterprise@huggingface.co?subject=Quota%20increase%20HF%20Endpoints&body=Hello,%0D%0A%0D%0AI%20would%20like%20to%20request%20access/quota%20increase%20for%20%5BINSTANCE%20TYPE%5D%20for%20the%20following%20account%20%5BHF%20ACCOUNT%5D.">request a quota</a> to use it.</p> <table data-svelte-h="svelte-boil61"><thead><tr><th>Provider</th> <th>Instance Type</th> <th>Instance Size</th> <th>Hourly rate</th> <th>GPUs</th> <th>Memory</th> <th>Architecture</th></tr></thead> <tbody><tr><td>aws</td> <td>nvidia-t4</td> <td>x1</td> <td>$0.5</td> <td>1</td> <td>14 GB</td> <td>NVIDIA T4</td></tr> <tr><td>aws</td> <td>nvidia-t4</td> <td>x4</td> <td>$3</td> <td>4</td> <td>56 GB</td> <td>NVIDIA T4</td></tr> <tr><td>aws</td> <td>nvidia-l4</td> <td>x1</td> <td>$0.8</td> <td>1</td> <td>24 GB</td> <td>NVIDIA L4</td></tr> <tr><td>aws</td> <td>nvidia-l4</td> <td>x4</td> <td>$3.8</td> <td>4</td> <td>96 GB</td> <td>NVIDIA L4</td></tr> <tr><td>aws</td> <td>nvidia-a10g</td> <td>x1</td> <td>$1</td> <td>1</td> <td>24 GB</td> <td>NVIDIA A10G</td></tr> <tr><td>aws</td> <td>nvidia-a10g</td> <td>x4</td> <td>$5</td> <td>4</td> <td>96 GB</td> <td>NVIDIA A10G</td></tr> <tr><td>aws</td> <td>nvidia-l40s</td> <td>x1</td> <td>$1.8</td> <td>1</td> <td>48 GB</td> <td>NVIDIA L40S</td></tr> <tr><td>aws</td> <td>nvidia-l40s</td> <td>x4</td> <td>$8.3</td> <td>4</td> <td>192 GB</td> <td>NVIDIA L40S</td></tr> <tr><td>aws</td> <td>nvidia-l40s</td> <td>x8</td> <td>$23.5</td> <td>8</td> <td>384 GB</td> <td>NVIDIA L40S</td></tr> <tr><td>aws</td> <td>nvidia-a100</td> <td>x1</td> <td>$4</td> <td>1</td> <td>80 GB</td> <td>NVIDIA A100</td></tr> <tr><td>aws</td> <td>nvidia-a100</td> <td>x2</td> <td>$8</td> <td>2</td> <td>160 GB</td> <td>NVIDIA A100</td></tr> <tr><td>aws</td> <td>nvidia-a100</td> <td>x4</td> <td>$16</td> <td>4</td> <td>320 GB</td> <td>NVIDIA A100</td></tr> <tr><td>aws</td> <td>nvidia-a100</td> <td>x8</td> 
<td>$32</td> <td>8</td> <td>640 GB</td> <td>NVIDIA A100</td></tr> <tr><td>aws</td> <td>nvidia-h200</td> <td>x1</td> <td>$5</td> <td>1</td> <td>141 GB</td> <td>NVIDIA H200</td></tr> <tr><td>aws</td> <td>nvidia-h200</td> <td>x2</td> <td>$10</td> <td>2</td> <td>282 GB</td> <td>NVIDIA H200</td></tr> <tr><td>aws</td> <td>nvidia-h200</td> <td>x4</td> <td>$20</td> <td>4</td> <td>564 GB</td> <td>NVIDIA H200</td></tr> <tr><td>aws</td> <td>nvidia-h200</td> <td>x8</td> <td>$40</td> <td>8</td> <td>1128 GB</td> <td>NVIDIA H200</td></tr> <tr><td>gcp</td> <td>nvidia-t4</td> <td>x1</td> <td>$0.5</td> <td>1</td> <td>16 GB</td> <td>NVIDIA T4</td></tr> <tr><td>gcp</td> <td>nvidia-l4</td> <td>x1</td> <td>$0.7</td> <td>1</td> <td>24 GB</td> <td>NVIDIA L4</td></tr> <tr><td>gcp</td> <td>nvidia-l4</td> <td>x4</td> <td>$3.8</td> <td>4</td> <td>96 GB</td> <td>NVIDIA L4</td></tr> <tr><td>gcp</td> <td>nvidia-a100</td> <td>x1</td> <td>$3.6</td> <td>1</td> <td>80 GB</td> <td>NVIDIA A100</td></tr> <tr><td>gcp</td> <td>nvidia-a100</td> <td>x2</td> <td>$7.2</td> <td>2</td> <td>160 GB</td> <td>NVIDIA A100</td></tr> <tr><td>gcp</td> <td>nvidia-a100</td> <td>x4</td> <td>$14.4</td> <td>4</td> <td>320 GB</td> <td>NVIDIA A100</td></tr> <tr><td>gcp</td> <td>nvidia-a100</td> <td>x8</td> <td>$28.8</td> <td>8</td> <td>640 GB</td> <td>NVIDIA A100</td></tr> <tr><td>gcp</td> <td>nvidia-h100</td> <td>x1</td> <td>$10</td> <td>1</td> <td>80 GB</td> <td>NVIDIA H100</td></tr> <tr><td>gcp</td> <td>nvidia-h100</td> <td>x2</td> <td>$20</td> <td>2</td> <td>160 GB</td> <td>NVIDIA H100</td></tr> <tr><td>gcp</td> <td>nvidia-h100</td> <td>x4</td> <td>$40</td> <td>4</td> <td>320 GB</td> <td>NVIDIA H100</td></tr> <tr><td>gcp</td> <td>nvidia-h100</td> <td>x8</td> <td>$80</td> <td>8</td> <td>640 GB</td> <td>NVIDIA H100</td></tr></tbody></table> <h2 class="relative group"><a id="accelerator-instances" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 
with-hover:group-hover:opacity-100 with-hover:right-full" href="#accelerator-instances"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Accelerator Instances</span></h2> <p data-svelte-h="svelte-1frzpk">The table below shows currently available custom accelerator instances and their hourly pricing. 
If the instance type cannot be selected in the application, you need to <a href="mailto:api-enterprise@huggingface.co?subject=Quota%20increase%20HF%20Endpoints&body=Hello,%0D%0A%0D%0AI%20would%20like%20to%20request%20access/quota%20increase%20for%20%5BINSTANCE%20TYPE%5D%20for%20the%20following%20account%20%5BHF%20ACCOUNT%5D.">request a quota</a> to use it.</p> <table data-svelte-h="svelte-1pb3jbn"><thead><tr><th>Provider</th> <th>Instance Type</th> <th>Instance Size</th> <th>Hourly rate</th> <th>Accelerators</th> <th>Accelerator Memory</th> <th>RAM</th> <th>Architecture</th></tr></thead> <tbody><tr><td>aws</td> <td>inf2</td> <td>x1</td> <td>$0.75</td> <td>1</td> <td>32 GB</td> <td>14.5 GB</td> <td>AWS Inferentia2</td></tr> <tr><td>aws</td> <td>inf2</td> <td>x12</td> <td>$12</td> <td>12</td> <td>384 GB</td> <td>760 GB</td> <td>AWS Inferentia2</td></tr> <tr><td>gcp</td> <td>tpu</td> <td>1x1</td> <td>$1.2</td> <td>1</td> <td>16 GB</td> <td>44 GB</td> <td>Google TPU v5e</td></tr> <tr><td>gcp</td> <td>tpu</td> <td>2x2</td> <td>$4.75</td> <td>4</td> <td>64 GB</td> <td>186 GB</td> <td>Google TPU v5e</td></tr> <tr><td>gcp</td> <td>tpu</td> <td>2x4</td> <td>$9.5</td> <td>8</td> <td>128 GB</td> <td>380 GB</td> <td>Google TPU v5e</td></tr></tbody></table> <h2 class="relative group"><a id="pricing-examples" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#pricing-examples"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 
79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Pricing examples</span></h2> <p data-svelte-h="svelte-1rore2d">The following example pricing scenarios demonstrate how costs are calculated. You can find the hourly rate for all instance types and sizes in the tables above. Use the following formula to calculate the costs:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->instance hourly rate * ((hours * # min <span class="hljs-keyword">replica</span>) + (scale-up hrs * # additional replicas))<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="basic-endpoint" 
class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#basic-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Basic Endpoint</span></h3> <ul data-svelte-h="svelte-dy76s5"><li>AWS CPU intel-spr x2 (2x vCPUs 4GB RAM)</li> <li>Autoscaling (minimum 1 replica, maximum 1 replica)</li></ul> <p data-svelte-h="svelte-168lxnz"><strong>hourly cost</strong></p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute 
pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->instance hourly rate * (hours * # min <span class="hljs-keyword">replica</span>) = hourly <span class="hljs-keyword">cost</span>
<span class="hljs-meta">$0</span><span class="hljs-number">.067</span>/hr * (<span class="hljs-number">1</span>hr * <span class="hljs-number">1</span> <span class="hljs-keyword">replica</span>) = <span class="hljs-meta">$0</span><span class="hljs-number">.067</span>/hr<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1fgpenz"><strong>monthly cost</strong></p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->instance hourly rate * (hours * # min <span class="hljs-keyword">replica</span>) = monthly <span class="hljs-keyword">cost</span>
<span class="hljs-meta">$0</span><span class="hljs-number">.067</span>/hr * (<span class="hljs-number">730</span>hr * <span class="hljs-number">1</span> <span class="hljs-keyword">replica</span>) = <span class="hljs-meta">$48</span><span class="hljs-number">.91</span>/month<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1i76ptz"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/basic-chart.png" alt="basic-chart"></p> <h3 class="relative group"><a id="advanced-endpoint" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#advanced-endpoint"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Advanced Endpoint</span></h3> <ul data-svelte-h="svelte-1nfe6co"><li>AWS GPU nvidia-t4 x1 (1x NVIDIA T4 GPU, 14 GB memory)</li> <li>Autoscaling (minimum 1 replica, maximum 3 replicas); every hour, a spike in traffic scales the Endpoint from 1 to 3 replicas for 15 minutes</li></ul> <p data-svelte-h="svelte-168lxnz"><strong>hourly cost</strong></p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer 
focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->instance hourly rate * ((hours * # min <span class="hljs-keyword">replica</span>) + (scale-up hrs * # additional replicas)) = hourly <span class="hljs-keyword">cost</span>
<span class="hljs-meta">$0</span><span class="hljs-number">.5</span>/hr * ((<span class="hljs-number">1</span>hr * <span class="hljs-number">1</span> <span class="hljs-keyword">replica</span>) + (<span class="hljs-number">0.25</span>hr * <span class="hljs-number">2</span> replicas)) = <span class="hljs-meta">$0</span><span class="hljs-number">.75</span>/hr<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1fgpenz"><strong>monthly cost</strong></p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->instance hourly rate * ((hours * # min <span class="hljs-keyword">replica</span>) + (scale-up hrs * # additional replicas)) = monthly <span class="hljs-keyword">cost</span>
<span class="hljs-meta">$0</span><span class="hljs-number">.5</span>/hr * ((<span class="hljs-number">730</span>hr * <span class="hljs-number">1</span> <span class="hljs-keyword">replica</span>) + (<span class="hljs-number">182.5</span>hr * <span class="hljs-number">2</span> replicas)) = <span class="hljs-meta">$547</span><span class="hljs-number">.5</span>/month<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-fxjvef"><img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/advanced-chart.png" alt="advanced-chart"></p> <h2 class="relative group"><a id="quotas" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#quotas"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Quotas</span></h2> <p data-svelte-h="svelte-bpxflc">Your available quota is listed in the Inference Endpoints dashboard at <a href="https://ui.endpoints.huggingface.co" rel="nofollow">https://ui.endpoints.huggingface.co</a> under “Quotas Used”.</p> <p data-svelte-h="svelte-aq8udi">The number displayed is the number of instances in use out of your available instance quota. 
<em>Paused</em> endpoints do not count against your “used” quota. Endpoints that are <em>scaled to zero</em> do count as “used” quota; to free up this quota, pause the scaled-to-zero endpoint.</p> <p data-svelte-h="svelte-1rs723z">Please contact us if you’d like to increase your quota allocations. PRO users and Enterprise Hub organizations can be granted higher quota amounts on request.</p> <p></p>
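<p>The cost formula from the pricing examples can be checked with a short script. This is a minimal sketch, not an official calculator: the function <code>endpoint_cost</code> and its parameters are hypothetical, and it assumes usage is prorated linearly by the minute with no minimum charge or rounding, which this page does not specify. It uses Python's <code>decimal</code> module so that currency arithmetic is not affected by floating-point error.</p>

```python
from decimal import Decimal

MINUTES_PER_HOUR = Decimal(60)


def endpoint_cost(hourly_rate, minutes, min_replicas=1,
                  scale_up_minutes=0, additional_replicas=0):
    """Hypothetical helper applying the formula from the pricing examples:

    instance hourly rate * ((hours * # min replica)
                            + (scale-up hrs * # additional replicas))

    Durations are in minutes because usage is billed by the minute.
    Assumption: charges are prorated linearly, with no rounding or minimum.
    """
    rate = Decimal(hourly_rate)  # pass rates as strings, e.g. "0.067"
    # Baseline replica-minutes plus the extra replica-minutes during scale-ups.
    replica_minutes = minutes * min_replicas + scale_up_minutes * additional_replicas
    return rate * Decimal(replica_minutes) / MINUTES_PER_HOUR


HOURS_PER_MONTH = 730  # month length used in the pricing examples

# Basic Endpoint: aws intel-spr x2 ($0.067/hr), 1 replica.
print("basic, 1 hour:  ", endpoint_cost("0.067", 60))
print("basic, 1 month: ", endpoint_cost("0.067", HOURS_PER_MONTH * 60))

# Advanced Endpoint: aws nvidia-t4 x1 ($0.5/hr), 1 minimum replica,
# plus 2 additional replicas for 15 minutes of every hour.
print("advanced, 1 hour:  ", endpoint_cost("0.5", 60, 1, 15, 2))
print("advanced, 1 month: ",
      endpoint_cost("0.5", HOURS_PER_MONTH * 60, 1, HOURS_PER_MONTH * 15, 2))
```

<p>Under these assumptions the sketch reproduces the Advanced Endpoint figures ($0.75 per hour and $547.50 per month), and a single intel-spr x2 replica at $0.067/hr comes to $48.91 over 730 hours. Working in minutes also shows the effect of per-minute billing: one nvidia-t4 x1 replica running for 90 minutes gives <code>endpoint_cost("0.5", 90)</code>, or $0.75.</p>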
<script>
  {
    __sveltekit_87vzq7 = {
      assets: "/docs/inference-endpoints/pr_113/en",
      base: "/docs/inference-endpoints/pr_113/en",
      env: {}
    };
    const element = document.currentScript.parentElement;
    const data = [null,null];
    Promise.all([
      import("/docs/inference-endpoints/pr_113/en/_app/immutable/entry/start.d1c14968.js"),
      import("/docs/inference-endpoints/pr_113/en/_app/immutable/entry/app.18050d92.js")
    ]).then(([kit, app]) => {
      kit.start(app, element, {
        node_ids: [0, 24],
        data,
        form: null,
        error: null
      });
    });
  }
</script>