| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Monitoring TGI server with Prometheus and Grafana dashboard","local":"monitoring-tgi-server-with-prometheus-and-grafana-dashboard","sections":[{"title":"Setup on the server machine","local":"setup-on-the-server-machine","sections":[],"depth":2},{"title":"Setup on the monitoring machine","local":"setup-on-the-monitoring-machine","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/text-generation-inference/main/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/entry/start.1810066f.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/scheduler.362310b7.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/singletons.fa2b0eb7.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/index.7f53ec41.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/paths.284aef40.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/entry/app.8cfc1931.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/index.57dfc70d.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/nodes/0.543c9bd9.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/nodes/5.7ef302f5.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/CodeBlock.d3c47f83.js"> | |
| <link rel="modulepreload" href="/docs/text-generation-inference/main/en/_app/immutable/chunks/EditOnGithub.9633c464.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Monitoring TGI server with Prometheus and Grafana dashboard","local":"monitoring-tgi-server-with-prometheus-and-grafana-dashboard","sections":[{"title":"Setup on the server machine","local":"setup-on-the-server-machine","sections":[],"depth":2},{"title":"Setup on the monitoring machine","local":"setup-on-the-monitoring-machine","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="monitoring-tgi-server-with-prometheus-and-grafana-dashboard" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#monitoring-tgi-server-with-prometheus-and-grafana-dashboard"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Monitoring TGI server with Prometheus and Grafana dashboard</span></h1> <p data-svelte-h="svelte-juninn">TGI server deployment can easily be monitored through a Grafana dashboard, consuming a Prometheus data collection. 
Examples of inspectable metrics include statistics on the effective batch sizes used by TGI, prefill/decode latencies, and the number of generated tokens.</p> <p data-svelte-h="svelte-uocx0h">In this tutorial, we look at how to set up a local Grafana dashboard to monitor TGI usage.</p> <p data-svelte-h="svelte-r5id"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/grafana.png" alt="Grafana dashboard for TGI"></p> <h2 class="relative group"><a id="setup-on-the-server-machine" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#setup-on-the-server-machine"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Setup on the server machine</span></h2> <p data-svelte-h="svelte-u7ohf2">First, on your server machine, TGI needs to be launched as usual. 
TGI exposes <a href="https://github.com/huggingface/text-generation-inference/discussions/1127#discussioncomment-7240527" rel="nofollow">multiple</a> metrics that can be collected by a Prometheus monitoring server.</p> <p data-svelte-h="svelte-eolpdn">In the rest of this tutorial, we assume that TGI was launched through Docker with <code>--network host</code>.</p> <p data-svelte-h="svelte-1gus98u">On the server where TGI is hosted, a Prometheus server needs to be installed and launched. To do so, please follow the <a href="https://prometheus.io/download/#prometheus" rel="nofollow">Prometheus installation instructions</a>. For example, at the time of writing, on a Linux machine:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attribute">wget</span> https://github.com/prometheus/prometheus/releases/download/v2.<span 
class="hljs-number">52</span>.<span class="hljs-number">0</span>/prometheus-<span class="hljs-number">2</span>.<span class="hljs-number">52</span>.<span class="hljs-number">0</span>.linux-amd64.tar.gz | |
| <span class="hljs-attribute">tar</span> -xvzf prometheus-<span class="hljs-number">2</span>.<span class="hljs-number">52</span>.<span class="hljs-number">0</span>.linux-amd64.tar.gz | |
| <span class="hljs-attribute">cd</span> prometheus-<span class="hljs-number">2</span>.<span class="hljs-number">52</span>.<span class="hljs-number">0</span>.linux-amd64<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-vtwna1">Prometheus needs to be configured to scrape TGI’s port. To do so, in the Prometheus configuration file <code>prometheus.yml</code>, edit the lines:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-symbol"> static_configs:</span> | |
| - targets: [<span class="hljs-string">"0.0.0.0:80"</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-osr7qy">to use the correct IP address and port.</p> <p data-svelte-h="svelte-jejcjs">We suggest running <code>curl 0.0.0.0:80/generate -X POST -d '{"inputs":"hey chatbot, how are","parameters":{"max_new_tokens":15}}' -H 'Content-Type: application/json'</code> on the server to make sure that the IP address and port are correct.</p> <p data-svelte-h="svelte-dg4f0p">Once Prometheus is configured, the Prometheus server can be launched on the same machine as TGI:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-string">./prometheus</span> <span class="hljs-params">--config</span>.file=<span class="hljs-string">"prometheus.yml"</span><!-- HTML_TAG_END --></pre></div> <p 
data-svelte-h="svelte-1bv98g2">In this guide, Prometheus monitoring data will be consumed on a local computer. Hence, we need to forward the Prometheus port (9090 by default) to the local computer. For example, we can:</p> <ul data-svelte-h="svelte-lppr6x"><li>Use SSH <a href="https://www.ssh.com/academy/ssh/tunneling-example" rel="nofollow">local port forwarding</a></li> <li>Use Ngrok port tunneling</li></ul> <p data-svelte-h="svelte-1s8npb5">For simplicity, we will use <a href="https://ngrok.com/docs/" rel="nofollow">Ngrok</a> in this guide to tunnel the Prometheus port from the TGI server to the outside world.</p> <p data-svelte-h="svelte-z8c33r">For that, you should follow the steps at <a href="https://dashboard.ngrok.com/get-started/setup/linux" rel="nofollow">https://dashboard.ngrok.com/get-started/setup/linux</a>, and once Ngrok is installed, use:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: 
transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->ngrok http http://0.0.0.0:9090<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1wlwaap">As a sanity check, make sure that the Prometheus server can be accessed from a local machine at the URL given by Ngrok (of the form <a href="https://d661-4-223-164-145.ngrok-free.app" rel="nofollow">https://d661-4-223-164-145.ngrok-free.app</a>).</p> <h2 class="relative group"><a id="setup-on-the-monitoring-machine" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#setup-on-the-monitoring-machine"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Setup on the monitoring machine</span></h2> <p data-svelte-h="svelte-1f0vfd2">Monitoring is typically done on a machine other than the server. 
We use a Grafana dashboard to monitor TGI’s server usage.</p> <p data-svelte-h="svelte-1nvmzr">Two options are available:</p> <ul data-svelte-h="svelte-1fna2fo"><li>Use Grafana Cloud for a hosted dashboard solution (<a href="https://grafana.com/products/cloud/" rel="nofollow">https://grafana.com/products/cloud/</a>).</li> <li>Self-host a Grafana dashboard.</li></ul> <p data-svelte-h="svelte-nfxilf">In this tutorial, for simplicity, we will self-host the dashboard. We recommend installing the Grafana open-source edition following <a href="https://grafana.com/grafana/download?platform=linux&edition=oss" rel="nofollow">the official install instructions</a>, using the available Linux binaries. For example:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz | |
| tar -zxvf grafana-11.0.0.linux-amd64.tar.gz | |
| <span class="hljs-built_in">cd</span> grafana-11.0.0 | |
| ./bin/grafana-server<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-104f7i7">Once the Grafana server is launched, the Grafana interface is available at http://localhost:3000. Log in with the username <code>admin</code> and the password <code>admin</code>.</p> <p data-svelte-h="svelte-1ptchid">Once logged in, the Prometheus data source for Grafana needs to be configured through the option <code>Add your first data source</code>. There, add a Prometheus data source using the Ngrok address obtained earlier, which exposes the Prometheus port (example: <a href="https://d661-4-223-164-145.ngrok-free.app" rel="nofollow">https://d661-4-223-164-145.ngrok-free.app</a>).</p> <p data-svelte-h="svelte-28lk0w">Once the Prometheus data source is configured, we can finally create our dashboard! From the home page, go to <code>Create your first dashboard</code> and then <code>Import dashboard</code>. There, we will use the recommended dashboard template <a href="https://github.com/huggingface/text-generation-inference/blob/main/assets/tgi_grafana.json" rel="nofollow">tgi_grafana.json</a> for a ready-to-use dashboard, but you may configure your own dashboard as you like.</p> <p data-svelte-h="svelte-qf2q1i">Community-contributed dashboard templates are also available, for example <a href="https://grafana.com/grafana/dashboards/19831-text-generation-inference-dashboard/" rel="nofollow">here</a> or <a href="https://grafana.com/grafana/dashboards/20246-text-generation-inference/" rel="nofollow">here</a>.</p> <p data-svelte-h="svelte-5ohuzq">Load your dashboard configuration, and your TGI dashboard should be ready to go!</p> <p></p> | |
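<p>If the dashboard stays empty, it can help to check on the server that TGI actually serves metrics before looking at Prometheus or Grafana. The following is a minimal Python sketch, not part of the original setup. It assumes that TGI serves its metrics in the Prometheus text format at the <code>/metrics</code> path (the default path scraped by Prometheus) on the address configured in <code>prometheus.yml</code>, and that TGI metric names start with <code>tgi_</code>. The names <code>fetch_metrics</code> and <code>metric_names</code> are illustrative.</p>

```python
import urllib.request

# Assumption: TGI serves Prometheus metrics at /metrics on the address used in
# prometheus.yml (0.0.0.0:80 in this guide). Adjust to your deployment.
TGI_METRICS_URL = "http://0.0.0.0:80/metrics"


def metric_names(exposition_text, prefix="tgi_"):
    """Return the sorted, de-duplicated metric names that start with `prefix`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="value"} 1.0` or `name 1.0`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metrics(url=TGI_METRICS_URL, timeout=5.0):
    """Download the raw metrics text from the TGI server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

<p>Running <code>print(metric_names(fetch_metrics()))</code> on the server machine lists the metric names found. An empty list or a connection error suggests that the address or port in <code>prometheus.yml</code> does not match the one TGI listens on.</p>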
| <script> | |
| { | |
| __sveltekit_1dfb6m4 = { | |
| assets: "/docs/text-generation-inference/main/en", | |
| base: "/docs/text-generation-inference/main/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/text-generation-inference/main/en/_app/immutable/entry/start.1810066f.js"), | |
| import("/docs/text-generation-inference/main/en/_app/immutable/entry/app.8cfc1931.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 5], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |