derek-thomas/tgi-notebooks-optimization

derek-thomas committed on Jun 2, 2024

Commit 5d714fc (verified) · Parent: a2cb044

Updating naming and adding polish

Files changed (1)

01-tgi-ie-benchmark.ipynb +38 -4

01-tgi-ie-benchmark.ipynb CHANGED

@@ -1,5 +1,13 @@
 {
  "cells": [
   {
    "cell_type": "code",
    "execution_count": null,
@@ -75,7 +83,9 @@
     "\n",
     "# Simulation\n",
     "RESULTS_DIR = proj_dir/'tgi_benchmark_results'/INSTANCE_TYPE\n",
-    "tgi_bss = [8, 16, 24, 32, 40, 48, 56, 64]"
    ]
   },
   {
@@ -86,6 +96,14 @@
     "# Endpoint setup"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -119,8 +137,8 @@
     "            custom_image={\n",
     "                \"health_route\": \"/health\",\n",
     "                \"env\": {\n",
-    "                    \"MAX_INPUT_LENGTH\": \"3050\",\n",
-    "                    \"MAX_TOTAL_TOKENS\": \"3300\",\n",
     "                    \"MAX_BATCH_SIZE\": f\"{MAX_BATCH_SIZE}\",\n",
     "                    \"HF_TOKEN\": get_token(),\n",
     "                    \"MODEL_ID\": \"/repository\",\n",
@@ -137,6 +155,14 @@
     "    return endpoint"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -175,7 +201,7 @@
     "    command = [\n",
     "        \"python\", benchmark_script,\n",
     "        \"--model\", f\"huggingface/{MODEL}\",\n",
-    "        \"--mean-input-tokens\", \"3000\",\n",
     "        \"--stddev-input-tokens\", \"10\",\n",
     "        \"--mean-output-tokens\", \"240\",\n",
     "        \"--stddev-output-tokens\", \"5\",\n",
@@ -210,6 +236,14 @@
     "    return max_working"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,

 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "602a8c54-b434-4d8e-bc72-824c642fbdb5",
+   "metadata": {},
+   "source": [
+    "# Setup"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
     "\n",
     "# Simulation\n",
     "RESULTS_DIR = proj_dir/'tgi_benchmark_results'/INSTANCE_TYPE\n",
+    "tgi_bss = [8, 16, 24, 32, 40, 48, 56, 64]\n",
+    "INPUT_TOKENS = 3000\n",
+    "OUTPUT_TOKENS = 300"
    ]
   },
   {
     "# Endpoint setup"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8610e033-8586-495a-943e-539b7c8304d0",
+   "metadata": {},
+   "source": [
+    "Be sure to configure your endpoint how you desire, I made some guesses on what you might want in the `env`. You can see some settings in the [pricing section](https://huggingface.co/docs/inference-endpoints/en/pricing#gpu-instances) of the docs. I would also recommend manually deploying once and using  `get_inference_endpoint().__dict__` to double check your settings just to double check."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
     "            custom_image={\n",
     "                \"health_route\": \"/health\",\n",
     "                \"env\": {\n",
+    "                    \"MAX_INPUT_LENGTH\": f\"{INPUT_TOKENS+50}\",\n",
+    "                    \"MAX_TOTAL_TOKENS\": f\"{INPUT_TOKENS + OUTPUT_TOKENS}\",\n",
     "                    \"MAX_BATCH_SIZE\": f\"{MAX_BATCH_SIZE}\",\n",
     "                    \"HF_TOKEN\": get_token(),\n",
     "                    \"MODEL_ID\": \"/repository\",\n",
     "    return endpoint"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5e55710d-fa77-41b7-ae9c-a4826140f6b6",
+   "metadata": {},
+   "source": [
+    "Make sure to check the command to make sure it matches what you expect. Also check the summary stats json to see what actually happened."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
     "    command = [\n",
     "        \"python\", benchmark_script,\n",
     "        \"--model\", f\"huggingface/{MODEL}\",\n",
+    "        \"--mean-input-tokens\", f\"{INPUT_TOKENS}\",\n",
     "        \"--stddev-input-tokens\", \"10\",\n",
     "        \"--mean-output-tokens\", \"240\",\n",
     "        \"--stddev-output-tokens\", \"5\",\n",
     "    return max_working"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "d32b71a7-371f-4f80-a9f2-2cfc65e04afd",
+   "metadata": {},
+   "source": [
+    "Here Im creating the endpoint and then running the simulation."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,