Philip Kehl
feat: track queueing time, inference time, and total latency separately. the timestamp for start inference should be taken somewhere else
ebf4772
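The commit message above asks for queueing time, inference time, and total latency to be tracked separately, and notes that the inference start timestamp is currently taken in the wrong place. A minimal sketch of one way to separate the three, assuming each request records an `enqueuedAt` timestamp when it is created. `runTimedJob`, `job`, `infer`, and the returned field names are hypothetical and are not taken from `src/main.js`.

```javascript
// Hypothetical helper: all names here are illustrative, not the project's actual code.
// `job` is assumed to look like { prompt: string, enqueuedAt: number (ms) }.
// `infer` is any async function that takes a prompt and resolves to an answer.
// `now` is injectable so the timing logic can be tested with a fake clock.
async function runTimedJob(job, infer, now = () => performance.now()) {
  // Start the inference clock immediately before the model call, so time
  // spent waiting in the queue is not counted as inference time.
  const inferenceStart = now();
  const answer = await infer(job.prompt);
  const inferenceEnd = now();

  return {
    answer,
    queueMs: inferenceStart - job.enqueuedAt,     // waiting before the model call
    inferenceMs: inferenceEnd - inferenceStart,   // the model call itself
    totalMs: inferenceEnd - job.enqueuedAt,       // queueMs + inferenceMs
  };
}
```

With this split, the "Total Latency (ms)", "Queue (ms)", and "Inference (ms)" columns of the log table could be filled from `totalMs`, `queueMs`, and `inferenceMs` respectively.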
| <!DOCTYPE html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8"/> | |
| <meta name="viewport" content="width=device-width, initial-scale=1"/> | |
| <title>Browser LLM Evaluation</title> | |
| <script src="https://cdn.tailwindcss.com"></script> | |
| </head> | |
| <body class="bg-gray-100 text-gray-900 min-h-screen"> | |
| <main class="max-w-6xl mx-auto p-6"> | |
| <h1 class="text-3xl font-bold mb-6 text-center">Browser LLM Evaluation</h1> | |
| <p class="mb-6 text-gray-700 text-center"> | |
| This project explores how in-browser LLM inference behaves compared to cloud-based inference in terms of | |
| latency. | |
| The goal is to model different request arrival patterns and routing strategies between cloud and on-device | |
| models. | |
| Prompts and accuracy evaluation are based on the BoolQ dataset. | |
| The project is currently under development and does not aim to provide accurate LLM responses, but rather to | |
| measure performance differences. | |
| To run the cloud-based inference, you need to bring your own OpenRouter API key. | |
| </p> | |
| <section class="grid grid-cols-1 md:grid-cols-3 gap-6"> | |
| <!-- Cloud Card --> | |
| <div class="bg-white p-6 rounded-2xl shadow-xl border border-gray-200"> | |
| <h2 class="text-xl font-semibold mb-4">Cloud (OpenRouter)</h2> | |
| <label class="block mb-4 text-sm font-medium">Model | |
| <select id="cloudModel" | |
| class="w-full mb-4 px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"> | |
| <option value="openai/gpt-4o-mini">openai/gpt-4o-mini</option> | |
| <option value="meta-llama/llama-3.2-1b-instruct">meta-llama/llama-3.2-1b-instruct</option> | |
| <option value="google/gemma-3n-e2b-it:free">google/gemma-3n-e2b-it:free</option> | |
| <option value="meta-llama/llama-3.2-3b-instruct">meta-llama/llama-3.2-3b-instruct</option> | |
| </select> | |
| </label> | |
| <label class="block mb-4 text-sm font-medium">API Key OpenRouter | |
| <input id="cloudApiKey" type="password" autocomplete="off" placeholder="Key..." | |
| class="mt-1 w-full px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"/> | |
| </label> | |
| </div> | |
| <!-- On-Device Card --> | |
| <div class="bg-white p-6 rounded-2xl shadow-xl border border-gray-200"> | |
| <h2 class="text-xl font-semibold mb-4">On-Device</h2> | |
| <label class="block text-sm font-medium">Model (transformers.js) | |
| <select id="deviceModel" | |
| class="w-full mb-2 px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"> | |
| <option value="onnx-community/gemma-3-270m-it-ONNX">gemma-3-270m-it-ONNX</option> | |
| <option value="onnx-community/gemma-3-1b-it-ONNX">gemma-3-1b-it-ONNX</option> | |
| <option value="onnx-community/Llama-3.2-1B-Instruct-ONNX">Llama-3.2-1B-Instruct-ONNX</option> | |
| <option disabled value="onnx-community/Llama-3.2-3B-Instruct-ONNX">Llama-3.2-3B-Instruct-ONNX (not | |
| working) | |
| </option> | |
| <option disabled value="onnx-community/gemma-3n-E2B-it-ONNX">gemma-3n-E2B-it-ONNX (not working) | |
| </option> | |
| </select> | |
| </label> | |
| <button id="loadDeviceModelBtn" | |
| class="mt-4 w-full bg-blue-600 text-white py-2 rounded-lg hover:bg-blue-700 transition">Load Model | |
| </button> | |
| <div id="deviceStatus" class="text-gray-700 text-sm my-4">Not loaded</div> | |
| <div id="deviceLoadingContainer" class="w-full max-w-xs my-2"> | |
| <div id="deviceLoadingBar" class="h-2 bg-green-500 transition-all duration-200 w-0"></div> | |
| <span id="deviceLoadingText" class="text-xs text-gray-600"></span> | |
| </div> | |
| </div> | |
| <!-- Request Pattern Card --> | |
| <div class="bg-white p-6 rounded-2xl shadow-xl border border-gray-200"> | |
| <h2 class="text-xl font-semibold mb-4">Request Pattern & Routing</h2> | |
| <label class="block mb-4 text-sm font-medium">Load Pattern | |
| <select id="patternSelect" | |
| class="mt-1 w-full px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"> | |
| <option value="once-per-sec">1 request / sec</option> | |
| <option value="every-ten-sec">1 request / 10 sec</option> | |
| <option value="exponential-arrival">Exponential arrival time</option> | |
| </select> | |
| </label> | |
| <label class="block mb-4 text-sm font-medium">Route strategy | |
| <select id="routeStrategy" | |
| class="mt-1 w-full px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"> | |
| <option value="always_device">Always device</option> | |
| <option value="always_cloud">Always cloud</option> | |
| <option value="roundrobin">Round Robin</option> | |
| <option value="probabilistic">Probabilistic (p to cloud)</option> | |
| </select> | |
| </label> | |
| <label class="block mb-4 text-sm font-medium">Cloud probability (for probabilistic) | |
| <input id="cloudProb" type="number" min="0" max="1" step="0.1" value="0.5" | |
| class="mt-1 w-full px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"/> | |
| </label> | |
| <label class="block mb-4 text-sm font-medium">Interarrival Time Lambda (for exponential arrival) | |
| <input id="interArrivalTimeLambda" type="number" min="0" step="0.1" value="2" class="mt-1 w-full px-3 py-2 rounded-lg border border-gray-300 focus:ring-2 focus:ring-blue-500 focus:outline-none"/> | |
| </label> | |
| <div class="flex gap-3 mt-4"> | |
| <button id="startBtn" | |
| class="flex-1 bg-green-600 text-white py-2 rounded-lg hover:bg-green-700 transition">Start | |
| </button> | |
| <button id="stopBtn" disabled class="flex-1 bg-gray-400 text-white py-2 rounded-lg">Stop</button> | |
| </div> | |
| </div> | |
| <!-- Log Card --> | |
| <div class="bg-white p-6 rounded-2xl shadow-xl border border-gray-200 md:col-span-3"> | |
| <h2 class="text-xl font-semibold mb-4">Live Log & Results</h2> | |
| <div class="block w-full h-64 overflow-scroll"> | |
| <table id="log" | |
| class="w-full h-64 overflow-scroll bg-gray-50 p-3 rounded-lg border border-gray-200 text-sm"> | |
| <thead> | |
| <tr> | |
| <th class="text-left">Timestamp</th> | |
| <th class="text-left">Route</th> | |
| <th class="text-left">Total Latency (ms)</th> | |
| <th class="text-left">Queue (ms)</th> | |
| <th class="text-left">Inference (ms)</th> | |
| <th class="text-left">Question</th> | |
| <th class="text-left">Answer</th> | |
| <th class="text-left">Correct</th> | |
| </tr> | |
| </thead> | |
| <tbody id="log-table-body"></tbody> | |
| </table> | |
| </div> | |
| <div id="stats" class="mt-4 text-sm text-gray-800"></div> | |
| <div class="flex flex-col md:flex-row gap-4 mt-4"> | |
| <button id="downloadStatsJson" | |
| class="mt-4 w-full bg-purple-600 text-white py-2 rounded-lg hover:bg-purple-700 transition"> | |
| Download Statistics as JSON | |
| </button> | |
| <button id="downloadStatsCsv" | |
| class="mt-4 w-full bg-purple-600 text-white py-2 rounded-lg hover:bg-purple-700 transition"> | |
| Download Statistics as CSV | |
| </button> | |
| </div> | |
| </div> | |
| </section> | |
| <footer class="mt-12 text-center text-gray-600 text-sm"> | |
| <p>Developed by | |
| <a href="https://www.linkedin.com/in/fabian-h%C3%BCni/" class="text-blue-500 hover:underline">Fabian Hüni</a>, | |
| <a href="https://www.linkedin.com/in/philip-kehl/" class="text-blue-500 hover:underline">Philip Kehl</a> and | |
| <a href="https://www.linkedin.com/in/nicolas-wyss-6b172428b/" class="text-blue-500 hover:underline">Nicolas Wyss</a>. | |
| </p> | |
| </footer> | |
| </main> | |
| <script type="module" src="./src/main.js"></script> | |
| </body> | |
| </html> | |
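The form above exposes an "Exponential arrival time" load pattern with a lambda input and four route strategies. A minimal sketch of how those controls could be interpreted, assuming lambda is a rate in requests per second (the page does not say whether it is a rate or a mean) and assuming round robin alternates starting with the cloud. `sampleInterarrivalSeconds` and `makeRouter` are hypothetical names. Only the strategy strings are taken from the `routeStrategy` option values.

```javascript
// Hypothetical helpers: not the project's actual implementation.

// Inverse-transform sampling: if U ~ Uniform[0, 1), then -ln(1 - U) / lambda
// is exponentially distributed with rate lambda (mean 1 / lambda seconds).
// `random` is injectable so the function can be tested deterministically.
function sampleInterarrivalSeconds(lambda, random = Math.random) {
  if (!(lambda > 0)) {
    // The input allows 0, which would make the delay infinite.
    throw new RangeError("lambda must be positive");
  }
  return -Math.log(1 - random()) / lambda;
}

// Returns a function that picks "cloud" or "device" for each request.
function makeRouter(strategy, cloudProb = 0.5, random = Math.random) {
  let counter = 0; // only used by the round-robin strategy
  return function route() {
    switch (strategy) {
      case "always_device":
        return "device";
      case "always_cloud":
        return "cloud";
      case "roundrobin":
        // Assumption: alternate cloud, device, cloud, ...
        return counter++ % 2 === 0 ? "cloud" : "device";
      case "probabilistic":
        // Send to the cloud with probability cloudProb.
        return random() < cloudProb ? "cloud" : "device";
      default:
        throw new Error(`Unknown route strategy: ${strategy}`);
    }
  };
}
```

A scheduler could then wait `sampleInterarrivalSeconds(lambda) * 1000` milliseconds between requests and call `route()` once per request to choose the backend.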