	<!doctype html>
	<html lang="en">
	<head>
	<meta charset="UTF-8" />
	<meta name="viewport" content="width=device-width, initial-scale=1.0" />
	<title>AEGIS-Env — Model benchmark</title>

	<link rel="preconnect" href="https://fonts.googleapis.com" />
	<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
	<link
	href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap"
	rel="stylesheet"
	/>

	<script src="https://cdn.tailwindcss.com"></script>
	<script>
	tailwind.config = {
	theme: {
	extend: {
	fontFamily: {
	sans: ["Inter", "ui-sans-serif", "system-ui", "sans-serif"],
	},
	boxShadow: {
	glow: "0 20px 60px rgba(99, 102, 241, 0.25)",
	},
	},
	},
	};
	</script>

	<style>
	.glass {
	background: rgba(255, 255, 255, 0.72);
	backdrop-filter: blur(14px);
	-webkit-backdrop-filter: blur(14px);
	border: 1px solid rgba(255, 255, 255, 0.6);
	}
	.soft-grid {
	background-image: radial-gradient(
	rgba(99, 102, 241, 0.12) 1px,
	transparent 1px
	),
	radial-gradient(rgba(236, 72, 153, 0.08) 1px, transparent 1px);
	background-position: 0 0, 12px 12px;
	background-size: 24px 24px;
	}
	</style>
	</head>

	<body class="min-h-screen bg-slate-50 text-slate-900 soft-grid">
	<div
	aria-hidden="true"
	class="pointer-events-none fixed inset-x-0 top-0 h-80 bg-gradient-to-b from-indigo-200/60 via-fuchsia-200/30 to-transparent"
	></div>

	<div class="relative mx-auto max-w-7xl px-4 pb-12 pt-8 sm:px-6 lg:px-8">
	<header class="flex flex-col gap-4 sm:flex-row sm:items-end sm:justify-between">
	<div>
	<p class="text-sm font-medium text-slate-600">
	<a href="/web" class="text-indigo-700 hover:underline">← Playground</a>
	</p>
	<h1 class="mt-2 text-3xl font-semibold tracking-tight sm:text-4xl">
	<span
	class="text-transparent bg-clip-text bg-gradient-to-r from-indigo-600 via-fuchsia-600 to-sky-600"
	>
	Model benchmark
	</span>
	</h1>
	<p class="mt-2 max-w-2xl text-sm leading-6 text-slate-600">
	List models from an OpenAI-compatible endpoint (e.g.
	<span class="font-mono">GET …/v1/models</span>), choose five models and a task
	difficulty, then compare runs. Only the chat
	<span class="font-semibold">model</span> name changes between episodes; prompts and
	environment settings are identical.
	</p>
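	<!--
	A minimal sketch of the model-listing step described above, assuming the
	endpoint returns the OpenAI-style list shape { object: "list", data: [{ id }] }.
	The helper names extractModelIds and listModels are hypothetical. Whether
	benchmark.js calls the endpoint directly or goes through this server is not
	shown in this file, so treat this as an illustration, not the page's code.

	```javascript
	// Pull model ids out of an OpenAI-style /v1/models response.
	// Assumption: each entry in payload.data carries a string "id".
	function extractModelIds(payload) {
	  const items = payload && Array.isArray(payload.data) ? payload.data : [];
	  const ids = items
	    .map((item) => (item && typeof item.id === "string" ? item.id.trim() : ""))
	    .filter((id) => id.length > 0);
	  // Drop duplicates, then sort so the dropdown order is stable.
	  return [...new Set(ids)].sort();
	}

	// Hypothetical fetch wrapper: apiRoot is a value such as "http://127.0.0.1:11434/v1".
	async function listModels(apiRoot, apiKey) {
	  const url = apiRoot.replace(/\/+$/, "") + "/models";
	  const headers = apiKey ? { Authorization: "Bearer " + apiKey } : {};
	  const response = await fetch(url, { headers });
	  if (!response.ok) {
	    throw new Error("Model listing failed: HTTP " + response.status);
	  }
	  return extractModelIds(await response.json());
	}
	```

	Keeping the parsing separate from the network call makes it possible to
	check the parsing without a live endpoint.
	-->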
	</div>
	</header>

	<div id="error-banner" class="mt-6 hidden">
	<div class="glass rounded-3xl border border-rose-200 bg-rose-50/70 px-4 py-3 text-sm text-rose-800 shadow-sm">
	<div class="flex items-start justify-between gap-3">
	<pre id="error-text" class="whitespace-pre-wrap text-xs leading-5"></pre>
	<button id="error-dismiss" type="button" class="rounded-xl px-2 py-1 text-xs font-semibold text-rose-700 hover:bg-rose-100">
	Dismiss
	</button>
	</div>
	</div>
	</div>

	<section class="mt-8 glass rounded-3xl p-5 shadow-sm">
	<h2 class="text-sm font-semibold text-slate-800">Configuration</h2>
	<p class="mt-1 text-xs leading-5 text-slate-600">
	The default API root points at Ollama’s OpenAI-compatible API (
	<a class="text-indigo-700 underline" href="https://ollama.com/v1/models" target="_blank" rel="noreferrer"
	>ollama.com/v1/models</a
	>). For a local daemon use <span class="font-mono">http://127.0.0.1:11434/v1</span>.
	</p>

	<div class="mt-4 grid gap-4 lg:grid-cols-2">
	<div>
	<label for="api-root" class="text-xs font-semibold text-slate-700">API root (list + chat)</label>
	<input
	id="api-root"
	type="text"
	value="https://ollama.com/v1"
	class="mt-1 w-full rounded-2xl border border-slate-200 bg-white/80 px-3 py-2.5 text-sm font-mono shadow-sm outline-none focus:border-indigo-300 focus:ring-4 focus:ring-indigo-200/60"
	/>
	<button
	id="btn-refresh-models"
	type="button"
	class="mt-2 inline-flex items-center gap-2 rounded-2xl border border-slate-200 bg-white/80 px-4 py-2 text-xs font-semibold text-slate-800 shadow-sm hover:bg-white"
	>
	List models
	</button>
	<p id="models-status" class="mt-2 text-xs text-slate-500"></p>
	</div>
	<div>
	<label for="api-key" class="text-xs font-semibold text-slate-700">Optional API key</label>
	<input
	id="api-key"
	type="password"
	autocomplete="off"
	placeholder="Leave empty to use server env or “ollama”"
	class="mt-1 w-full rounded-2xl border border-slate-200 bg-white/80 px-3 py-2.5 text-sm shadow-sm outline-none focus:border-indigo-300 focus:ring-4 focus:ring-indigo-200/60"
	/>
	</div>
	</div>

	<div class="mt-6">
	<div class="text-xs font-semibold text-slate-700">Select five models</div>
	<div id="model-slots" class="mt-2 grid gap-2 sm:grid-cols-2 lg:grid-cols-5"></div>
	</div>

	<div class="mt-6 flex flex-wrap items-end gap-4">
	<div>
	<label for="bench-task" class="text-xs font-semibold text-slate-700">Task difficulty</label>
	<select
	id="bench-task"
	class="mt-1 block rounded-2xl border border-slate-200 bg-white/80 px-3 py-2.5 text-sm shadow-sm outline-none focus:border-indigo-300 focus:ring-4 focus:ring-indigo-200/60"
	>
	<option value="easy">Easy</option>
	<option value="medium">Medium</option>
	<option value="hard">Hard</option>
	</select>
	</div>
	<div>
	<label for="bench-max-steps" class="text-xs font-semibold text-slate-700">Max steps</label>
	<input
	id="bench-max-steps"
	type="number"
	min="1"
	max="200"
	value="10"
	class="mt-1 w-24 rounded-2xl border border-slate-200 bg-white/80 px-3 py-2.5 text-sm shadow-sm outline-none focus:border-indigo-300 focus:ring-4 focus:ring-indigo-200/60"
	/>
	</div>
	<div>
	<label for="bench-seed" class="text-xs font-semibold text-slate-700">Seed (optional)</label>
	<input
	id="bench-seed"
	type="number"
	min="0"
	placeholder="random"
	class="mt-1 w-28 rounded-2xl border border-slate-200 bg-white/80 px-3 py-2.5 text-sm shadow-sm outline-none focus:border-indigo-300 focus:ring-4 focus:ring-indigo-200/60"
	/>
	</div>
	<button
	id="btn-run-benchmark"
	type="button"
	class="inline-flex items-center gap-2 rounded-2xl bg-slate-900 px-5 py-2.5 text-sm font-semibold text-white shadow-sm transition hover:bg-slate-800 disabled:opacity-50"
	>
	<span class="h-2 w-2 rounded-full bg-emerald-400"></span>
	Run benchmark
	</button>
	</div>
	<p id="bench-status" class="mt-3 text-xs font-medium text-indigo-700"></p>
	</section>

	<section class="mt-8 glass rounded-3xl p-5 shadow-sm">
	<h2 class="text-sm font-semibold text-slate-800">Results</h2>
	<div class="mt-3 overflow-x-auto">
	<table class="w-full min-w-[32rem] text-left text-xs">
	<thead>
	<tr class="border-b border-slate-200 text-slate-500">
	<th class="py-2 pr-3 font-semibold">Model</th>
	<th class="py-2 pr-3 font-semibold">Total reward</th>
	<th class="py-2 pr-3 font-semibold">Steps</th>
	<th class="py-2 font-semibold">Error</th>
	</tr>
	</thead>
	<tbody id="bench-table-body"></tbody>
	</table>
	</div>
	</section>

	<section class="mt-8 grid gap-6 lg:grid-cols-2">
	<div class="glass rounded-3xl p-5 shadow-sm">
	<h3 class="text-sm font-semibold text-slate-800">Total reward by model</h3>
	<div class="mt-4 h-72">
	<canvas id="chart-total" role="img" aria-label="Total reward"></canvas>
	</div>
	</div>
	<div class="glass rounded-3xl p-5 shadow-sm">
	<h3 class="text-sm font-semibold text-slate-800">Steps to last transition</h3>
	<div class="mt-4 h-72">
	<canvas id="chart-steps" role="img" aria-label="Step count"></canvas>
	</div>
	</div>
	</section>

	<section class="mt-8 glass rounded-3xl p-5 shadow-sm">
	<h3 class="text-sm font-semibold text-slate-800">Cumulative reward over steps</h3>
	<p class="mt-1 text-xs text-slate-600">Running total of per-step reward for each episode (same task and seed for every model).</p>
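	<!--
	A minimal sketch of how a cumulative series can be derived from per-step
	rewards, assuming each episode yields a plain array of numeric rewards.
	The helper names cumulativeRewards and toLineDataset are hypothetical and
	are not taken from benchmark.js.

	```javascript
	// Turn per-step rewards into a running total, one point per step.
	// Non-numeric entries are treated as 0 (an assumption for robustness).
	function cumulativeRewards(stepRewards) {
	  let running = 0;
	  return stepRewards.map((reward) => {
	    running += Number(reward) || 0;
	    return running;
	  });
	}

	// Shape one model's episode as a Chart.js line dataset (label + data).
	function toLineDataset(modelName, stepRewards) {
	  return { label: modelName, data: cumulativeRewards(stepRewards) };
	}
	```

	For example, per-step rewards of 1, -0.5 and 2 give the series 1, 0.5, 2.5.
	-->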
	<div class="mt-4 h-96">
	<canvas id="chart-cumulative" aria-label="Cumulative reward"></canvas>
	</div>
	</section>

	<footer class="mt-10 text-center text-xs text-slate-500">
	Benchmark uses <span class="font-mono">POST /api/benchmark/run</span> on this server (same prompts as
	<span class="font-mono">inference.py</span>).
	</footer>
	</div>

	<script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.1/dist/chart.umd.min.js"></script>
	<script src="/web/assets/benchmark.js"></script>
	</body>
	</html>