Gnan Deep Rathan K
## Goal
Rebuild the Gradio `dashboard.py` as a browser-based inspector for the Explainer OpenEnv at `https://kgdrathan-explainer-env.hf.space`. No Python, no Gradio, no `config.yaml`, and no environment/provider/model dropdowns. The dashboard runs episodes (reset → explore → generate → repair → done), shows every observation, LLM response, and reward along the way, and supports both single-step and auto-run modes.
## Runtime config (server-side only, no UI selectors)
Environment variables (read inside server functions, never in the browser bundle):
- `ENV_BASE_URL`: default `https://kgdrathan-explainer-env.hf.space`
- `API_BASE_URL`: default `https://router.huggingface.co/v1`
- `HF_TOKEN`: required, stored as a Lovable Cloud secret
- `MODEL_NAME`: default `Qwen/Qwen2.5-72B-Instruct`
The dashboard shows `ENV_BASE_URL` and `MODEL_NAME` as a small read-only metadata strip (not editable).
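As a rough illustration of the defaults above, the server side could resolve its configuration along these lines. This is a minimal sketch, not the final implementation: `loadRuntimeConfig`, `publicConfig`, and the `RuntimeConfig` shape are hypothetical names, and the env record is passed in as an argument (rather than read from `process.env` directly) so the logic stays easy to test.

```typescript
// Hypothetical shape of the resolved server-side configuration.
interface RuntimeConfig {
  envBaseUrl: string;
  apiBaseUrl: string;
  hfToken: string;
  modelName: string;
}

type EnvRecord = Record<string, string | undefined>;

// Applies the defaults listed in the plan; HF_TOKEN has no default and is required.
function loadRuntimeConfig(env: EnvRecord): RuntimeConfig {
  const hfToken = env["HF_TOKEN"];
  if (!hfToken) {
    throw new Error("HF_TOKEN is required");
  }
  return {
    envBaseUrl: env["ENV_BASE_URL"] ?? "https://kgdrathan-explainer-env.hf.space",
    apiBaseUrl: env["API_BASE_URL"] ?? "https://router.huggingface.co/v1",
    hfToken,
    modelName: env["MODEL_NAME"] ?? "Qwen/Qwen2.5-72B-Instruct",
  };
}

// Only the non-secret fields are returned to the browser for the metadata strip.
function publicConfig(cfg: RuntimeConfig): { envUrl: string; modelName: string } {
  return { envUrl: cfg.envBaseUrl, modelName: cfg.modelName };
}
```

Inside a server function this would be called as `loadRuntimeConfig(process.env)`, with only the result of `publicConfig` ever crossing to the client.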
## Architecture
```text
Browser (React inspector)
  │ useServerFn
  ▼
TanStack server functions
  ├─ envReset({ seed?, episode_id? })
  ├─ envStep({ action })             ── proxies POST /step on ENV_BASE_URL
  ├─ envSchema()                     ── GET /schema (cached)
  └─ llmCall({ phase, obs, prior })  ── builds prompt, calls API_BASE_URL with HF_TOKEN,
                                        returns { raw, parsed action }
```
All env and LLM HTTP traffic goes through server functions. The browser never sees `HF_TOKEN`. CORS is irrelevant because calls are same-origin RPC.
## Explainer env contract (verified from `/schema`)
- POST `/reset` → `{ observation, done }` where `observation` is an `ExplainerObservation` (topic, content, tier, keywords, phase, feedback, search_results, top_chunks, explored_context, explore_steps_left, repair_attempts_left, last_errors, available_tools, metadata, reward).
- POST `/step` body: `{ action: ExplainerAction }`. Action shape:
  - `action_type`: `"explore" | "generate" | "repair"`
  - explore: `tool` (one of search_wikipedia / search_hf_papers / search_arxiv / search_scholar / fetch_docs / search_hf_hub), `query`, `intent`
  - generate / repair: `format` (`"marimo" | "manim"`), `code`, `narration`, `repair_notes` (repair only)
- GET `/metadata`, GET `/schema` for header info and tool list.
The episode is "done" when `observation.done === true` or `phase === "done"`.
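The contract above maps naturally onto a discriminated union. The following is a sketch under assumptions: the type names mirror the plan, but whether `repair_notes` is required by the env is not stated here, so it is marked optional, and `isEpisodeDone` and `stepRequestBody` are hypothetical helper names.

```typescript
type ExploreTool =
  | "search_wikipedia"
  | "search_hf_papers"
  | "search_arxiv"
  | "search_scholar"
  | "fetch_docs"
  | "search_hf_hub";

type OutputFormat = "marimo" | "manim";

// One variant per action_type, so the compiler rejects mixed-up fields.
type ExplainerAction =
  | { action_type: "explore"; tool: ExploreTool; query: string; intent: string }
  | { action_type: "generate"; format: OutputFormat; code: string; narration: string }
  | {
      action_type: "repair";
      format: OutputFormat;
      code: string;
      narration: string;
      repair_notes?: string; // assumption: optional
    };

// Minimal slice of ExplainerObservation needed for the done check.
interface ObservationDoneView {
  phase: string;
  done?: boolean;
}

// Done when observation.done === true or phase === "done", as stated above.
function isEpisodeDone(observation: ObservationDoneView): boolean {
  return observation.done === true || observation.phase === "done";
}

// Serializes the POST /step body: { action: ExplainerAction }.
function stepRequestBody(action: ExplainerAction): string {
  return JSON.stringify({ action });
}
```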
## LLM logic (port of `inference.py` to TypeScript)
Reimplement as pure TS in `src/server/llm/`:
- `buildExplorePrompt(obs, accumulatedContext)`
- `buildGeneratePrompt(obs, accumulatedContext)`
- `buildRepairPrompt(obs, lastCode, lastErrors)`
- `parseExploreResponse(text)` → `{ tool, query, intent }` or `"SKIP"`
- `parseGenerateResponse(text)` → `{ format, code, narration }`
- `callLLM(messages)` → OpenAI-compatible `POST {API_BASE_URL}/chat/completions` with `Authorization: Bearer ${HF_TOKEN}` and `model: MODEL_NAME`.
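To make the `callLLM` contract concrete, here is a sketch of how the request could be assembled and the reply unpacked. `buildChatRequest` and `extractContent` are hypothetical helpers, not functions from `inference.py`; the request and response shapes follow the standard OpenAI-compatible chat completions format (`{ model, messages }` in, `choices[0].message.content` out).

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the POST {API_BASE_URL}/chat/completions request without sending it.
function buildChatRequest(
  apiBaseUrl: string,
  hfToken: string,
  modelName: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${apiBaseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${hfToken}`,
      },
      body: JSON.stringify({ model: modelName, messages }),
    },
  };
}

// Pulls the assistant text out of an OpenAI-style response payload.
function extractContent(payload: unknown): string {
  const choices = (
    payload as { choices?: Array<{ message?: { content?: unknown } }> } | null
  )?.choices;
  const content = choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("LLM response had no message content");
  }
  return content;
}
```

A `callLLM` built on these would pass `url` and `init` to `fetch`, check the HTTP status, and hand the parsed JSON to `extractContent`. Keeping the request construction separate from the network call makes it possible to unit-test the headers and body without a token.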
Phase routing inside `runStep`:
- `phase === "explore"` → explore prompt; on `SKIP`, force a generate step instead.
- `phase === "generate"` → generate prompt; if env returns `phase === "repair"`, surface errors.
- `phase === "repair"` → repair prompt seeded with `last_errors` + previous code.
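The routing rules above reduce to a small pure function. This is only a sketch: `nextActionType` is a hypothetical name, and it decides which kind of action to send without building any prompt.

```typescript
type Phase = "explore" | "generate" | "repair" | "done";
type ActionType = "explore" | "generate" | "repair";

// Result of parseExploreResponse: a tool call or the literal "SKIP".
type ExploreParse = { tool: string; query: string; intent: string } | "SKIP";

// Returns the action type to send next, or null when the episode is over.
function nextActionType(phase: Phase, exploreParse?: ExploreParse): ActionType | null {
  switch (phase) {
    case "explore":
      // A SKIP from the model ends exploration early and forces generation.
      return exploreParse === "SKIP" ? "generate" : "explore";
    case "generate":
      return "generate";
    case "repair":
      return "repair";
    case "done":
      return null;
  }
}
```

For example, `nextActionType("explore", "SKIP")` yields `"generate"`, which matches the forced generate step described above.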
## Episode state (single React store, e.g. Zustand)
```text
{
  sessionId, episodeId,
  envUrl, modelName,
  obs,                       // latest ExplainerObservation
  phase, step, done, score, status,
  task: { topic, tier, difficulty, keywords, content, dataAvailable },
  research: { exploredContext, topChunks: [], lastSearchResults },
  generation: { lastFormat, lastCode, lastNarration, generatedRaw, parsedAction },
  rewards: [],               // per-step total
  rewardDetails: [],         // per-step component breakdown
  log: [],                   // [START]/[LLM]/[STEP]/[END]/[WARN]/[ERROR] entries
  autoRunning: false
}
```
A "session" reset just generates a new `episodeId` and calls `/reset`. There is no long-lived server-side handle, so each `envStep` call is stateless toward the env; the env tracks state internally per `episode_id`.
## Task bank
Port `ALL_TASKS` from the Python `task_bank` to a TS constant `TASKS = [{ topic, difficulty, tier }, ...]`. The dropdown shows `(random)` plus one `topic [difficulty, tier]` entry per task. Picking a task passes `topic` to `/reset` in the request body if the env supports it; otherwise it is stored as a target hint and the env's own random topic is used (we'll detect which from the schema at build time).
## UI (single page at `/`)
Inspector layout, dark theme, monospace accents; not a marketing page.
```text
┌─ Header ──────────────────────────────────────────────────────────┐
│ Topic · Tier/Difficulty · Phase badge · Step n · Score · Status   │
│ env=ENV_BASE_URL   model=MODEL_NAME                               │
├─ Controls ────────────────────────────────────────────────────────┤
│ [Task ▼ (random)…] [Reset Episode] [Next Step] [Auto Run ▶/■]     │
├─ Left column ──────────────────┬─ Right column ───────────────────┤
│ Observation                    │ LLM panel                        │
│  • topic / content (collapsed) │  • raw response                  │
│  • keywords, data_available    │  • parsed action (JSON)          │
│  • feedback (latest)           │  • generated code (syntax hl.)   │
│                                │                                  │
│ Research                       │ Rewards                          │
│  • last search_results         │  • per-step total summary        │
│  • Top 5 chunks table          │  • component breakdown table     │
│    (rank, source, title,       │                                  │
│     score, url, snippet)       │                                  │
├────────────────────────────────┴──────────────────────────────────┤
│ Timeline / log (scrollable, color-coded by tag)                   │
└───────────────────────────────────────────────────────────────────┘
```
Shadcn primitives only: Card, Table, Badge, Button, Select, ScrollArea, Tabs (for code/raw/parsed), Separator. Code blocks use a lightweight highlighter.
## Behavior
- **Reset Episode**: clears state, calls `envReset`, populates task fields from observation, logs `[START]`.
- **Next Step**: reads `phase`, calls `llmCall` for that phase, logs `[LLM]` with raw + parsed, calls `envStep`, merges observation, appends reward + components, logs `[STEP]`. Disabled when `done`.
- **Auto Run**: loops Next Step with a small delay until `done`, repair attempts exhausted, or user hits Stop. Logs `[END]` with success / score / rewards.
- Errors from the env or LLM go into the log as `[WARN]` / `[ERROR]` and surface as a toast; the run halts but state is preserved.
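The Auto Run stop conditions can be sketched as a pure check plus a thin loop around it. Everything here is illustrative: `stopReason`, `autoRun`, and the `RunState` slice are hypothetical names, the `sleep` function is injected by the caller, and treating "repair attempts exhausted" as `phase === "repair"` with no attempts left is an assumption about how the env reports it.

```typescript
// Minimal slice of the episode store that the loop needs.
interface RunState {
  done: boolean;
  phase: string;
  repairAttemptsLeft: number;
}

type StopReason = "done" | "repairs_exhausted" | "stopped";

// Returns why the loop should stop, or null if another step may run.
function stopReason(state: RunState, userStopped: boolean): StopReason | null {
  if (state.done || state.phase === "done") return "done";
  if (state.phase === "repair" && state.repairAttemptsLeft <= 0) return "repairs_exhausted";
  if (userStopped) return "stopped";
  return null;
}

// Repeats nextStep with a small delay until a stop condition holds.
// An error thrown by nextStep propagates to the caller, which logs it
// and keeps the last good state.
async function autoRun(
  initial: RunState,
  nextStep: (state: RunState) => Promise<RunState>,
  isStopped: () => boolean,
  sleep: (ms: number) => Promise<void>,
  delayMs = 250,
): Promise<{ state: RunState; reason: StopReason }> {
  let state = initial;
  for (;;) {
    const reason = stopReason(state, isStopped());
    if (reason !== null) return { state, reason };
    state = await nextStep(state);
    await sleep(delayMs);
  }
}
```

Checking the stop conditions before each step, rather than after, means pressing Stop never triggers one extra LLM call.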
## Reward handling
Port the Python helpers:
- `rewardComponents(obsMetadata, feedback)` → filtered numeric components (uses the explore/generate/repair allow-lists).
- `parseRewardComponentsFromFeedback(feedback)` as a fallback for old observations.
- Total per phase: `explore_total | generate_total | repair_total` if present, else sum of visible components.
- Final episode score: `normalized_episode_score(rewards)` ported as a simple TS function (mean of generate+repair totals, clamped to [0,1]); `success = score >= SUCCESS_SCORE_THRESHOLD` (constant ported from `explainer_env/constants.py`; default 0.6, to be confirmed during implementation).
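A sketch of the scoring rules as described in this list, pending confirmation against the Python source. The `StepReward` shape, the function names, and returning 0 when no generate or repair step has run are assumptions; the 0.6 threshold is the plan's provisional default.

```typescript
type RewardPhase = "explore" | "generate" | "repair";

interface StepReward {
  phase: RewardPhase;
  total: number;
}

// Provisional default from the plan; the real value lives in explainer_env/constants.py.
const SUCCESS_SCORE_THRESHOLD = 0.6;

// Uses `<phase>_total` when the env reports it, else sums the visible components.
function phaseTotal(phase: RewardPhase, components: Record<string, number>): number {
  const explicit = components[`${phase}_total`];
  if (explicit !== undefined) return explicit;
  return Object.values(components).reduce((sum, value) => sum + value, 0);
}

// Mean of generate and repair totals, clamped to [0, 1].
function normalizedEpisodeScore(rewards: StepReward[]): number {
  const scored = rewards.filter((r) => r.phase !== "explore");
  if (scored.length === 0) return 0; // assumption: no scored steps means a zero score
  const mean = scored.reduce((sum, r) => sum + r.total, 0) / scored.length;
  return Math.min(1, Math.max(0, mean));
}

function isSuccess(rewards: StepReward[]): boolean {
  return normalizedEpisodeScore(rewards) >= SUCCESS_SCORE_THRESHOLD;
}
```

With one generate step scoring 0.5 and one repair step scoring 1.0, the episode score is 0.75 and the run counts as a success under the provisional threshold; explore rewards are ignored by this formula.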
## Technical details
- **Files added**:
  - `src/routes/index.tsx`: dashboard page, replaces placeholder.
  - `src/server/env.functions.ts`: `envReset`, `envStep`, `envMetadata`, `envSchema` server fns calling `${process.env.ENV_BASE_URL}`.
  - `src/server/llm/prompts.ts`, `src/server/llm/parse.ts`, `src/server/llm/client.ts`: port of inference.py prompt/parse/call.
  - `src/server/llm.functions.ts`: `runLlmStep({ phase, obs, prior })` server fn.
  - `src/server/config.functions.ts`: `getRuntimeConfig()` returning `{ envUrl, modelName }` (no secrets).
  - `src/lib/tasks.ts`: ported task bank.
  - `src/lib/rewards.ts`: reward parsing/normalization.
  - `src/lib/types.ts`: `ExplainerObservation`, `ExplainerAction`, etc.
  - `src/store/episode.ts`: Zustand store.
  - `src/components/inspector/*`: Header, Controls, ObservationPanel, LlmPanel, ResearchPanel, RewardsPanel, Log.
- **Secret**: `HF_TOKEN` added via Lovable Cloud secrets after plan approval. The user will be prompted to paste it.
- **Stack stays** TanStack Start + React + Tailwind + shadcn. No new heavy deps; add only `zustand` and a small syntax highlighter (`shiki` or `highlight.js`), whichever is smaller at implementation time.
- **Out of scope**: persistence across reloads, multi-user sessions, auth, charts. Pure single-tab inspector.
## Open items to resolve during implementation
- Confirm whether `/reset` accepts a `topic` argument; if not, we still show the task picker but treat it as a display filter and surface a warning when the returned topic differs.
- Confirm the exact `SUCCESS_SCORE_THRESHOLD` and `normalized_episode_score` formula from `explainer_env/constants.py` (you can paste it; otherwise default to mean-of-totals ≥ 0.6).