## Goal

Rebuild the Gradio `dashboard.py` as a browser-based inspector for the Explainer OpenEnv at `https://kgdrathan-explainer-env.hf.space`. No Python, no Gradio, no `config.yaml`, no environment/provider/model dropdowns. The dashboard runs episodes (reset → explore → generate → repair → done), shows everything, and supports single-step and auto-run.

## Runtime config (server-side only, no UI selectors)

Environment variables (read inside server functions, never in the browser bundle):

- `ENV_BASE_URL`: default `https://kgdrathan-explainer-env.hf.space`
- `API_BASE_URL`: default `https://router.huggingface.co/v1`
- `HF_TOKEN`: required, stored as a Lovable Cloud secret
- `MODEL_NAME`: default `Qwen/Qwen2.5-72B-Instruct`

The dashboard shows `ENV_BASE_URL` and `MODEL_NAME` as a small read-only metadata strip (not editable).
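
The defaults above could be resolved in one server-side helper. The following is a minimal sketch, not the planned implementation: `loadRuntimeConfig`, `publicConfig`, and the `RuntimeConfig` shape are hypothetical names, and the environment is passed in as a plain map so the helper stays testable (on the server it would presumably receive `process.env`).

```typescript
// Hypothetical shape; only the four variable names and defaults come from the plan.
interface RuntimeConfig {
  envBaseUrl: string;
  apiBaseUrl: string;
  hfToken: string;
  modelName: string;
}

type EnvMap = Record<string, string | undefined>;

// Defaults copied from the list of environment variables.
const DEFAULTS = {
  ENV_BASE_URL: "https://kgdrathan-explainer-env.hf.space",
  API_BASE_URL: "https://router.huggingface.co/v1",
  MODEL_NAME: "Qwen/Qwen2.5-72B-Instruct",
} as const;

// Resolve config from an env map; HF_TOKEN has no default and must be present.
function loadRuntimeConfig(env: EnvMap): RuntimeConfig {
  const hfToken = env["HF_TOKEN"];
  if (!hfToken) {
    throw new Error("HF_TOKEN is required but not set");
  }
  return {
    envBaseUrl: env["ENV_BASE_URL"] || DEFAULTS.ENV_BASE_URL,
    apiBaseUrl: env["API_BASE_URL"] || DEFAULTS.API_BASE_URL,
    hfToken,
    modelName: env["MODEL_NAME"] || DEFAULTS.MODEL_NAME,
  };
}

// Only the non-secret fields are exposed to the browser for the metadata strip.
function publicConfig(cfg: RuntimeConfig): { envUrl: string; modelName: string } {
  return { envUrl: cfg.envBaseUrl, modelName: cfg.modelName };
}
```

Keeping `publicConfig` as the only path from config to the browser makes it harder to leak `HF_TOKEN` by accident.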

## Architecture

```text
Browser (React inspector)
        │  useServerFn
        ▼
TanStack server functions
  ├─ envReset({ seed?, episode_id? })
  ├─ envStep({ action })           ── proxies POST /step on ENV_BASE_URL
  ├─ envSchema()                   ── GET /schema (cached)
  └─ llmCall({ phase, obs, prior }) ── builds prompt, calls API_BASE_URL with HF_TOKEN,
                                        returns { raw, parsed action }
```

All env and LLM HTTP traffic goes through server functions. The browser never sees `HF_TOKEN`. CORS is irrelevant because calls are same-origin RPC.
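
As a rough illustration of the proxy layer, the sketch below builds the URL and request options a server function might hand to `fetch`. `buildEnvRequest` and `EnvRequest` are hypothetical names, and the actual server-function wiring is deliberately left out.

```typescript
// Paths named in this plan; anything else is rejected at compile time.
type EnvPath = "/reset" | "/step" | "/schema" | "/metadata";

// Hypothetical container for what would be passed to fetch(url, init).
interface EnvRequest {
  url: string;
  init: {
    method: "GET" | "POST";
    headers: Record<string, string>;
    body?: string;
  };
}

// Build a request against the env base URL, tolerating a trailing slash.
function buildEnvRequest(
  baseUrl: string,
  method: "GET" | "POST",
  path: EnvPath,
  body?: unknown,
): EnvRequest {
  const url = baseUrl.replace(/\/+$/, "") + path;
  const headers: Record<string, string> = { Accept: "application/json" };
  if (method === "GET") {
    return { url, init: { method, headers } };
  }
  headers["Content-Type"] = "application/json";
  // POST with no explicit payload sends an empty JSON object.
  return { url, init: { method, headers, body: JSON.stringify(body ?? {}) } };
}
```

Because the request is assembled on the server, the browser only ever sees the RPC result, never the upstream URL credentials or headers.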

## Explainer env contract (verified from `/schema`)

- POST `/reset` → `{ observation, done }` where `observation` is an `ExplainerObservation` (topic, content, tier, keywords, phase, feedback, search_results, top_chunks, explored_context, explore_steps_left, repair_attempts_left, last_errors, available_tools, metadata, reward).
- POST `/step` body: `{ action: ExplainerAction }`. Action shape:
  - `action_type`: `"explore" | "generate" | "repair"`
  - explore: `tool` (one of search_wikipedia / search_hf_papers / search_arxiv / search_scholar / fetch_docs / search_hf_hub), `query`, `intent`
  - generate / repair: `format` (`"marimo" | "manim"`), `code`, `narration`, `repair_notes` (repair only)
- GET `/metadata`, GET `/schema` for header info and tool list.

The episode is "done" when the response reports `done === true` or `observation.phase === "done"`.
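
The action shape and the done check can be written down as types. This is a sketch under assumptions: the field names come from the contract above, but whether `repair_notes` is optional and whether `done` also appears inside `observation` are not confirmed here, so the check accepts either location.

```typescript
type ExploreTool =
  | "search_wikipedia"
  | "search_hf_papers"
  | "search_arxiv"
  | "search_scholar"
  | "fetch_docs"
  | "search_hf_hub";

type OutputFormat = "marimo" | "manim";

interface ExploreAction {
  action_type: "explore";
  tool: ExploreTool;
  query: string;
  intent: string;
}

interface GenerateAction {
  action_type: "generate";
  format: OutputFormat;
  code: string;
  narration: string;
}

interface RepairAction {
  action_type: "repair";
  format: OutputFormat;
  code: string;
  narration: string;
  repair_notes?: string; // assumption: optional; the plan only says "repair only"
}

type ExplainerAction = ExploreAction | GenerateAction | RepairAction;

// Minimal view of an env response; the real observation has many more fields.
interface EnvResponse {
  observation: { phase: string; done?: boolean };
  done?: boolean;
}

// Body for POST /step.
function stepBody(action: ExplainerAction): { action: ExplainerAction } {
  return { action };
}

// Done if either done flag is set or the phase has reached "done".
function isEpisodeDone(res: EnvResponse): boolean {
  return res.done === true || res.observation.done === true || res.observation.phase === "done";
}
```

The discriminated union on `action_type` lets the compiler reject, for example, an explore action that carries `code`.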

## LLM logic (port of `inference.py` to TypeScript)

Reimplement as pure TS in `src/server/llm/`:

- `buildExplorePrompt(obs, accumulatedContext)`
- `buildGeneratePrompt(obs, accumulatedContext)`
- `buildRepairPrompt(obs, lastCode, lastErrors)`
- `parseExploreResponse(text)` → `{ tool, query, intent }` or `"SKIP"`
- `parseGenerateResponse(text)` → `{ format, code, narration }`
- `callLLM(messages)` → OpenAI-compatible `POST {API_BASE_URL}/chat/completions` with `Authorization: Bearer ${HF_TOKEN}` and `model: MODEL_NAME`.
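
A possible core for `callLLM` is sketched below. It only builds the request and extracts the reply text, so it makes no network call itself. `buildChatRequest` and `extractContent` are hypothetical helper names; the `choices[0].message.content` path is the standard OpenAI-compatible response layout, which the router is assumed to follow.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Assemble the POST {API_BASE_URL}/chat/completions request described above.
function buildChatRequest(
  apiBaseUrl: string,
  hfToken: string,
  modelName: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    url: `${apiBaseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${hfToken}`,
      },
      body: JSON.stringify({ model: modelName, messages }),
    },
  };
}

// Pull the assistant text out of a parsed chat-completions response.
function extractContent(response: unknown): string {
  const typed = response as { choices?: Array<{ message?: { content?: unknown } }> } | null;
  const content = typed?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("LLM response has no text content");
  }
  return content;
}
```

The extracted string is what `parseExploreResponse` or `parseGenerateResponse` would then receive.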

Phase routing inside `runStep`:

- `phase === "explore"` → explore prompt; on `SKIP`, force a generate step instead.
- `phase === "generate"` → generate prompt; if env returns `phase === "repair"`, surface errors.
- `phase === "repair"` → repair prompt seeded with `last_errors` + previous code.
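
The routing rules above reduce to a small pure function. The sketch below is illustrative only: `nextActionType` is a hypothetical name, and it decides which kind of action to send without building any prompt.

```typescript
type Phase = "explore" | "generate" | "repair" | "done";
type ActionKind = "explore" | "generate" | "repair";

// Result of parsing an explore reply: a tool call or the literal "SKIP".
type ExploreParse = { tool: string; query: string; intent: string } | "SKIP";

// Decide which action to send for the current phase.
// Returns null when the episode is over and no step should be taken.
function nextActionType(phase: Phase, exploreParse?: ExploreParse): ActionKind | null {
  switch (phase) {
    case "explore":
      // A SKIP from the model ends exploration early and forces generation.
      return exploreParse === "SKIP" ? "generate" : "explore";
    case "generate":
      return "generate";
    case "repair":
      return "repair";
    default:
      return null;
  }
}
```

For example, `nextActionType("explore", "SKIP")` yields `"generate"`, while `nextActionType("done")` yields `null`.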

## Episode state (single React store, e.g. Zustand)

```text
{
  sessionId, episodeId,
  envUrl, modelName,
  obs,                   // latest ExplainerObservation
  phase, step, done, score, status,
  task: { topic, tier, difficulty, keywords, content, dataAvailable },
  research: { exploredContext, topChunks: [], lastSearchResults },
  generation: { lastFormat, lastCode, lastNarration, generatedRaw, parsedAction },
  rewards: [],           // per-step total
  rewardDetails: [],     // per-step component breakdown
  log: [],               // [START]/[LLM]/[STEP]/[END]/[WARN] entries
  autoRunning: false
}
```

A "session" reset just generates a new `episodeId` and calls `/reset`. There is no long-lived server-side handle, so each `envStep` call is stateless on the dashboard side (the env tracks state internally per `episode_id`).

## Task bank

Port `ALL_TASKS` from the Python `task_bank` to a TS constant `TASKS = [{ topic, difficulty, tier }, ...]`. The dropdown shows `(random)` plus one `topic [difficulty, tier]` entry per task. Picking a task passes `topic` in the `/reset` request body if the env supports it; otherwise it is stored as a target hint and the env's own random topic is used (we'll detect which case applies from the schema at build time).
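
The dropdown labels and the two reset behaviours could look like the sketch below. The helper names and the `envAcceptsTopic` flag are hypothetical, and the sample task in the usage note is a placeholder rather than an entry from the real task bank.

```typescript
interface Task {
  topic: string;
  difficulty: string;
  tier: string;
}

const RANDOM_OPTION = "(random)";

// Label format from the plan: topic [difficulty, tier].
function taskLabel(task: Task): string {
  return `${task.topic} [${task.difficulty}, ${task.tier}]`;
}

// Dropdown entries: the random option first, then one label per task.
function taskOptions(tasks: readonly Task[]): string[] {
  return [RANDOM_OPTION, ...tasks.map(taskLabel)];
}

// Build the /reset body. The topic is only sent when a task is selected
// and the env is known to accept it; otherwise the body stays empty.
function resetBody(selected: Task | null, envAcceptsTopic: boolean): { topic?: string } {
  if (selected !== null && envAcceptsTopic) {
    return { topic: selected.topic };
  }
  return {};
}
```

With a placeholder task `{ topic: "Example topic", difficulty: "easy", tier: "1" }`, `taskOptions` returns `["(random)", "Example topic [easy, 1]"]`.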

## UI (single page at `/`)

Inspector layout, dark theme, monospace accents; this is not a marketing page.

```text
┌─ Header ─────────────────────────────────────────────────────────┐
│ Topic · Tier/Difficulty · Phase badge · Step n · Score · Status  │
│ env=ENV_BASE_URL  model=MODEL_NAME                               │
├─ Controls ───────────────────────────────────────────────────────┤
│ [Task ▼ (random)…] [Reset Episode] [Next Step] [Auto Run ▶/■]    │
├─ Left column ──────────────────┬─ Right column ──────────────────┤
│ Observation                    │ LLM panel                       │
│  • topic / content (collapsed) │  • raw response                 │
│  • keywords, data_available    │  • parsed action (JSON)         │
│  • feedback (latest)           │  • generated code (syntax hl.)  │
│                                │                                 │
│ Research                       │ Rewards                         │
│  • last search_results         │  • per-step total summary       │
│  • Top 5 chunks table          │  • component breakdown table    │
│    (rank, source, title,       │                                 │
│     score, url, snippet)       │                                 │
├──────────────────────────────────────────────────────────────────┤
│ Timeline / log (scrollable, color-coded by tag)                  │
└──────────────────────────────────────────────────────────────────┘
```

Shadcn primitives only: Card, Table, Badge, Button, Select, ScrollArea, Tabs (for code/raw/parsed), Separator. Code blocks use a lightweight highlighter.

## Behavior

- **Reset Episode**: clears state, calls `envReset`, populates task fields from observation, logs `[START]`.
- **Next Step**: reads `phase`, calls `llmCall` for that phase, logs `[LLM]` with raw + parsed, calls `envStep`, merges observation, appends reward + components, logs `[STEP]`. Disabled when `done`.
- **Auto Run**: loops Next Step with a small delay until `done`, repair attempts exhausted, or user hits Stop. Logs `[END]` with success / score / rewards.
- Errors from the env or LLM go into the log as `[WARN]` / `[ERROR]` and surface as a toast; the run halts but state is preserved.
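
The Auto Run stop conditions can be isolated in a predicate so the loop itself stays trivial. This sketch rests on assumptions: "repair attempts exhausted" is read as being in the repair phase with `repair_attempts_left` at zero, and `autoRun`, its callbacks, and the 500 ms default delay are hypothetical.

```typescript
interface AutoRunState {
  done: boolean;
  phase: string;
  repairAttemptsLeft: number;
  stopRequested: boolean; // user pressed Stop
  hadError: boolean;      // an env or LLM error halted the run
}

// True while Auto Run should take another step.
function shouldContinueAutoRun(s: AutoRunState): boolean {
  if (s.stopRequested || s.hadError || s.done) {
    return false;
  }
  // Assumed meaning of "repair attempts exhausted".
  if (s.phase === "repair" && s.repairAttemptsLeft <= 0) {
    return false;
  }
  return true;
}

// Loop Next Step with a small delay; returns the number of steps taken.
// getState, nextStep, and sleep are injected so the loop has no UI dependency.
async function autoRun(
  getState: () => AutoRunState,
  nextStep: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  delayMs = 500,
): Promise<number> {
  let steps = 0;
  while (shouldContinueAutoRun(getState())) {
    await nextStep();
    steps += 1;
    await sleep(delayMs);
  }
  return steps;
}
```

Re-reading state on every iteration means a Stop click or an error set during `nextStep` takes effect before the following step starts.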

## Reward handling

Port the Python helpers:

- `rewardComponents(obsMetadata, feedback)` → filtered numeric components (uses the explore/generate/repair allow-lists).
- `parseRewardComponentsFromFeedback(feedback)` as a fallback for old observations.
- Total per phase: `explore_total | generate_total | repair_total` if present, else sum of visible components.
- Final episode score: `normalized_episode_score(rewards)` ported as a simple TS function (mean of generate+repair totals, clamped to [0,1]); `success = score >= SUCCESS_SCORE_THRESHOLD` (constant ported from `explainer_env/constants.py`; default 0.6, to be confirmed during implementation).
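
The per-phase total and final score rules can be sketched as follows. This follows the fallback formula stated above (mean of generate and repair totals, clamped to [0,1], threshold 0.6), not the unverified Python original. The `StepReward` shape and the zero score for an episode with no generate or repair steps are assumptions.

```typescript
type RewardPhase = "explore" | "generate" | "repair";

// Default named in the plan; the real constant still has to be confirmed.
const SUCCESS_SCORE_THRESHOLD = 0.6;

// Use the explicit "<phase>_total" component if present, else sum all components.
function phaseTotal(phase: RewardPhase, components: Record<string, number>): number {
  const explicit = components[`${phase}_total`];
  if (typeof explicit === "number") {
    return explicit;
  }
  return Object.keys(components).reduce((sum, key) => sum + (components[key] ?? 0), 0);
}

// Hypothetical per-step record kept in the store's rewards list.
interface StepReward {
  phase: RewardPhase;
  total: number;
}

// Mean of generate and repair totals, clamped to [0, 1].
// Assumption: an episode with no such steps scores 0.
function normalizedEpisodeScore(rewards: readonly StepReward[]): number {
  const scored = rewards.filter((r) => r.phase !== "explore");
  if (scored.length === 0) {
    return 0;
  }
  const mean = scored.reduce((sum, r) => sum + r.total, 0) / scored.length;
  return Math.min(1, Math.max(0, mean));
}

function isSuccess(score: number): boolean {
  return score >= SUCCESS_SCORE_THRESHOLD;
}
```

For instance, an episode with a generate total of 0.5 and a repair total of 0.75 scores 0.625, which passes the 0.6 threshold; explore totals are ignored by the score.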

## Technical details

- **Files added**:
  - `src/routes/index.tsx`: dashboard page, replaces placeholder.
  - `src/server/env.functions.ts`: `envReset`, `envStep`, `envMetadata`, `envSchema` server fns calling `${process.env.ENV_BASE_URL}`.
  - `src/server/llm/prompts.ts`, `src/server/llm/parse.ts`, `src/server/llm/client.ts`: port of inference.py prompt/parse/call.
  - `src/server/llm.functions.ts`: `runLlmStep({ phase, obs, prior })` server fn.
  - `src/server/config.functions.ts`: `getRuntimeConfig()` returning `{ envUrl, modelName }` (no secrets).
  - `src/lib/tasks.ts`: ported task bank.
  - `src/lib/rewards.ts`: reward parsing/normalization.
  - `src/lib/types.ts`: `ExplainerObservation`, `ExplainerAction`, etc.
  - `src/store/episode.ts`: Zustand store.
  - `src/components/inspector/*`: Header, Controls, ObservationPanel, LlmPanel, ResearchPanel, RewardsPanel, Log.
- **Secret**: `HF_TOKEN` added via Lovable Cloud secrets after plan approval. The user will be prompted to paste it.
- **Stack stays** TanStack Start + React + Tailwind + shadcn. No new heavy deps; add only `zustand` and a small syntax highlighter (`shiki` or `highlight.js`), picking the smaller at implementation time.
- **Out of scope**: persistence across reloads, multi-user sessions, auth, charts. Pure single-tab inspector.

## Open items to resolve during implementation

- Confirm whether `/reset` accepts a `topic` argument; if not, we still show the task picker but treat it as a display filter and surface a warning when the returned topic differs.
- Confirm exact `SUCCESS_SCORE_THRESHOLD` and `normalized_episode_score` formula from `explainer_env/constants.py` (you can paste it, otherwise default to mean-of-totals ≥ 0.6).