Space: seanpoyner/smolcode

seanpoyner committed 19 days ago · commit daea45b (verified) · 1 parent: 6cdce0d

Upload folder using huggingface_hub

Files changed (35)

.gitattributes +2 -0
Dockerfile +25 -0
README.md +90 -4
app.py +849 -0
demo.mp4 +3 -0
engine/__init__.py +22 -0
engine/agent.py +197 -0
engine/branding.py +172 -0
engine/browser_runner.py +145 -0
engine/browsercheck.py +111 -0
engine/builder.py +270 -0
engine/config.py +290 -0
engine/fanout.py +128 -0
engine/file_tree.py +92 -0
engine/gradio_shell.py +425 -0
engine/judge.py +90 -0
engine/live_run.py +93 -0
engine/playwright_runner.py +132 -0
engine/preflight.py +116 -0
engine/preview.py +161 -0
engine/route_clf.py +243 -0
engine/router.py +455 -0
engine/rust_session.py +425 -0
engine/sandbox.py +141 -0
engine/themes.py +60 -0
engine/tools.py +174 -0
engine/trace.py +73 -0
engine/trace_collector.py +128 -0
engine/ui_trace.py +121 -0
engine/web_tui.py +471 -0
engine/webcheck.js +108 -0
engine/webcheck.py +65 -0
requirements.txt +2 -0
smolcode_core-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl +3 -0
static/web_tui.js +380 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+demo.mp4 filter=lfs diff=lfs merge=lfs -text
+smolcode_core-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl filter=lfs diff=lfs merge=lfs -text

Dockerfile ADDED

@@ -0,0 +1,25 @@

+FROM ubuntu:24.04
+ENV DEBIAN_FRONTEND=noninteractive PYTHONUNBUFFERED=1
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        python3 python3-pip python3-venv ca-certificates && \
+    rm -rf /var/lib/apt/lists/*
+RUN python3 -m venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+WORKDIR /app
+COPY requirements.txt smolcode_core-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl ./
+RUN pip install --no-cache-dir -r requirements.txt \
+        ./smolcode_core-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl
+COPY app.py demo.mp4 ./
+COPY engine ./engine
+COPY static ./static
+# HF Docker Spaces run as uid 1000; let the agent write its workspace
+RUN mkdir -p /app/.workspace && chmod -R 777 /app
+ENV SMOLCODE_HOST=0.0.0.0 SMOLCODE_PORT=7860 HF_HOME=/tmp/hf
+# Backend: full specialist matrix served from HAL via the public tunnel. Baked in
+# (URL + "ollama" key are not secret) so it reaches the container reliably; swap
+# this URL + rebuild to point at a durable endpoint for judging.
+ENV SMALLCODE_PRESET=hal-matrix \
+    SMALLCODE_BASE_URL=https://collapse-snake-achieving-controversial.trycloudflare.com/v1 \
+    SMALLCODE_API_KEY=ollama
+EXPOSE 7860
+CMD ["python3", "app.py"]
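The Dockerfile bakes the backend into the image through `SMALLCODE_PRESET`, `SMALLCODE_BASE_URL` and `SMALLCODE_API_KEY`, plus `SMOLCODE_HOST`/`SMOLCODE_PORT` for the server. This section does not show how the engine actually reads them (`load_preset`, `load_rust_config`), so the following is only a hypothetical reader. `BackendConfig`, `backend_from_env` and every fallback default are assumptions; `http://localhost:11434/v1` is Ollama's usual OpenAI-compatible endpoint. Because the values come from the environment, a runtime variable can override the baked-in `ENV` without a rebuild, which helps when the tunnel URL changes.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class BackendConfig:
    # Hypothetical holder for the values the Dockerfile sets via ENV.
    preset: str
    base_url: str
    api_key: str
    host: str
    port: int


def backend_from_env(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Resolve backend settings from environment variables.

    Variable names match the Dockerfile; every fallback default is an assumption.
    """
    base_url = env.get("SMALLCODE_BASE_URL", "http://localhost:11434/v1")
    # Fail fast on a malformed endpoint instead of at the first model call.
    if urlparse(base_url).scheme not in ("http", "https"):
        raise ValueError(f"SMALLCODE_BASE_URL must be http(s): {base_url!r}")
    return BackendConfig(
        preset=env.get("SMALLCODE_PRESET", "default"),
        base_url=base_url.rstrip("/"),  # normalize so later path joins don't double the slash
        api_key=env.get("SMALLCODE_API_KEY", "ollama"),
        host=env.get("SMOLCODE_HOST", "127.0.0.1"),
        port=int(env.get("SMOLCODE_PORT", "7860")),
    )


if __name__ == "__main__":
    cfg = backend_from_env({"SMALLCODE_PRESET": "hal-matrix",
                            "SMALLCODE_BASE_URL": "https://example.com/v1/"})
    print(cfg.base_url, cfg.port)  # https://example.com/v1 7860
```

Passing a plain dict instead of `os.environ` keeps the reader easy to test. The validation step is a design choice: a bad URL surfaces at startup rather than at the first model call.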

README.md CHANGED

@@ -1,10 +1,96 @@
 ---
-title: Smolcode
-emoji: 👁
-colorFrom: green
+title: smolcode
+emoji: 🤖
+colorFrom: purple
 colorTo: indigo
 sdk: docker
+app_port: 7860
 pinned: false
+license: apache-2.0
+short_description: A tiny local model that writes code, runs it, and fixes it.
+tags:
+  - build-small-hackathon
+  - agent
+  - code-generation
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# smolcode 🤖
+**A tiny local model that writes code, runs it, and fixes it — until it works.**
+smolcode is an *agentic* coding assistant built for **small** language models. Instead of
+autocompleting, it runs a **plan → write → execute → repair** loop: it writes a file, runs
+it in a sandbox, reads the real error, and iterates until a test passes — on a model small
+enough to run on your own machine (a ≤4B model on a laptop, scaling up to 32B on a
+workstation). **No cloud APIs.**
+Built for the [Hugging Face × Gradio **Build Small** Hackathon](https://huggingface.co/build-small-hackathon).
+## Why it's a "Build Small" entry
+- **Agentic on a 3B model.** The loop — not the model size — does the work. A ≤4B model
+  drives tool calls reliably enough to write, run, and self-correct code.
+- **Local-first & private.** Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp).
+  Nothing leaves your machine.
+- **Specialty routing.** A two-axis (language × function) router classifies tasks into
+  16 specialty families and escalates within each family's fine-tuned ladder before
+  falling back to bigger Granite models.
+- **Fine-tuned tiny coder.** We fine-tuned **Qwen2.5-Coder-1.5B** to emit native tool calls
+  so a ≤2B model can be the cheap entry tier — published at
+  [`seanpoyner/smolcode-coder-1.5b-tools`](https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools).
+- **Rust core.** Agent loop, tool execution, and tracing run through
+  [**LiteForge**](https://github.com/seanpoyner/liteforge) and **smolcode-core**
+  (Rust/PyO3). Gradio is the (required) shell; the brain is Rust.
+## How to use this Space
+1. Type a coding task, e.g. *"write a function that validates an email and test it."*
+2. Watch the **agent trace** stream live: `write_file → run_python → (error) → fix → pass`.
+3. The **router** badge shows which tier solved it and whether it's **✓ verified**.
+4. Tick **⚡ fan out** and enter several lines to run independent tasks as **parallel subagents**.
+## Benchmark — the loop is the product
+The agentic loop is what makes a tiny model useful. On the same HumanEval-style suite
+(`bench/tasks.py`, 10 tasks, pass@1):
+<!-- BENCH_TABLE_START -->
+| System | Model | pass@1 |
+|--------|-------|--------|
+| single-shot | fine-tuned **1.5B** | 50% |
+| **agentic loop** | fine-tuned **1.5B** | **70%** |
+| single-shot | granite4.1:3b | 90% |
+*The write→run→fix loop lifts the fine-tuned 1.5B from **50% → 70%** (+20 pts) — the
+loop, not raw model size, does the work. A larger model (granite 3B) scores higher
+single-shot, which is exactly why the router escalates only when the small tier can't
+verify. Measured with `bench/run.py` on the hal backend.*
+<!-- BENCH_TABLE_END -->
+## Under the hood
+```
+Gradio UI  →  smolcode-core / LiteForge (Rust/PyO3)  →  OpenAI-compatible endpoint
+                  specialty router + agent loop
+                  tools: write_file, read_file, run_python, run_tests
+                  served by Ollama / llama.cpp
+```
+There's also a full terminal agent (`smolcode-cli`, a Rust ratatui TUI) and a
+Replit/Lovable-style app builder (`smolbuilder.py`) on the same engine.
+- **Code:** https://github.com/seanpoyner/smolcode
+- **Model:** https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
+- **Engine:** https://github.com/seanpoyner/liteforge
+- **App builder companion:** https://huggingface.co/spaces/seanpoyner/smolbuilder
+## Demo video
+<video controls src="https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4"></video>
+[▶️ Watch the demo](https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4) — the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.
+## Share
+> Most coding tasks don't need a giant model. **smolcode** is an agentic coding agent that runs entirely on a *small local model* — it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned **1.5B** coder; the router escalates a tier only when needed (all ≤32B). Less compute, same result.
+>
+> Built for the #BuildSmall hackathon with @huggingface + @Gradio. 🦀 Rust core.
+> ▶️ https://huggingface.co/spaces/seanpoyner/smolcode
+> #SmallModels #LocalAI #Gradio #BuildSmall
+📣 **Posted on LinkedIn:** https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/
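The README describes escalation as a policy: start at a small tier and move up the family's ladder only when the current tier cannot verify its answer. This commit's app.py obtains a `(ladder, start, think)` triple from `router._route`, but the escalation loop itself lives in `engine/router.py`, which is not shown here. The following is a minimal sketch of that policy under stated assumptions. `LadderTier`, `Attempt`, `run_ladder` and the `solve`/`verify` callables are hypothetical stand-ins for the agent loop and the test run, not smolcode's API, and the model tags are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class LadderTier:
    # Hypothetical tier: a display name plus the model tag it would call.
    name: str
    model: str


@dataclass(frozen=True)
class Attempt:
    tier: LadderTier
    output: str
    verified: bool


def run_ladder(
    task: str,
    ladder: Sequence[LadderTier],
    solve: Callable[[LadderTier, str], str],
    verify: Callable[[str], bool],
    start: int = 0,
) -> Optional[Attempt]:
    """Climb the ladder from `start`; stop at the first tier whose output verifies.

    Returns None if no tier is tried, otherwise the first verified attempt or,
    when nothing verifies, the attempt from the largest tier tried.
    """
    last: Optional[Attempt] = None
    for tier in ladder[start:]:
        output = solve(tier, task)  # stand-in for the write -> run -> fix loop
        last = Attempt(tier, output, verify(output))
        if last.verified:
            return last  # cheapest tier that passed; no further escalation
    return last


if __name__ == "__main__":
    ladder = [
        LadderTier("small", "placeholder-1.5b"),
        LadderTier("mid", "placeholder-3b"),
        LadderTier("large", "placeholder-8b"),
    ]
    result = run_ladder(
        "validate an email",
        ladder,
        solve=lambda t, task: f"{t.model}: {task}",
        verify=lambda out: out.startswith("placeholder-3b"),
    )
    print(result.tier.name)  # mid
```

Returning the largest tier's unverified attempt, rather than nothing, still gives the UI something to show, and the `verified` flag maps naturally onto the README's **✓ verified** badge.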

app.py ADDED

@@ -0,0 +1,849 @@

+"""smolcode — CLI-parity web UI over the Rust engine."""
+from __future__ import annotations
+import json
+import os
+from dataclasses import dataclass, field
+from pathlib import Path
+import gradio as gr
+from engine import Router, load_preset
+from engine.config import (
+    Preset,
+    Tier,
+    is_specialty_model,
+    parse_size_b,
+    specialist_sizes,
+)
+from engine.branding import SMOLCODE_CSS
+from engine.gradio_shell import (
+    AppSessionState,
+    SlashResult,
+    UiSettings,
+    dispatch_slash,
+    parse_input,
+)
+from engine.preflight import list_models
+from engine.router import RouteResult
+from engine.rust_session import (
+    RustSession,
+    apply_settings,
+    get_session_chat,
+    git_status,
+    list_background_jobs,
+    load_rust_config,
+    parse_session_label,
+    session_choices,
+    workspace_paths,
+    AUTOCOMPLETE_FILE_LIMIT,
+    UI_FILE_LIMIT,
+)
+from engine.trace import build_trace, save_trace
+from engine.themes import theme_at
+from engine.web_tui import (
+    Transcript,
+    agent_choices,
+    cycle_agent,
+    cycle_mode,
+    cycle_model,
+    cycle_think,
+    header_bar_html,
+    help_overlay_html,
+    host_from_url,
+    ingest_agent_event,
+    parse_git_header,
+    render_picker_html,
+    render_sidebar_html,
+    shell_theme_html,
+    slash_commands,
+    status_bar_html,
+    theme_picker_items,
+    whichkey_overlay_html,
+)
+PRESET = load_preset()
+_JS_HEAD = (Path(__file__).parent / "static" / "web_tui.js").read_text()
+@dataclass
+class WebUiState:
+    sidebar_visible: bool = True
+    sidebar_view: str = "files"
+    sidebar_sel: int = 0
+    theme_idx: int = 0
+    overlay: str = ""
+    picker_kind: str = ""
+    picker_items: list[str] = field(default_factory=list)
+    picker_sel: int = 0
+    file_total: int = 0
+    # Blocking startup model pick: true until the user chooses from the modal.
+    needs_model_pick: bool = True
+def _normalize_paths(files: list[str] | dict[str, str] | None) -> list[str]:
+    if not files:
+        return []
+    if isinstance(files, dict):
+        paths = sorted(files.keys())
+    else:
+        paths = sorted(files)
+    return paths[:UI_FILE_LIMIT]
+def _cfg() -> dict:
+    return load_rust_config()
+def _ensure_rust(app_state: AppSessionState, settings: UiSettings) -> RustSession:
+    if app_state.rust is None:
+        app_state.rust = RustSession(
+            workspace=settings.workspace,
+            agent=settings.agent,
+            yolo=settings.yolo,
+            model=_pinned_model(settings.model),  # None for Auto -> router sets it
+            base_url=_cfg().get("base_url"),
+            approval_handler=app_state.approval.ask,
+        )
+    apply_settings(app_state.rust, settings)
+    return app_state.rust
+# --- curated model picker (Auto-first, <=32B, specialty fine-tunes collapsed) -------
+# Each row is (label, model, think). model "auto"/"auto:<size>" are router pseudo-tags
+# interpreted by engine/router.py + rust_session.apply_settings; think "off" means the
+# router derives the level.
+_AUTO_ENTRIES: list[tuple[str, str, str]] = [
+    ("Auto", "auto", "off"),
+    ("Auto · think low", "auto", "low"),
+    ("Auto · think high", "auto", "high"),
+    ("Auto · think xtra", "auto", "xtra"),
+]
+def _model_entries() -> list[tuple[str, str, str]]:
+    """All picker rows: Auto options, one Auto·<size> per served specialist size, then
+    generic concrete models filtered to <=32B with the per-specialty fine-tunes hidden."""
+    entries = list(_AUTO_ENTRIES)
+    for sz in specialist_sizes(PRESET):
+        entries.append((f"Auto · {sz.upper()}", f"auto:{sz}", "off"))
+    seen: set[str] = set()
+    base = [t.model for t in PRESET.tiers if t.model]
+    api = list_models(_cfg().get("base_url", PRESET.base_url))
+    for m in api + base:
+        if not m or m in seen or is_specialty_model(m) or parse_size_b(m) > 32:
+            continue
+        seen.add(m)
+        entries.append((m, m, "off"))
+    return entries
+def _model_labels() -> list[str]:
+    return [lbl for lbl, _m, _t in _model_entries()]
+def _label_to_selection(label: str) -> tuple[str, str] | None:
+    """(model, think) for a picker label, or None if unknown."""
+    for lbl, m, t in _model_entries():
+        if lbl == label:
+            return m, t
+    return None
+def _model_sel_index(settings: UiSettings) -> int:
+    """Row index matching the current (model, think); falls back to 0 (Auto)."""
+    entries = _model_entries()
+    cur_m = settings.model or "auto"
+    cur_t = settings.think or "off"
+    for i, (_l, m, t) in enumerate(entries):  # exact (model, think) wins
+        if m == cur_m and t == cur_t:
+            return i
+    for i, (_l, m, _t) in enumerate(entries):  # else first model match
+        if m == cur_m:
+            return i
+    return 0
+def _selection_label(settings: UiSettings) -> str:
+    """Friendly label for the current selection (model chip in header/status)."""
+    entries = _model_entries()
+    return entries[_model_sel_index(settings)][0] if entries else "Auto"
+def _pinned_model(model_sel: str | None) -> str | None:
+    """The concrete model tag to pin, or None for Auto/Auto·size (router-driven)."""
+    m = model_sel or ""
+    return None if (not m or m == "auto" or m.startswith("auto:")) else m
+def _effective_preset(model_sel: str | None):
+    """(preset, size_floor) for a picker selection.
+    'auto' -> matrix preset (router picks size); 'auto:<size>' -> matrix + start pinned
+    to that size (still escalates); '<tag>' -> single-tier preset (pinned, no escalation).
+    """
+    sel = model_sel or "auto"
+    if sel == "auto":
+        return PRESET, None
+    if sel.startswith("auto:"):
+        return PRESET, (sel.split(":", 1)[1] or None)
+    return (
+        Preset(key=PRESET.key, base_url=PRESET.base_url, api_key=PRESET.api_key,
+               tiers=[Tier("custom", sel)]),
+        None,
+    )
+def _picker_items(kind: str, settings: UiSettings) -> list[str]:
+    if kind == "models":
+        return _model_labels()
+    if kind == "themes":
+        return theme_picker_items()
+    if kind == "agents":
+        return agent_choices()
+    if kind == "sessions":
+        return session_choices()
+    return []
+def _picker_sel_for(kind: str, settings: UiSettings, ui: WebUiState, items: list[str]) -> int:
+    if not items:
+        return 0
+    if kind == "models":
+        return _model_sel_index(settings)
+    if kind == "themes":
+        name = theme_at(ui.theme_idx).name
+        return items.index(name) if name in items else 0
+    if kind == "agents":
+        cur = settings.agent if settings.mode != "plan" else "plan"
+        return items.index(cur) if cur in items else 0
+    return 0
+def _header(settings: UiSettings, ui: WebUiState) -> str:
+    git = git_status(settings.workspace)
+    branch, dirty = parse_git_header(git)
+    return header_bar_html(
+        git_branch=branch,
+        git_dirty=dirty,
+        model=_selection_label(settings),
+        host=host_from_url(_cfg().get("base_url", "")),
+        theme=theme_at(ui.theme_idx).name,
+    )
+def _status(settings: UiSettings, app_state: AppSessionState, *, running: bool = False) -> str:
+    title = f"session {app_state.rust.session_id[:8]}" if app_state.rust else "new session"
+    return status_bar_html(
+        settings, session_title=title,
+        model=_selection_label(settings),
+        running=running,
+    )
+def _sidebar_html(ui: WebUiState, settings: UiSettings, files: list[str], app_state: AppSessionState) -> str:
+    sid = app_state.rust.session_id if app_state.rust else "(none)"
+    return render_sidebar_html(
+        view=ui.sidebar_view,
+        files=files,
+        selected=ui.sidebar_sel,
+        session_id=sid,
+        agent=settings.agent,
+        file_total=ui.file_total or len(files),
+    )
+def _overlay_html(ui: WebUiState) -> str:
+    if ui.overlay == "help":
+        return f'<div class="sc-overlay"><div class="sc-overlay-panel">{help_overlay_html()}</div></div>'
+    if ui.overlay == "whichkey":
+        return f'<div class="sc-overlay"><div class="sc-overlay-panel">{whichkey_overlay_html()}</div></div>'
+    if ui.overlay == "picker" and ui.picker_kind:
+        panel = render_picker_html(
+            ui.picker_kind,
+            ui.picker_items,
+            ui.picker_sel,
+            title=ui.picker_kind,
+        )
+        return f'<div class="sc-overlay"><div class="sc-overlay-panel">{panel}</div></div>'
+    return ""
+def _js_boot_lines(settings: UiSettings, files: list[str]) -> str:
+    cmds = slash_commands(settings.workspace)
+    paths = sorted(files)[:AUTOCOMPLETE_FILE_LIMIT]
+    return (
+        f"window.__smolcode_workspace={json.dumps(settings.workspace)};"
+        f"window.__smolcode_commands={json.dumps(cmds)};"
+        f"window.__smolcode_files={json.dumps(paths)};"
+    )
+def _embed_js(settings: UiSettings, files: list[str]) -> str:
+    return f"<script>{_js_boot_lines(settings, files)}</script>"
+def _outputs(
+    transcript: Transcript,
+    app_state: AppSessionState,
+    settings: UiSettings,
+    ui: WebUiState,
+    files: list[str],
+    *,
+    running: bool = False,
+    trace_path: str | None = None,
+):
+    overlay_val = _overlay_html(ui)
+    return (
+        transcript.render_html(running=running),
+        _header(settings, ui),
+        _status(settings, app_state, running=running),
+        gr.update(value=_sidebar_html(ui, settings, files, app_state), visible=ui.sidebar_visible),
+        gr.update(value=overlay_val, visible=bool(overlay_val)),
+        shell_theme_html(ui.theme_idx),
+        gr.update(visible=bool(app_state.approval.pending_desc)),
+        app_state.approval.pending_desc or "",
+        files,
+        trace_path,
+        app_state,
+        settings,
+        ui,
+        transcript,
+        "",  # clear editor
+    )
+def _apply_slash_ui(sr: SlashResult, settings: UiSettings, ui: WebUiState, transcript: Transcript):
+    if sr.cycle_mode:
+        settings.mode = cycle_mode(settings.mode)
+        transcript.append_info(f"mode → {settings.mode}")
+    if sr.cycle_think:
+        settings.think = cycle_think(settings.think)
+        transcript.append_info(f"think → {settings.think}")
+    if sr.set_think:
+        settings.think = sr.set_think
+        transcript.append_info(f"think → {settings.think}")
+    if sr.toggle_sidebar:
+        ui.sidebar_visible = not ui.sidebar_visible
+    if sr.toggle_sidebar_view:
+        ui.sidebar_view = "stats" if ui.sidebar_view == "files" else "files"
+    if sr.show_help:
+        ui.overlay = "help"
+    if sr.show_whichkey:
+        ui.overlay = "whichkey"
+    if sr.open_picker:
+        ui.overlay = "picker"
+        ui.picker_kind = sr.open_picker
+        ui.picker_items = _picker_items(sr.open_picker, settings)
+        ui.picker_sel = _picker_sel_for(sr.open_picker, settings, ui, ui.picker_items)
+        transcript.append_info(f"picker → {sr.open_picker}")
+async def _run_agent_turn(
+    task: str,
+    transcript: Transcript,
+    app_state: AppSessionState,
+    settings: UiSettings,
+    ui: WebUiState,
+    files: list[str],
+):
+    # Blocking model pick: refuse to run until the user has chosen from the modal.
+    if ui.needs_model_pick:
+        ui.overlay = "picker"
+        ui.picker_kind = "models"
+        ui.picker_items = _model_labels()
+        ui.picker_sel = _model_sel_index(settings)
+        transcript.append_info("pick a model to start — Auto is recommended")
+        yield _outputs(transcript, app_state, settings, ui, files)
+        return
+    rust = _ensure_rust(app_state, settings)
+    rust.clear_cancel()
+    preset, size_floor = _effective_preset(settings.model)
+    router = Router(
+        preset=preset,
+        approval_handler=app_state.approval.ask,
+        workspace_dir=settings.workspace,
+        think=settings.think,
+        yolo=settings.yolo,
+        agent=settings.agent,
+        size_floor=size_floor,
+    )
+    ladder, start, _think = router._route(task)  # real routing for the badge
+    transcript.append_user(task)
+    transcript.append_info(f"routed to {ladder.tiers[start].name}")
+    ui.overlay = ""
+    yield _outputs(transcript, app_state, settings, ui, files, running=True)
+    result: RouteResult | None = None
+    async for frame in router.run_live(task, rust_session=rust):
+        if frame.raw_event:
+            ingest_agent_event(transcript, frame.raw_event)
+        if frame.files:
+            files = _normalize_paths(frame.files)
+        if frame.done and isinstance(frame.result, RouteResult):
+            result = frame.result
+            if rust.cancelled:
+                transcript.append_error("interrupted")
+        yield _outputs(transcript, app_state, settings, ui, files, running=not frame.done)
+    trace_path = None
+    if result and result.agent and not rust.cancelled:
+        app_state.bg_jobs = list_background_jobs()
+        rust.save()
+        try:
+            trace_path = str(save_trace(build_trace(
+                result.agent, task, result.final,
+                preset=PRESET.key, model=result.tier_model,
+            )))
+        except Exception:
+            pass
+    yield _outputs(transcript, app_state, settings, ui, files, trace_path=trace_path)
+async def respond(
+    message: str,
+    transcript: Transcript,
+    app_state: AppSessionState,
+    settings: UiSettings,
+    ui: WebUiState,
+    files: list[str],
+):
+    message = (message or "").strip()
+    app_state.settings = settings
+    if not message:
+        yield _outputs(transcript, app_state, settings, ui, files)
+        return
+    _task, slash, shell_cmd = parse_input(
+        message,
+        workspace_files=files,
+        workspace=settings.workspace,
+        rust=app_state.rust,
+    )
+    if shell_cmd:
+        rust = _ensure_rust(app_state, settings)
+        out = rust.run_shell(shell_cmd)
+        transcript.append_user(f"!{shell_cmd}")
+        transcript.append_info(out)
+        yield _outputs(transcript, app_state, settings, ui, files)
+        return
+    if slash:
+        if slash.startswith("/search "):
+            q = slash.split(maxsplit=1)[1]
+            hits = transcript.search(q)
+            transcript.append_user(slash)
+            transcript.append_info("\n".join(hits) if hits else f"no matches for '{q}'")
+            yield _outputs(transcript, app_state, settings, ui, files)
+            return
+        sr = dispatch_slash(slash, app_state)
+        _apply_slash_ui(sr, settings, ui, transcript)
+        if sr.clear_chat:
+            transcript.clear()
+        if sr.reply:
+            transcript.append_user(slash)
+            plain = sr.reply.replace("**", "").replace("`", "")
+            transcript.append_info(plain)
+        if sr.queued_task:
+            async for out in _run_agent_turn(sr.queued_task, transcript, app_state, settings, ui, files):
+                yield out
+            return
+        yield _outputs(transcript, app_state, settings, ui, files, trace_path=sr.download_path)
+        return
+    async for out in _run_agent_turn(_task, transcript, app_state, settings, ui, files):
+        yield out
+def on_interrupt(app_state: AppSessionState):
+    if app_state.rust:
+        app_state.rust.request_cancel()
+    return app_state
+def on_clear(transcript: Transcript, ui: WebUiState):
+    transcript.clear()
+    ui.overlay = ""
+    ui.picker_kind = ""
+    ui.picker_items = []
+    ui.picker_sel = 0
+    return transcript, ui, ""
+def on_close_overlay(ui: WebUiState):
+    ui.overlay = ""
+    ui.picker_kind = ""
+    ui.picker_items = []
+    ui.picker_sel = 0
+    return ui, gr.update(value="", visible=False)
+def on_open_picker(kind: str, ui: WebUiState, settings: UiSettings):
+    ui.overlay = "picker"
+    ui.picker_kind = kind
+    ui.picker_items = _picker_items(kind, settings)
+    ui.picker_sel = _picker_sel_for(kind, settings, ui, ui.picker_items)
+    val = _overlay_html(ui)
+    return ui, gr.update(value=val, visible=True)
+def on_picker_nav(delta: int, ui: WebUiState):
+    if ui.picker_items:
+        ui.picker_sel = max(0, min(len(ui.picker_items) - 1, ui.picker_sel + delta))
+    val = _overlay_html(ui)
+    return ui, gr.update(value=val, visible=bool(val))
+def on_picker_select(
+    pick_idx: str,
+    ui: WebUiState,
+    settings: UiSettings,
+    app_state: AppSessionState,
+    transcript: Transcript,
+    files: list[str],
+):
+    try:
+        idx = int(pick_idx) if pick_idx else ui.picker_sel
+    except ValueError:
+        idx = ui.picker_sel
+    kind = ui.picker_kind
+    items = ui.picker_items
+    if items:
+        idx = max(0, min(len(items) - 1, idx))
+        item = items[idx]
+        if kind == "models":
+            sel = _label_to_selection(item)
+            if sel:
+                settings.model, settings.think = sel
+            ui.needs_model_pick = False
+            transcript.append_info(f"model → {item}")
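+        elif kind == "themes":
+            # theme_names() is never imported; reuse the list the picker was built
+            # from (assumes theme_picker_items() is in theme_at() index order).
+            names = theme_picker_items()
+            if item in names:
+                ui.theme_idx = names.index(item)
+            transcript.append_info(f"theme → {item}")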
+        elif kind == "agents":
+            if settings.mode != "plan":
+                settings.agent = item
+                transcript.append_info(f"agent → {item}")
+        elif kind == "sessions":
+            sid = parse_session_label(item)
+            if sid:
+                rust = RustSession(workspace=settings.workspace, agent=settings.agent, yolo=settings.yolo)
+                if rust.load_session(sid):
+                    app_state.rust = rust
+                    transcript.clear()
+                    transcript.from_stored_chat(get_session_chat(sid))
+                    transcript.append_info(f"loaded session {sid[:8]}")
+    ui.overlay = ""
+    ui.picker_kind = ""
+    ui.picker_items = []
+    ui.picker_sel = 0
+    overlay_val = _overlay_html(ui)
+    return (
+        transcript.render_html(),
+        _header(settings, ui),
+        _status(settings, app_state),
+        gr.update(value=overlay_val, visible=False),
+        shell_theme_html(ui.theme_idx),
+        settings,
+        ui,
+        transcript,
+        app_state,
+    )
+def _cycle_outputs(
+    settings: UiSettings,
+    ui: WebUiState,
+    app_state: AppSessionState,
+    transcript: Transcript,
+):
+    return (
+        settings,
+        transcript,
+        transcript.render_html(),
+        _header(settings, ui),
+        _status(settings, app_state),
+        shell_theme_html(ui.theme_idx),
+    )
+def on_toggle_sidebar(ui: WebUiState, settings: UiSettings, files: list[str], app_state: AppSessionState):
+    ui.sidebar_visible = not ui.sidebar_visible
+    return ui, gr.update(
+        value=_sidebar_html(ui, settings, files, app_state),
+        visible=ui.sidebar_visible,
+    )
+def on_toggle_sidebar_view(
+    ui: WebUiState, settings: UiSettings, files: list[str], app_state: AppSessionState,
+):
+    ui.sidebar_view = "stats" if ui.sidebar_view == "files" else "files"
+    return ui, gr.update(value=_sidebar_html(ui, settings, files, app_state))
+def on_load(settings: UiSettings, app_state: AppSessionState, ui: WebUiState):
+    paths, total = workspace_paths(settings.workspace)
+    ui.file_total = total
+    overlay_val = ""
+    if ui.needs_model_pick:  # blocking startup model picker
+        ui.overlay = "picker"
+        ui.picker_kind = "models"
+        ui.picker_items = _model_labels()
+        ui.picker_sel = _model_sel_index(settings)
+        overlay_val = _overlay_html(ui)
+    return (
+        _sidebar_html(ui, settings, paths, app_state),
+        paths,
+        _embed_js(settings, paths),
+        gr.update(choices=session_choices()),
+        gr.update(value=overlay_val, visible=bool(overlay_val)),
+        ui,
+    )
+def on_cycle_mode(settings: UiSettings, ui: WebUiState, app_state: AppSessionState, transcript: Transcript):
+    settings.mode = cycle_mode(settings.mode)
+    if settings.mode == "plan":
+        settings.agent = "plan"
+    elif settings.agent == "plan":
+        settings.agent = "build"
+    settings.yolo = settings.mode == "auto"
+    transcript.append_info(f"mode → {settings.mode}")
+    return _cycle_outputs(settings, ui, app_state, transcript)
+def on_cycle_agent(settings: UiSettings, ui: WebUiState, app_state: AppSessionState, transcript: Transcript):
+    if settings.mode != "plan":
+        settings.agent = cycle_agent(settings.agent)
+        transcript.append_info(f"agent → {settings.agent}")
+    return _cycle_outputs(settings, ui, app_state, transcript)
+def on_cycle_model(settings: UiSettings, ui: WebUiState, app_state: AppSessionState, transcript: Transcript):
+    labels = _model_labels()
+    nxt = cycle_model(labels, _selection_label(settings))
+    sel = _label_to_selection(nxt)
+    if sel:
+        settings.model, settings.think = sel
+    ui.needs_model_pick = False
+    transcript.append_info(f"model → {nxt}")
+    return _cycle_outputs(settings, ui, app_state, transcript)
+def on_cycle_think(settings: UiSettings, ui: WebUiState, app_state: AppSessionState, transcript: Transcript):
+    settings.think = cycle_think(settings.think)
+    transcript.append_info(f"think → {settings.think}")
+    return _cycle_outputs(settings, ui, app_state, transcript)
+def on_help(ui: WebUiState):
+    ui.overlay = "help"
+    val = _overlay_html(ui)
+    return ui, gr.update(value=val, visible=True)
+def on_whichkey(ui: WebUiState):
+    ui.overlay = "whichkey"
+    val = _overlay_html(ui)
+    return ui, gr.update(value=val, visible=True)
+def on_new_session():
+    settings = UiSettings(workspace=os.environ.get("SMALLCODE_WORKSPACE", "."), model="auto")
+    ui = WebUiState()  # needs_model_pick defaults True -> reopen the blocking picker
+    ui.overlay = "picker"
+    ui.picker_kind = "models"
+    ui.picker_items = _model_labels()
+    ui.picker_sel = _model_sel_index(settings)
+    return (
+        Transcript(), AppSessionState(), settings, ui, [], None,
+        gr.update(value=_overlay_html(ui), visible=True),
+    )
+def on_approval(yes: bool, app_state: AppSessionState):
+    if app_state:
+        app_state.approval.approve(yes)
+    return gr.update(visible=False), ""
+def on_session_pick(label: str, app_state: AppSessionState, settings: UiSettings):
+    sid = parse_session_label(label or "")
+    if not sid:
+        return Transcript(), app_state
+    rust = RustSession(workspace=settings.workspace, agent=settings.agent, yolo=settings.yolo)
+    if not rust.load_session(sid):
+        return Transcript(), app_state
+    app_state.rust = rust
+    t = Transcript()
+    t.from_stored_chat(get_session_chat(sid))
+    return t, app_state
+def build() -> gr.Blocks:
+    default_ws = os.environ.get("SMALLCODE_WORKSPACE", ".")
+    # Default selection is Auto (router-driven); the blocking startup modal lets the
+    # user confirm or change it before the first task.
+    settings = UiSettings(workspace=default_ws, model="auto")
+    with gr.Blocks(
+        css=SMOLCODE_CSS,
+        title="smolcode",
+        theme=gr.themes.Soft(primary_hue="purple", neutral_hue="slate"),
+        head=f"<script>{_JS_HEAD}\n{_js_boot_lines(settings, [])}</script>",
+        fill_height=True,
+        fill_width=True,
+    ) as demo:
+        transcript = gr.State(Transcript())
+        app_state = gr.State(AppSessionState(settings=settings))
+        settings_state = gr.State(settings)
+        ui_state = gr.State(WebUiState())
+        files_state = gr.State([])
+        trace_state = gr.State(None)
+        with gr.Column(elem_classes="sc-tui-shell"):
+            header = gr.HTML(_header(settings, WebUiState()))
+            shell_theme = gr.HTML(shell_theme_html(0), visible=False)
+            with gr.Row(elem_classes="sc-main-row"):
+                sidebar = gr.HTML(
+                    _sidebar_html(WebUiState(), settings, [], AppSessionState()),
+                    elem_classes="sc-sidebar",
+                    visible=True,
+                )
+                with gr.Column(elem_classes="sc-main-col"):
+                    transcript_view = gr.HTML(Transcript().render_html())
+                    with gr.Group(elem_classes="sc-editor-wrap"):
+                        gr.HTML(
+                            '<div class="sc-editor-hint">'
+                            "Enter run · Shift+Enter newline · / commands · ctrl+x leader"
+                            "</div>"
+                        )
+                        editor = gr.Textbox(
+                            placeholder="type a task…",
+                            lines=5,
+                            max_lines=8,
+                            show_label=False,
+                            elem_id="sc-editor",
+                            interactive=True,
+                            autofocus=True,
+                        )
+                    with gr.Group(visible=False) as approval_box:
+                        approval_desc = gr.Markdown("", elem_classes="sc-approval")
+                        with gr.Row():
+                            gr.Button("Approve", variant="primary").click(
+                                lambda s: on_approval(True, s), app_state, [approval_box, approval_desc])
+                            gr.Button("Deny").click(
+                                lambda s: on_approval(False, s), app_state, [approval_box, approval_desc])
+            status = gr.HTML(_status(settings, AppSessionState()), elem_classes="sc-status-wrap")
+        overlay = gr.HTML("", visible=False)
+        js_boot = gr.HTML(_embed_js(settings, []), elem_classes=["sc-hidden-controls"])
+        # Off-screen controls (visible=True so Gradio mounts them for JS shortcuts).
+        _hid = ["sc-hidden-btn"]
+        with gr.Row(elem_classes="sc-hidden-controls"):
+            btn_submit = gr.Button("submit", elem_id="sc-submit", elem_classes=_hid)
+            btn_clear = gr.Button("clear", elem_id="sc-clear", elem_classes=_hid)
+            btn_interrupt = gr.Button("interrupt", elem_id="sc-interrupt", elem_classes=_hid)
+            btn_toggle_sidebar = gr.Button("sidebar", elem_id="sc-toggle-sidebar", elem_classes=_hid)
+            btn_toggle_view = gr.Button("view", elem_id="sc-toggle-sidebar-view", elem_classes=_hid)
+            btn_cycle_mode = gr.Button("mode", elem_id="sc-cycle-mode", elem_classes=_hid)
+            btn_cycle_agent = gr.Button("agent", elem_id="sc-cycle-agent", elem_classes=_hid)
+            btn_cycle_model = gr.Button("model", elem_id="sc-cycle-model", elem_classes=_hid)
+            btn_cycle_think = gr.Button("think", elem_id="sc-cycle-think", elem_classes=_hid)
+            btn_help = gr.Button("help", elem_id="sc-help", elem_classes=_hid)
+            btn_whichkey = gr.Button("wk", elem_id="sc-whichkey", elem_classes=_hid)
+            btn_close = gr.Button("close", elem_id="sc-close-overlay", elem_classes=_hid)
+            btn_new = gr.Button("new", elem_id="sc-new-session", elem_classes=_hid)
+            btn_open_models = gr.Button("models", elem_id="sc-open-picker-models", elem_classes=_hid)
+            btn_open_themes = gr.Button("themes", elem_id="sc-open-picker-themes", elem_classes=_hid)
+            btn_open_agents = gr.Button("agents", elem_id="sc-open-picker-agents", elem_classes=_hid)
+            btn_open_sessions = gr.Button("sessions", elem_id="sc-open-picker-sessions", elem_classes=_hid)
+            btn_picker_up = gr.Button("up", elem_id="sc-picker-up", elem_classes=_hid)
+            btn_picker_down = gr.Button("down", elem_id="sc-picker-down", elem_classes=_hid)
+            btn_picker_confirm = gr.Button("confirm", elem_id="sc-picker-confirm", elem_classes=_hid)
+            picker_pick = gr.Textbox("", elem_id="sc-picker-pick", elem_classes=_hid, show_label=False)
+        session_pick = gr.Dropdown(choices=session_choices(), label="session", elem_id="sc-pick-sessions", elem_classes=_hid)
+        trace_dl = gr.DownloadButton("trace", elem_classes=_hid)
+        out = [
+            transcript_view, header, status, sidebar,
+            overlay, shell_theme, approval_box, approval_desc,
+            files_state, trace_state, app_state, settings_state, ui_state, transcript, editor,
+        ]
+        cycle_out = [
+            settings_state, transcript, transcript_view, header, status, shell_theme,
+        ]
+        picker_out = [
+            transcript_view, header, status, overlay, shell_theme,
+            settings_state, ui_state, transcript, app_state,
+        ]
+        respond_in = [editor, transcript, app_state, settings_state, ui_state, files_state]
+        btn_submit.click(respond, respond_in, out).then(lambda p: p, trace_state, trace_dl)
+        editor.submit(respond, respond_in, out).then(lambda p: p, trace_state, trace_dl)
+        btn_clear.click(on_clear, [transcript, ui_state], [transcript, ui_state, editor])
+        btn_interrupt.click(on_interrupt, app_state, app_state)
+        btn_toggle_sidebar.click(
+            on_toggle_sidebar, [ui_state, settings_state, files_state, app_state], [ui_state, sidebar])
+        btn_toggle_view.click(
+            on_toggle_sidebar_view,
+            [ui_state, settings_state, files_state, app_state],
+            [ui_state, sidebar],
+        )
+        btn_cycle_mode.click(
+            on_cycle_mode, [settings_state, ui_state, app_state, transcript], cycle_out)
+        btn_cycle_agent.click(
+            on_cycle_agent, [settings_state, ui_state, app_state, transcript], cycle_out)
+        btn_cycle_model.click(
+            on_cycle_model, [settings_state, ui_state, app_state, transcript], cycle_out)
+        btn_cycle_think.click(
+            on_cycle_think, [settings_state, ui_state, app_state, transcript], cycle_out)
+        btn_help.click(on_help, ui_state, [ui_state, overlay])
+        btn_whichkey.click(on_whichkey, ui_state, [ui_state, overlay])
+        btn_close.click(on_close_overlay, ui_state, [ui_state, overlay])
+        btn_new.click(on_new_session, None, [transcript, app_state, settings_state, ui_state, files_state, trace_state, overlay])
+        btn_open_models.click(lambda ui, s: on_open_picker("models", ui, s), [ui_state, settings_state], [ui_state, overlay])
+        btn_open_themes.click(lambda ui, s: on_open_picker("themes", ui, s), [ui_state, settings_state], [ui_state, overlay])
+        btn_open_agents.click(lambda ui, s: on_open_picker("agents", ui, s), [ui_state, settings_state], [ui_state, overlay])
+        btn_open_sessions.click(lambda ui, s: on_open_picker("sessions", ui, s), [ui_state, settings_state], [ui_state, overlay])
+        btn_picker_up.click(lambda ui: on_picker_nav(-1, ui), ui_state, [ui_state, overlay])
+        btn_picker_down.click(lambda ui: on_picker_nav(1, ui), ui_state, [ui_state, overlay])
+        btn_picker_confirm.click(
+            on_picker_select,
+            [picker_pick, ui_state, settings_state, app_state, transcript, files_state],
+            picker_out,
+        )
+        session_pick.change(on_session_pick, [session_pick, app_state, settings_state], [transcript, app_state])
+        demo.load(
+            on_load,
+            [settings_state, app_state, ui_state],
+            [sidebar, files_state, js_boot, session_pick, overlay, ui_state],
+        )
+    return demo
+if __name__ == "__main__":
+    from engine.preflight import preflight
+    preflight(PRESET)
+    host = os.environ.get("SMOLCODE_HOST", "127.0.0.1")
+    os.environ["GRADIO_SERVER_PORT"] = os.environ.get("SMOLCODE_PORT", "7860")
+    os.environ["GRADIO_SERVER_NAME"] = host
+    # server_port=None lets Gradio scan GRADIO_SERVER_PORT..+99, skipping ports
+    # (e.g. 7860-7862) still held by stale "ghost" servers.
+    # ssr_mode=False: SSR (the default on HF when Node is present) renders before the
+    # custom web_tui.js applies the fixed-height layout, leaving the file sidebar
+    # uncapped (it grows without bound and hides the bottom bar/model picker).
+    # Client-side rendering applies the layout immediately.
+    build().queue().launch(server_name=host, server_port=None, show_api=False,
+                           ssr_mode=False)
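
In the `__main__` block above, `server_port=None` hands port selection to Gradio, which (per the `..+99` in the comment) tries a 100-port window starting at `GRADIO_SERVER_PORT`. A minimal stdlib sketch of that kind of scan, assuming "free" simply means "bindable right now"; `first_free_port` is a hypothetical helper, not Gradio's API or its actual implementation:

```python
import socket

def first_free_port(host: str = "127.0.0.1", start: int = 7860, tries: int = 100) -> int:
    """Return the first port in [start, start + tries) that can currently be bound.

    Hypothetical helper, not Gradio's API. The port may be taken again between
    this check and the real bind, so the caller must still handle bind errors.
    """
    stop = min(start + tries, 65536)  # never ask bind() for a port above 65535
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # held by another (possibly stale) server: try the next one
            return port
    raise OSError(f"no free port in {start}..{stop - 1}")
```

Because of the check-then-bind race noted in the docstring, a real server should still treat its own bind failure as "try the next port" rather than rely on a pre-check alone.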

demo.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d786d4033bd453a36291aeb17f5999f5ca579c9553762d25bf72770b5d37c165
+size 5896625
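
The `demo.mp4` hunk above is not video data: because `.gitattributes` routes `demo.mp4` through `filter=lfs`, Git commits a small LFS pointer (spec URL, SHA-256 `oid`, byte `size`) while the 5,896,625-byte file itself lives in LFS storage. A minimal parser for the `key value` pointer layout shown here, written from these three lines rather than the full LFS spec (which also permits extension keys); `LfsPointer` and `parse_lfs_pointer` are illustrative names:

```python
from dataclasses import dataclass

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"
HEX = set("0123456789abcdef")

@dataclass(frozen=True)
class LfsPointer:
    oid: str   # "sha256:<64 lowercase hex chars>"
    size: int  # byte size of the real object held in LFS storage

def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS v1 pointer made of `key value` lines."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64 or not set(digest) <= HEX:
        raise ValueError(f"unexpected oid: {fields.get('oid')!r}")
    return LfsPointer(oid=fields["oid"], size=int(fields["size"]))

# The pointer committed for demo.mp4 in this diff:
ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d786d4033bd453a36291aeb17f5999f5ca579c9553762d25bf72770b5d37c165\n"
    "size 5896625\n"
)
```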

engine/__init__.py ADDED

	@@ -0,0 +1,22 @@

+"""smolcode engine package."""
+from .agent import SmallCodeAgent, Step
+from .builder import BuildResult, WebBuilder
+from .config import (
+    Preset,
+    SpecialistLadder,
+    SpecialistPreset,
+    Tier,
+    default_ui_model,
+    load_preset,
+)
+from .fanout import FanoutResult, fan_out, fan_out_live, summarize
+from .preview import inline_app, preview_iframe
+from .router import Router, RouteResult, classify_specialty, classify_tier
+from .rust_session import RustSession, rust_available
+__all__ = ["SmallCodeAgent", "Step", "Preset", "Tier", "load_preset", "default_ui_model",
+           "SpecialistLadder", "SpecialistPreset",
+           "Router", "RouteResult", "classify_tier", "classify_specialty",
+           "FanoutResult", "fan_out", "fan_out_live", "summarize",
+           "WebBuilder", "BuildResult", "inline_app", "preview_iframe",
+           "RustSession", "rust_available"]
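
`engine/__init__.py` keeps a hand-maintained `__all__` that must stay in sync with its `from .x import ...` lines; a name listed but never imported only fails at `from engine import *` time. A small stdlib check that works on the source text alone (so the heavy engine package, and `smolcode_core`, need not be importable); `unbound_exports` is a hypothetical helper and only inspects top-level statements:

```python
import ast

def unbound_exports(source: str) -> list[str]:
    """Return names listed in __all__ that no top-level import/def/class/assignment binds."""
    tree = ast.parse(source)
    bound: set[str] = set()
    exported: list[str] = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `from x import y as z` binds `z`.
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    bound.add(target.id)
                    if target.id == "__all__":
                        exported = list(ast.literal_eval(node.value))
    return [name for name in exported if name not in bound]

sample = (
    "from .agent import SmallCodeAgent, Step\n"
    "from .rust_session import RustSession\n"
    '__all__ = ["SmallCodeAgent", "Step", "RustSession", "rust_available"]\n'
)
missing = unbound_exports(sample)  # rust_available is exported but never imported here
```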

engine/agent.py ADDED

	@@ -0,0 +1,197 @@

+"""smolcode agent engine — backed by the Rust smolcode_core agent loop."""
+from __future__ import annotations
+import asyncio
+import os
+import tempfile
+from collections.abc import Callable
+from dataclasses import dataclass
+from .config import Preset, load_preset
+from .rust_session import RustRunResult, RustSession, rust_available
+from .sandbox import Workspace
+from .trace_collector import TraceCollector
+# Legacy prompt kept for docs; Rust agent uses prompts.rs system prompts.
+SYSTEM_PROMPT = """You are smolcode, a precise coding assistant running on a small local model."""
+@dataclass
+class Step:
+    number: int
+    kind: str
+    detail: str
+    total_tokens: int | None = None
+class SmallCodeAgent:
+    """Agent facade: uses the Rust engine when smolcode_core is installed."""
+    def __init__(
+        self,
+        preset: Preset | None = None,
+        model: str | None = None,
+        max_steps: int = 12,
+        *,
+        system_prompt: str | None = None,
+        registry_builder: Callable | None = None,
+        workspace: Workspace | None = None,
+        name: str = "smolcode",
+        agent: str = "build",
+        profile: str = "full",
+        yolo: bool = False,
+        workspace_dir: str | None = None,
+        approval_handler=None,
+        rust_session: RustSession | None = None,
+    ) -> None:
+        self.preset = preset or load_preset()
+        self.model = model or self.preset.default_model
+        self.max_steps = max_steps
+        self._system_prompt = system_prompt  # unused by Rust; kept for API compat
+        self._registry_builder = registry_builder
+        self.hit_max_steps = False
+        self.errored = False
+        ws_path = workspace_dir or os.environ.get("SMALLCODE_WORKSPACE")
+        if workspace is not None:
+            ws_path = str(workspace.root)
+        elif ws_path is None:
+            ws_path = tempfile.mkdtemp(prefix="smallcode-")
+            self._owns_workspace = True
+        else:
+            self._owns_workspace = False
+        self.workspace = workspace or Workspace(root=ws_path)
+        profile_name = profile
+        if registry_builder is not None:
+            profile_name = "web"
+        if not rust_available():
+            raise RuntimeError(
+                "smolcode_core required; install with maturin in smolcode-cli/crates/smolcode-py"
+            )
+        if rust_session is not None:
+            self._rust = rust_session
+        else:
+            self._rust = RustSession(
+                workspace=ws_path,
+                agent=agent,
+                yolo=yolo,
+                model=self.model,
+                base_url=self.preset.base_url,
+                api_key=self.preset.api_key,
+                profile=profile_name,
+                approval_handler=approval_handler,
+            )
+        self.trace_collector = self._rust.trace_collector
+        if registry_builder is not None:
+            self._register_web_tools()
+    def _register_web_tools(self) -> None:
+        from .tools import check_app_impl
+        ws = self.workspace
+        collector = self.trace_collector
+        def check_app(args: dict) -> dict:
+            return check_app_impl(ws, collector, args)
+        self._rust.register_tool("check_app", check_app)
+    async def run(self, task: str, *, think: str | None = None, yolo: bool | None = None) -> tuple[str, list[Step]]:
+        self.hit_max_steps = False
+        self.errored = False
+        result: RustRunResult = await self._rust.run(task, think=think, yolo=yolo)
+        self.hit_max_steps = result.hit_max_steps
+        self.errored = result.errored
+        steps = self._steps_from_trace()
+        return result.final, steps
+    async def run_live_turn(
+        self,
+        task: str,
+        *,
+        think: str | None = None,
+        yolo: bool | None = None,
+        poll_interval: float = 0.35,
+    ):
+        """Async generator yielding LiveFrame snapshots during a Rust agent turn."""
+        from .live_run import LiveFrame
+        self.hit_max_steps = False
+        self.errored = False
+        self.trace_collector.events.clear()
+        self._rust.clear_cancel()
+        self._rust._session.start_turn(task, think=think, yolo=yolo)
+        final_text = ""
+        done = False
+        interrupted = False
+        while not done:
+            if self._rust.cancelled:
+                interrupted = True
+                done = True
+                break
+            ev = await asyncio.to_thread(self._rust._session.poll_event)
+            if ev is None:
+                yield LiveFrame(
+                    events=self.trace_collector.snapshot(),
+                    files=self.files(),
+                )
+                await asyncio.sleep(poll_interval)
+                continue
+            kind = ev.get("kind")
+            if kind == "approval":
+                approved = True
+                if self._rust.approval_handler is not None:
+                    approved = await self._rust.approval_handler(ev.get("desc", ""))
+                self._rust._session.approve(approved)
+                continue
+            self._rust._ingest_event(ev)
+            if kind == "final":
+                final_text = ev.get("text", "")
+            if kind == "done":
+                done = True
+            yield LiveFrame(
+                events=self.trace_collector.snapshot(),
+                files=self.files(),
+                raw_event=ev,
+            )
+        if interrupted:
+            final_text = final_text or "interrupted"
+            self.errored = True
+        if final_text and not interrupted:
+            self._rust._session.record_turn(task, final_text)
+        steps = self._steps_from_trace()
+        yield LiveFrame(
+            steps=steps,
+            events=self.trace_collector.snapshot(),
+            files=self.files(),
+            done=True,
+            result=(final_text, steps),
+        )
+    def _steps_from_trace(self) -> list[Step]:
+        out: list[Step] = []
+        for i, ev in enumerate(self.trace_collector.events):
+            out.append(Step(number=i, kind=ev.kind, detail=ev.detail))
+        return out
+    def current_steps(self) -> list[Step]:
+        return self._steps_from_trace()
+    def raw_history(self) -> list:
+        return self.current_steps()
+    def files(self) -> dict[str, str]:
+        return self._rust.files()
+    @property
+    def rust_session(self) -> RustSession:
+        return self._rust
+    def cleanup(self) -> None:
+        if getattr(self, "_owns_workspace", False):
+            self.workspace.cleanup()
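
`run_live_turn` drives a Rust turn by polling: the blocking `poll_event()` runs in a worker thread via `asyncio.to_thread`, an idle snapshot is yielded (followed by a short sleep) whenever no event is ready, and the loop ends on a `done` event or on cancellation, after which one final frame carries the result. The same pattern in isolation, assuming a hypothetical `FakeSession` in place of the `smolcode_core` session, plain dicts in place of `LiveFrame`, and no approval handling:

```python
import asyncio
import collections
import threading

class FakeSession:
    """Hypothetical stand-in for the Rust session: poll_event() pops queued events."""
    def __init__(self, events):
        self._events = collections.deque(events)

    def poll_event(self):
        # Blocking-style call; None means "nothing ready yet" (run_live_turn treats it as idle).
        return self._events.popleft() if self._events else None

async def live_turn(session, cancelled: threading.Event, poll_interval: float = 0.01):
    """Yield a frame per event or idle tick until a 'done' event or cancellation."""
    final_text = ""
    while True:
        if cancelled.is_set():
            yield {"done": True, "final": final_text or "interrupted"}
            return
        # Run the blocking poll off the event loop so other UI work keeps running.
        ev = await asyncio.to_thread(session.poll_event)
        if ev is None:
            yield {"idle": True}
            await asyncio.sleep(poll_interval)
            continue
        if ev.get("kind") == "final":
            final_text = ev.get("text", "")
        yield {"event": ev}
        if ev.get("kind") == "done":
            yield {"done": True, "final": final_text}
            return

async def collect(session, cancelled):
    return [frame async for frame in live_turn(session, cancelled)]

events = [{"kind": "tool", "detail": "ls"}, None, {"kind": "final", "text": "ok"}, {"kind": "done"}]
frames = asyncio.run(collect(FakeSession(events), threading.Event()))
```

A `threading.Event` is used for the cancel flag because, as with the real session, it may be set from a thread other than the one running the event loop.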

engine/branding.py ADDED

	@@ -0,0 +1,172 @@

+"""Shared Hugging Face branding for smolcode Gradio UIs."""
+from __future__ import annotations
+from .themes import theme_css_vars
+# Official HF icon (huggingface.co/front/assets/huggingface_logo-noborder.svg)
+HF_LOGO_SVG = (
+    '<svg class="hf-logo" xmlns="http://www.w3.org/2000/svg" width="32" height="30" '
+    'viewBox="0 0 95 88" fill="none" aria-label="Hugging Face">'
+    '<path fill="#FFD21E" d="M47.21 76.5a34.75 34.75 0 1 0 0-69.5 34.75 34.75 0 0 0 0 69.5Z" />'
+    '<path fill="#FF9D0B" d="M81.96 41.75a34.75 34.75 0 1 0-69.5 0 34.75 34.75 0 0 0 69.5 0Zm-73.5 0a38.75 38.75 0 1 1 77.5 0 38.75 38.75 0 0 1-77.5 0Z" />'
+    '<path fill="#3A3B45" d="M58.5 32.3c1.28.44 1.78 3.06 3.07 2.38a5 5 0 1 0-6.76-2.07c.61 1.15 2.55-.72 3.7-.32ZM34.95 32.3c-1.28.44-1.79 3.06-3.07 2.38a5 5 0 1 1 6.76-2.07c-.61 1.15-2.56-.72-3.7-.32Z" />'
+    '<path fill="#FF323D" d="M46.96 56.29c9.83 0 13-8.76 13-13.26 0-2.34-1.57-1.6-4.09-.36-2.33 1.15-5.46 2.74-8.9 2.74-7.19 0-13-6.88-13-2.38s3.16 13.26 13 13.26Z" />'
+    '<path fill="#3A3B45" fill-rule="evenodd" d="M39.43 54a8.7 8.7 0 0 1 5.3-4.49c.4-.12.81.57 1.24 1.28.4.68.82 1.37 1.24 1.37.45 0 .9-.68 1.33-1.35.45-.7.89-1.38 1.32-1.25a8.61 8.61 0 0 1 5 4.17c3.73-2.94 5.1-7.74 5.1-10.7 0-2.34-1.57-1.6-4.09-.36l-.14.07c-2.31 1.15-5.39 2.67-8.77 2.67s-6.45-1.52-8.77-2.67c-2.6-1.29-4.23-2.1-4.23.29 0 3.05 1.46 8.06 5.47 10.97Z" clip-rule="evenodd" />'
+    '<path fill="#FF9D0B" d="M70.71 37a3.25 3.25 0 1 0 0-6.5 3.25 3.25 0 0 0 0 6.5ZM24.21 37a3.25 3.25 0 1 0 0-6.5 3.25 3.25 0 0 0 0 6.5ZM17.52 48c-1.62 0-3.06.66-4.07 1.87a5.97 5.97 0 0 0-1.33 3.76 7.1 7.1 0 0 0-1.94-.3c-1.55 0-2.95.59-3.94 1.66a5.8 5.8 0 0 0-.8 7 5.3 5.3 0 0 0-1.79 2.82c-.24.9-.48 2.8.8 4.74a5.22 5.22 0 0 0-.37 5.02c1.02 2.32 3.57 4.14 8.52 6.1 3.07 1.22 5.89 2 5.91 2.01a44.33 44.33 0 0 0 10.93 1.6c5.86 0 10.05-1.8 12.46-5.34 3.88-5.69 3.33-10.9-1.7-15.92-2.77-2.78-4.62-6.87-5-7.77-.78-2.66-2.84-5.62-6.25-5.62a5.7 5.7 0 0 0-4.6 2.46c-1-1.26-1.98-2.25-2.86-2.82A7.4 7.4 0 0 0 17.52 48Zm0 4c.51 0 1.14.22 1.82.65 2.14 1.36 6.25 8.43 7.76 11.18.5.92 1.37 1.31 2.14 1.31 1.55 0 2.75-1.53.15-3.48-3.92-2.93-2.55-7.72-.68-8.01.08-.02.17-.02.24-.02 1.7 0 2.45 2.93 2.45 2.93s2.2 5.52 5.98 9.3c3.77 3.77 3.97 6.8 1.22 10.83-1.88 2.75-5.47 3.58-9.16 3.58-3.81 0-7.73-.9-9.92-1.46-.11-.03-13.45-3.8-11.76-7 .28-.54.75-.76 1.34-.76 2.38 0 6.7 3.54 8.57 3.54.41 0 .7-.17.83-.6.79-2.85-12.06-4.05-10.98-8.17.2-.73.71-1.02 1.44-1.02 3.14 0 10.2 5.53 11.68 5.53.11 0 .2-.03.24-.1.74-1.2.33-2.04-4.9-5.2-5.21-3.16-8.88-5.06-6.8-7.33.24-.26.58-.38 1-.38 3.17 0 10.66 6.82 10.66 6.82s2.02 2.1 3.25 2.1c.28 0 .52-.1.68-.38.86-1.46-8.06-8.22-8.56-11.01-.34-1.9.24-2.85 1.31-2.85Z" />'
+    '<path fill="#FFD21E" d="M38.6 76.69c2.75-4.04 2.55-7.07-1.22-10.84-3.78-3.77-5.98-9.3-5.98-9.3s-.82-3.2-2.69-2.9c-1.87.3-3.24 5.08.68 8.01 3.91 2.93-.78 4.92-2.29 2.17-1.5-2.75-5.62-9.82-7.76-11.18-2.13-1.35-3.63-.6-3.13 2.2.5 2.79 9.43 9.55 8.56 11-.87 1.47-3.93-1.71-3.93-1.71s-9.57-8.71-11.66-6.44c-2.08 2.27 1.59 4.17 6.8 7.33 5.23 3.16 5.64 4 4.9 5.2-.75 1.2-12.28-8.53-13.36-4.4-1.08 4.11 11.77 5.3 10.98 8.15-.8 2.85-9.06-5.38-10.74-2.18-1.7 3.21 11.65 6.98 11.76 7.01 4.3 1.12 15.25 3.49 19.08-2.12Z" />'
+    '<path fill="#FF9D0B" d="M77.4 48c1.62 0 3.07.66 4.07 1.87a5.97 5.97 0 0 1 1.33 3.76 7.1 7.1 0 0 1 1.95-.3c1.55 0 2.95.59 3.94 1.66a5.8 5.8 0 0 1 .8 7 5.3 5.3 0 0 1 1.78 2.82c.24.9.48 2.8-.8 4.74a5.22 5.22 0 0 1 .37 5.02c-1.02 2.32-3.57 4.14-8.51 6.1-3.08 1.22-5.9 2-5.92 2.01a44.33 44.33 0 0 1-10.93 1.6c-5.86 0-10.05-1.8-12.46-5.34-3.88-5.69-3.33-10.9 1.7-15.92 2.78-2.78 4.63-6.87 5.01-7.77.78-2.66 2.83-5.62 6.24-5.62a5.7 5.7 0 0 1 4.6 2.46c1-1.26 1.98-2.25 2.87-2.82A7.4 7.4 0 0 1 77.4 48Zm0 4c-.51 0-1.13.22-1.82.65-2.13 1.36-6.25 8.43-7.76 11.18a2.43 2.43 0 0 1-2.14 1.31c-1.54 0-2.75-1.53-.14-3.48 3.91-2.93 2.54-7.72.67-8.01a1.54 1.54 0 0 0-.24-.02c-1.7 0-2.45 2.93-2.45 2.93s-2.2 5.52-5.97 9.3c-3.78 3.77-3.98 6.8-1.22 10.83 1.87 2.75 5.47 3.58 9.15 3.58 3.82 0 7.73-.9 9.93-1.46.1-.03 13.45-3.8 11.76-7-.29-.54-.75-.76-1.34-.76-2.38 0-6.71 3.54-8.57 3.54-.42 0-.71-.17-.83-.6-.8-2.85 12.05-4.05 10.97-8.17-.19-.73-.7-1.02-1.44-1.02-3.14 0-10.2 5.53-11.68 5.53-.1 0-.19-.03-.23-.1-.74-1.2-.34-2.04 4.88-5.2 5.23-3.16 8.9-5.06 6.8-7.33-.23-.26-.57-.38-.98-.38-3.18 0-10.67 6.82-10.67 6.82s-2.02 2.1-3.24 2.1a.74.74 0 0 1-.68-.38c-.87-1.46 8.05-8.22 8.55-11.01.34-1.9-.24-2.85-1.31-2.85Z" />'
+    '<path fill="#FFD21E" d="M56.33 76.69c-2.75-4.04-2.56-7.07 1.22-10.84 3.77-3.77 5.97-9.3 5.97-9.3s.82-3.2 2.7-2.9c1.86.3 3.23 5.08-.68 8.01-3.92 2.93.78 4.92 2.28 2.17 1.51-2.75 5.63-9.82 7.76-11.18 2.13-1.35 3.64-.6 3.13 2.2-.5 2.79-9.42 9.55-8.55 11 .86 1.47 3.92-1.71 3.92-1.71s9.58-8.71 11.66-6.44c2.08 2.27-1.58 4.17-6.8 7.33-5.23 3.16-5.63 4-4.9 5.2.75 1.2 12.28-8.53 13.36-4.4 1.08 4.11-11.76 5.3-10.97 8.15.8 2.85 9.05-5.38 10.74-2.18 1.69 3.21-11.65 6.98-11.76 7.01-4.31 1.12-15.26 3.49-19.08-2.12Z" />'
+    '</svg>'
+)
+SMOLCODE_CSS = """
+:root { --hf-yellow:#FFD21E; --sc-accent:#7c3aed; --sc-bg:#0b1020; --sc-panel:#111827;
+  --sc-border:#334155; --sc-fg:#e2e8f0; --sc-dim:#64748b; --sc-ok:#34d399; --sc-tool:#a78bfa; }
+body, .gradio-container { background:var(--sc-bg) !important; color:var(--sc-fg) !important; }
+/* Lock the whole page to the viewport so it can NEVER scroll; only inner panes scroll. */
+html, body { height:100% !important; max-height:100vh !important; margin:0 !important;
+  overflow:hidden !important; }
+gradio-app { display:block !important; height:100vh !important; max-height:100vh !important;
+  overflow:hidden !important; }
+.gradio-container { max-width:100% !important; padding:0.5rem 1rem !important;
+  height:100vh !important; max-height:100vh !important; min-height:0 !important;
+  overflow:hidden !important; }
+/* Every Gradio wrapper between the container and our shell must be height-locked, not auto. */
+.gradio-container > .wrap, .gradio-container .contain,
+main.fillable, main.app, .gradio-container > main {
+  height:100% !important; max-height:100% !important; min-height:0 !important;
+  overflow:hidden !important; }
+/* The unnamed outer column Gradio injects around our shell column. */
+main.fillable > .column, .contain > .column, .wrap > .column {
+  height:100% !important; max-height:100% !important; min-height:0 !important;
+  overflow:hidden !important; }
+.sc-header { display:flex; align-items:center; gap:.75rem; margin-bottom:.25rem; }
+.hf-logo { flex-shrink:0; }
+.sc-title { font-weight:800; font-size:1.7rem; letter-spacing:-.02em; line-height:1.2; }
+.sc-title .hf-accent, .hf-accent { color:var(--hf-yellow); }
+.sc-badge { display:inline-block; padding:2px 10px; border-radius:999px;
+  background:#2a2410; color:var(--hf-yellow); border:1px solid rgba(255,210,30,.25);
+  font-size:.72rem; font-weight:600; margin-left:.4rem; vertical-align:middle; }
+.sc-sub { color:#94a3b8; margin-top:.2rem; font-size:.9rem; }
+.sc-tui-shell { display:flex !important; flex-direction:column; gap:.5rem;
+  height:100% !important; max-height:100% !important; min-height:0; overflow:hidden !important; }
+.sc-header-bar { display:flex; align-items:center; gap:.85rem; padding:.5rem .75rem;
+  background:#1e293b; border-radius:6px; font-family:ui-monospace,monospace; font-size:.8rem;
+  flex-shrink:0; }
+.sc-hbrand { font-weight:700; color:#0b1020; background:var(--sc-accent); padding:1px 8px;
+  border-radius:4px; }
+.sc-hbrand .hf-accent { color:var(--hf-yellow); }
+.sc-hgit { color:var(--sc-ok); }
+.sc-hmodel { color:var(--sc-tool); font-weight:700; }
+.sc-hhost { color:var(--sc-dim); }
+.sc-htheme { color:var(--sc-dim); margin-left:auto; }
+.sc-main-row { display:flex !important; flex-wrap:nowrap !important; align-items:stretch !important;
+  gap:.5rem !important; flex:1 !important; min-height:0 !important; overflow:hidden !important; }
+.sc-main-row > .gr-html, .sc-main-row > .gr-column { min-height:0 !important; height:100% !important; }
+.sc-sidebar { width:17rem !important; min-width:17rem !important; max-width:17rem !important;
+  flex-shrink:0 !important; height:100% !important; min-height:0 !important; overflow:hidden !important; }
+.sc-sidebar > .html-container { padding:0 !important; height:100% !important; min-height:0 !important; }
+.sc-sidebar-panel { height:100%; min-height:0; max-height:100%; display:flex; flex-direction:column;
+  background:var(--sc-panel); border:1px solid var(--sc-border); border-radius:8px;
+  font-family:ui-monospace,monospace; font-size:.78rem; overflow:hidden; }
+.sc-sidebar-focused { border-color:var(--sc-accent); }
+.sc-sidebar-title { padding:.35rem .55rem; color:var(--sc-accent); font-weight:700;
+  border-bottom:1px solid var(--sc-border); background:#0f172a; }
+.sc-sidebar-body { flex:1 1 0%; min-height:0; height:100%;
+  max-height:calc(100vh - 5rem); overflow-y:auto; overflow-x:hidden;
+  padding:.25rem 0; line-height:1.35; }
+.sc-sb-dir { color:var(--sc-accent); font-weight:700; padding:.1rem .45rem; white-space:nowrap; }
+.sc-sb-file { display:flex; align-items:baseline; gap:.15rem; padding:.05rem .45rem;
+  color:var(--sc-fg); white-space:nowrap; }
+.sc-sb-file:hover { background:#1e293b; }
+.sc-sb-sel { background:var(--sc-ok); color:#0b1020; font-weight:700; }
+.sc-sb-sel .sc-sb-glyph, .sc-sb-sel .sc-sb-name { color:#0b1020; }
+.sc-sb-mark { display:inline-block; width:.85rem; text-align:center; }
+.sc-sb-glyph { opacity:.6; }
+.sc-sb-more { color:var(--sc-dim); font-style:italic; padding:.2rem .45rem; }
+.sc-sb-empty, .sc-sb-stat { padding:.15rem .45rem; color:var(--sc-fg); }
+.sc-sb-dim { color:var(--sc-dim); }
+.sc-main-col { flex:1 !important; min-width:0 !important; min-height:0 !important;
+  height:100% !important; display:flex !important; flex-direction:column !important;
+  gap:.5rem !important; overflow:hidden !important; }
+.sc-editor-wrap, .sc-editor-wrap .gr-group { overflow:visible !important; flex-shrink:0 !important; }
+.sc-transcript-wrap { flex:1; min-height:0; overflow-y:auto; overflow-x:hidden;
+  background:#0f172a; border:1px solid var(--sc-border); border-radius:8px; padding:.5rem .65rem; }
+.sc-transcript-inner { font-family:ui-monospace,monospace; font-size:.82rem; line-height:1.45; }
+.sc-transcript-empty { color:var(--sc-dim); padding:1rem; font-family:ui-monospace,monospace; }
+.sc-tline { margin:.15rem 0; }
+.sc-tglyph { display:inline-block; width:1rem; }
+.sc-editor-wrap { border:1px solid var(--sc-accent); border-radius:8px; padding:.25rem;
+  background:#0f172a; flex-shrink:0; min-height:9rem; overflow:visible !important; }
+.sc-editor-wrap .block, #sc-editor { height:auto !important; min-height:7rem !important;
+  overflow:visible !important; }
+.sc-editor-wrap label { display:flex !important; flex-direction:column; min-height:6.5rem; }
+.sc-editor-wrap textarea, #sc-editor textarea, #sc-editor input,
+[data-testid="textbox"] textarea, [data-testid="textbox"] input {
+  font-family:ui-monospace,monospace !important; font-size:.85rem !important;
+  background:#0f172a !important; color:var(--sc-fg) !important; border:none !important;
+  box-shadow:none !important; pointer-events:auto !important;
+  min-height:6.5rem !important; resize:vertical !important; }
+#sc-editor { pointer-events:auto !important; }
+.sc-editor-hint { font-size:.72rem; color:var(--sc-dim); padding:.2rem .4rem;
+  font-family:ui-monospace,monospace; }
+.sc-status-wrap { flex-shrink:0; }
+.sc-status-bar { display:flex; flex-wrap:wrap; gap:.35rem; padding:.4rem .5rem;
+  background:#1e293b; border-radius:6px; font-family:ui-monospace,monospace; font-size:.75rem; }
+.sc-chip { padding:2px 8px; border-radius:4px; background:#334155; color:#e2e8f0; }
+.sc-chip-brand { background:var(--sc-accent); color:#fff; font-weight:700; }
+.sc-chip-mode { background:#2a2410; color:var(--hf-yellow); font-weight:600; }
+.sc-chip-think { background:#422006; color:#fdba74; }
+.sc-chip-run { background:#14532d; color:#86efac; }
+.sc-chip-dim { color:#94a3b8; }
+.sc-chip-model { color:#a78bfa; }
+.sc-chip-clickable { cursor:pointer; border:none; font:inherit; font-family:inherit; font-size:inherit; }
+.sc-chip-clickable:hover { filter:brightness(1.15); }
+.sc-picker-title { color:var(--sc-accent); font-weight:700; margin-bottom:.5rem; }
+.sc-picker-list { display:flex; flex-direction:column; gap:2px; max-height:280px; overflow-y:auto; }
+.sc-picker-item { display:flex; gap:.35rem; align-items:baseline; width:100%; text-align:left;
+  padding:.25rem .4rem; background:transparent; border:none; color:var(--sc-fg);
+  font-family:ui-monospace,monospace; font-size:.85rem; cursor:pointer; border-radius:4px; }
+.sc-picker-item:hover { background:#334155; }
+.sc-picker-sel { background:var(--sc-accent); color:#fff; font-weight:700; }
+.sc-picker-mark { display:inline-block; width:1rem; text-align:center; }
+.sc-picker-hint { margin-top:.6rem; font-size:.72rem; color:var(--sc-dim); }
+.sc-picker-empty { color:var(--sc-dim); font-style:italic; }
+.sc-popup-item.sc-popup-sel { background:#334155; font-weight:700; }
+.sc-overlay { position:fixed; inset:0; background:rgba(0,0,0,.55); z-index:9999;
+  display:flex; align-items:center; justify-content:center; pointer-events:auto; }
+.sc-overlay-panel { background:#1e293b; border:1px solid #7c3aed; border-radius:10px;
+  padding:1rem 1.25rem; max-width:480px; font-family:ui-monospace,monospace; font-size:.85rem;
+  color:#e2e8f0; pointer-events:auto; }
+.sc-popup { position:absolute; z-index:100; background:#1e293b; border:1px solid #7c3aed;
+  border-radius:6px; max-height:200px; overflow:auto; font-family:ui-monospace,monospace; font-size:.8rem; }
+.sc-popup-item { padding:.25rem .5rem; cursor:pointer; color:#34d399; }
+.sc-popup-item:hover { background:#334155; }
+.sc-approval { padding:.75rem 1rem; border:1px solid rgba(124,58,237,.45);
+  border-radius:8px; background:#1e1b4b; margin:.5rem 0; font-size:.9rem; }
+footer { display:none !important; }
+.gradio-container .block, .gradio-container .form { background:transparent !important;
+  border:none !important; box-shadow:none !important; }
+.gradio-container .gr-group { background:transparent !important; border:none !important; }
+.gradio-container label { display:none !important; }
+.sc-hidden-controls { position:fixed !important; left:-10000px !important; top:0 !important;
+  width:1px !important; height:1px !important; overflow:hidden !important; opacity:0 !important; }
+.sc-hidden-btn, .sc-hidden-btn.block, #sc-submit, #sc-clear, #sc-interrupt, #sc-toggle-sidebar,
+#sc-cycle-mode, #sc-cycle-agent, #sc-cycle-model, #sc-cycle-think, #sc-help, #sc-whichkey,
+#sc-open-picker-models, #sc-open-picker-themes, #sc-open-picker-agents, #sc-open-picker-sessions,
+#sc-picker-up, #sc-picker-down, #sc-picker-confirm, #sc-picker-pick {
+  position:fixed !important; left:-10000px !important; top:0 !important;
+  width:1px !important; height:1px !important; opacity:0 !important;
+  overflow:hidden !important; pointer-events:auto !important; }
+""" + theme_css_vars()
+def smolcode_header_html(*, preset: str, tier_badge: str, subtitle: str) -> str:
+    return (
+        f"<div class='sc-header'>{HF_LOGO_SVG}<div>"
+        f"<div class='sc-title'>smol<span class='hf-accent'>code</span>"
+        f"<span class='sc-badge'>preset: {preset}</span>"
+        f"<span class='sc-badge'>{tier_badge}</span></div>"
+        f"<div class='sc-sub'>{subtitle}</div>"
+        f"</div></div>"
+    )
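
`smolcode_header_html` interpolates `preset`, `tier_badge` and `subtitle` directly into HTML. If any of those can ever carry user- or model-controlled text, escaping them with the standard library's `html.escape` keeps a stray `<`, `&` or quote from breaking the markup. A hedged variant, where `safe_header_html` is a hypothetical name and the logo SVG is passed in rather than imported:

```python
from html import escape

def safe_header_html(*, preset: str, tier_badge: str, subtitle: str, logo_svg: str = "") -> str:
    """Same markup as smolcode_header_html, but the three text fields are HTML-escaped.

    logo_svg is trusted markup (e.g. HF_LOGO_SVG) and is inserted unescaped.
    """
    return (
        f"<div class='sc-header'>{logo_svg}<div>"
        f"<div class='sc-title'>smol<span class='hf-accent'>code</span>"
        f"<span class='sc-badge'>preset: {escape(preset)}</span>"
        f"<span class='sc-badge'>{escape(tier_badge)}</span></div>"
        f"<div class='sc-sub'>{escape(subtitle)}</div>"
        f"</div></div>"
    )
```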

engine/browser_runner.py ADDED

	@@ -0,0 +1,145 @@

+"""Subprocess runner: check a model-built web app in a REAL headless browser.
+Invoked as `python engine/browser_runner.py <app.html>` by
+engine/browsercheck.py and never imported, which keeps it free of the engine
+package / liteforge and isolates a browser crash from the Gradio process. It loads
+the app wrapped in the EXACT same `srcdoc` + `sandbox` as the live preview
+(engine/preview.py), so the verdict matches what the user sees. It then clicks
+every button, exercises the keyboard, and reports any uncaught JavaScript errors.
+Browser: headless Firefox via Selenium + geckodriver. (Playwright's browser CDN
+is firewalled in this environment; conda-forge Firefox is the reachable, rootless
+real browser. The choice is invisible to callers: the JSON contract is the same.)
+We capture errors by injecting a tiny `error`/`unhandledrejection` listener at
+the top of the framed document, so it also catches errors thrown during initial
+script execution (the "script ran before its element / undefined function"
+class), then read them back. That is the HARD failure signal.
+Output: one JSON line {ok, errors, buttons, clicked} with exit code 0. Only
+when the browser itself can't run does it print {ok: null, infra} and exit 3,
+so the caller can fall back to the jsdom checker.
+"""
+import json
+import os
+import re
+import sys
+import tempfile
+PREVIEW_SANDBOX = "allow-scripts allow-same-origin allow-modals allow-popups allow-forms"
+# Installed by the rootless conda-forge setup (see DEVELOPING.md). Overridable.
+_BROWSER_PREFIX = os.environ.get(
+    "SMOLBUILDER_BROWSER_PREFIX",
+    os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".browser"))
+_FIREFOX_BIN = os.path.join(_BROWSER_PREFIX, "bin", "FirefoxApp", "firefox")
+_GECKODRIVER = os.path.join(_BROWSER_PREFIX, "bin", "geckodriver")
+# Injected first inside the frame so it catches errors thrown during load.
+_CAPTURE = ("<script>(function(){window.__errs=[];"
+            "window.addEventListener('error',function(e){try{__errs.push('uncaught: '+"
+            "((e.error&&e.error.message)||e.message||String(e)))}catch(_){}} ,true);"
+            "window.addEventListener('unhandledrejection',function(e){try{__errs.push("
+            "'rejection: '+((e.reason&&e.reason.message)||e.reason))}catch(_){}});})();</script>")
+def _escape_srcdoc(doc: str) -> str:
+    return doc.replace("&", "&amp;").replace('"', "&quot;")
+def _inject_capture(app_html: str) -> str:
+    """Put the error collector before the app's own scripts."""
+    # \b so an app without <head> but with a <header> element isn't matched.
+    m = re.search(r"<head\b[^>]*>", app_html, re.I)
+    if m:
+        return app_html[:m.end()] + _CAPTURE + app_html[m.end():]
+    m = re.search(r"<html\b[^>]*>", app_html, re.I)
+    if m:
+        return app_html[:m.end()] + _CAPTURE + app_html[m.end():]
+    return _CAPTURE + app_html
+def _emit(obj: dict) -> None:
+    sys.stdout.write(json.dumps(obj) + "\n")
+def main(path: str) -> int:
+    try:
+        from selenium import webdriver
+        from selenium.webdriver.firefox.options import Options
+        from selenium.webdriver.firefox.service import Service
+        from selenium.webdriver.common.by import By
+    except Exception as e:
+        _emit({"ok": None, "infra": f"selenium import failed: {e}"})
+        return 3
+    if not (os.path.exists(_FIREFOX_BIN) and os.path.exists(_GECKODRIVER)):
+        _emit({"ok": None, "infra": "firefox/geckodriver not installed"})
+        return 3
+    with open(path, encoding="utf-8") as f:
+        app_html = f.read()
+    host = ('<!doctype html><meta charset="utf-8"><body style="margin:0">'
+            f'<iframe id="app" style="width:100%;height:600px;border:0" '
+            f'sandbox="{PREVIEW_SANDBOX}" '
+            f'srcdoc="{_escape_srcdoc(_inject_capture(app_html))}"></iframe>')
+    host_path = os.path.join(tempfile.mkdtemp(prefix="brhost-"), "host.html")
+    with open(host_path, "w", encoding="utf-8") as f:
+        f.write(host)
+    opts = Options()
+    opts.add_argument("-headless")
+    opts.binary_location = _FIREFOX_BIN
+    opts.set_preference("security.sandbox.content.level", 0)  # no userns in container
+    svc = Service(executable_path=_GECKODRIVER, log_output=os.path.join(tempfile.gettempdir(), "gecko.log"))
+    try:
+        driver = webdriver.Firefox(options=opts, service=svc)
+    except Exception as e:
+        _emit({"ok": None, "infra": f"firefox launch failed: {str(e)[:200]}"})
+        return 3
+    errors: list[str] = []
+    buttons = clicked = 0
+    try:
+        driver.set_page_load_timeout(20)
+        driver.get("file://" + host_path)
+        driver.switch_to.frame(driver.find_element(By.ID, "app"))
+        import time
+        time.sleep(0.3)                                  # let scripts settle
+        els = driver.find_elements(
+            By.CSS_SELECTOR, "button, [onclick], input[type=button], input[type=submit]")
+        buttons = len(els)
+        for el in els[:25]:
+            try:
+                driver.execute_script("arguments[0].disabled=false;", el)
+                el.click()
+                clicked += 1
+            except Exception:
+                pass                                      # handler errors show up in __errs
+        # Exercise keyboard handlers (canvas games etc.).
+        try:
+            driver.execute_script(
+                "['ArrowUp','ArrowDown','ArrowLeft','ArrowRight',' '].forEach(function(k){"
+                "var c={key:k,keyCode:k===' '?32:({ArrowUp:38,ArrowDown:40,ArrowLeft:37,ArrowRight:39}[k]),bubbles:true};"
+                "document.dispatchEvent(new KeyboardEvent('keydown',c));"
+                "window.dispatchEvent(new KeyboardEvent('keydown',c));});")
+        except Exception:
+            pass
+        time.sleep(0.3)                                   # surface late/timer errors
+        try:
+            errors = driver.execute_script("return window.__errs || [];") or []
+        except Exception:
+            errors = []
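+    finally:
+        try:
+            driver.quit()
+        except Exception:
+            pass
+        try:                                              # remove the temp host page and its dir
+            os.unlink(host_path)
+            os.rmdir(os.path.dirname(host_path))
+        except OSError:
+            pass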
+    errors = [str(e)[:400] for e in errors][:20]
+    _emit({"ok": len(errors) == 0, "errors": errors, "buttons": buttons, "clicked": clicked})
+    return 0
+if __name__ == "__main__":
+    sys.exit(main(sys.argv[1]))
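The framing approach above can be checked without launching a browser: inject the error collector first, then embed the whole document in a double-quoted `srcdoc` attribute. The sketch below assumes Python's `html.parser` is a fair stand-in for how a browser reads that attribute. It shows that escaping only `&` and `"` is enough for the parsed `srcdoc` value to round-trip to exactly the injected document. `CAPTURE` is a simplified placeholder for `_CAPTURE`, and the helper names are hypothetical:

```python
import re
from html.parser import HTMLParser

# Simplified stand-in for browser_runner._CAPTURE.
CAPTURE = "<script>window.__errs=[];</script>"


def inject_capture(app_html: str) -> str:
    """Mirror the lookup order of _inject_capture: after <head>, else after <html>,
    else prepend. The \\b keeps <header> from being mistaken for <head>."""
    for pattern in (r"<head\b[^>]*>", r"<html\b[^>]*>"):
        m = re.search(pattern, app_html, re.I)
        if m:
            return app_html[:m.end()] + CAPTURE + app_html[m.end():]
    return CAPTURE + app_html


def escape_srcdoc(doc: str) -> str:
    # Inside a double-quoted attribute value, only & and " must be escaped.
    return doc.replace("&", "&amp;").replace('"', "&quot;")


class SrcdocReader(HTMLParser):
    """Collect the (already unescaped) srcdoc value of the first <iframe>."""

    def __init__(self) -> None:
        super().__init__()
        self.srcdoc: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.srcdoc is None:
            self.srcdoc = dict(attrs).get("srcdoc")


app = ('<!doctype html><html><head><title>A & "B"</title></head>'
       '<body><button onclick="go()">Go</button></body></html>')
framed = inject_capture(app)
host = f'<iframe id="app" srcdoc="{escape_srcdoc(framed)}"></iframe>'
reader = SrcdocReader()
reader.feed(host)
print(reader.srcdoc == framed)  # True
```

Putting the collector straight after `<head>` means it is installed before any of the app's own scripts run, so errors thrown during initial script execution are captured too.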

engine/browsercheck.py ADDED

	@@ -0,0 +1,111 @@

+"""Real-browser verification of model-built web apps, with a jsdom fallback.
+The web equivalent of `run_python`, but faithful: it drives a REAL headless
+browser in a subprocess (Chromium via engine/playwright_runner.py, or Firefox via
+Selenium in engine/browser_runner.py; see _RUNNERS) and loads the app in the
+exact `srcdoc`/`sandbox` wrapper the live preview uses, so the agent's verdict
+matches what the user actually sees. jsdom
+(engine/webcheck.py) can't: it has a working localStorage and never applies the
+sandbox, so it falsely passes apps that break in a browser (e.g. a notepad on a
+`data:` opaque origin).
+Same contract as webcheck.check_html — (True, []) / (False, [...]) / (None, [...]).
+Fallback chain: real browser -> jsdom -> unverifiable. A browser that's missing,
+slow, or crashes returns None internally and falls back rather than failing the
+build (a flaky checker must never cause spurious model escalation).
+The browser must be installed wherever this runs (rootless conda-forge Firefox —
+see DEVELOPING.md); on a minimal image (e.g. the HF Space) it isn't, and we use
+jsdom.
+"""
+from __future__ import annotations
+import functools
+import json
+import os
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+from . import webcheck
+# Real-browser runners, tried in order. Playwright/Chromium first (the reachable
+# rootless browser in this devcontainer), then conda-forge Firefox/Selenium.
+# Whichever launches first is cached for the life of the process. Both speak the
+# same JSON contract, so the choice is invisible to callers.
+_RUNNERS = [
+    Path(__file__).with_name("playwright_runner.py"),
+    Path(__file__).with_name("browser_runner.py"),
+]
+_BROWSER_PREFIX = Path(os.environ.get(
+    "SMOLBUILDER_BROWSER_PREFIX",
+    str(Path(__file__).resolve().parent.parent / ".browser")))
+def _child_env() -> dict:
+    """Env for the runner subprocess: Firefox's conda libs on LD_LIBRARY_PATH."""
+    env = dict(os.environ)
+    libdir = str(_BROWSER_PREFIX / "lib")
+    prev = env.get("LD_LIBRARY_PATH", "")
+    env["LD_LIBRARY_PATH"] = f"{libdir}:{prev}" if prev else libdir
+    env["SMOLBUILDER_BROWSER_PREFIX"] = str(_BROWSER_PREFIX)
+    return env
+@functools.lru_cache(maxsize=1)
+def _active_runner() -> Path | None:
+    """First runner whose browser actually launches (probed once; cached, since a
+    launch is slow and availability is fixed for the life of the process)."""
+    probe = "<!doctype html><html><body><button>probe</button></body></html>"
+    for runner in _RUNNERS:
+        if not runner.exists():
+            continue
+        ok, _ = _invoke(probe, 45, runner)
+        if ok is not None:
+            return runner
+    return None
+def available() -> bool:
+    """True if any real-browser check actually runs."""
+    return _active_runner() is not None
+def check_html(html: str, timeout: int = 35) -> tuple[bool | None, list[str]]:
+    """Real-browser check with graceful fallback to jsdom, then unverifiable."""
+    runner = _active_runner()
+    if runner is not None:
+        ok, errors = _invoke(html, timeout, runner)
+        if ok is not None:
+            return ok, errors
+    if webcheck.available():
+        return webcheck.check_html(html, timeout=min(timeout, 20))
+    return None, ["no runtime checker available (browser + jsdom both missing)"]
+def _invoke(html: str, timeout: int, runner: Path) -> tuple[bool | None, list[str]]:
+    """Run a browser runner once. Returns (ok|None, errors); None = couldn't run."""
+    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
+        f.write(html)
+        path = f.name
+    try:
+        proc = subprocess.run(
+            [sys.executable, str(runner), path],
+            capture_output=True, text=True, timeout=timeout, env=_child_env())
+    except subprocess.TimeoutExpired:
+        return None, []
+    finally:
+        Path(path).unlink(missing_ok=True)
+    if proc.returncode == 3:
+        return None, []
+    lines = (proc.stdout or "").strip().splitlines()
+    if not lines:
+        return None, []
+    try:
+        data = json.loads(lines[-1])
+    except json.JSONDecodeError:
+        return None, []
+    if data.get("ok") is None:
+        return None, []
+    return bool(data.get("ok")), list(data.get("errors", []))
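The fallback chain in `browsercheck` comes down to three rules. Probe the runners once, cache whichever one produces a real verdict, and treat `None` as "couldn't run" rather than "failed". The sketch below shows that control flow. It assumes plain callables can stand in for the subprocess runners and for `webcheck.check_html`; `make_check_html` and the stub runners are hypothetical:

```python
import functools
from typing import Callable, Optional

Verdict = tuple[Optional[bool], list[str]]
Runner = Callable[[str], Verdict]  # returns (None, []) when it cannot run

PROBE = "<!doctype html><html><body><button>probe</button></body></html>"


def make_check_html(runners: list[Runner], jsdom: Optional[Runner]) -> Callable[[str], Verdict]:
    """Hypothetical restatement of browsercheck's browser -> jsdom -> unverifiable chain."""

    @functools.lru_cache(maxsize=1)
    def active() -> Optional[Runner]:
        # Probed once; the first runner that gives a real verdict wins for the process lifetime.
        for runner in runners:
            ok, _ = runner(PROBE)
            if ok is not None:
                return runner
        return None

    def check_html(html: str) -> Verdict:
        runner = active()
        if runner is not None:
            ok, errors = runner(html)
            if ok is not None:          # a real pass/fail verdict
                return ok, errors
        if jsdom is not None:           # browser missing or flaky: fall back, don't fail
            return jsdom(html)
        return None, ["no runtime checker available (browser + jsdom both missing)"]

    return check_html


def broken_browser(html: str) -> Verdict:
    return None, []                     # e.g. launch failure / exit code 3


def fake_firefox(html: str) -> Verdict:
    return (False, ["uncaught: boom"]) if "throw" in html else (True, [])


check = make_check_html([broken_browser, fake_firefox], jsdom=None)
print(check("<p>ok</p>"), check("<script>throw 1</script>"))
```

Because the probe result is cached, a runner that passed the probe but later fails to run is not re-probed. That call simply falls through to jsdom, which matches `check_html` above.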

engine/builder.py ADDED

	@@ -0,0 +1,270 @@

+"""smolbuilder — a Lovable/Replit-style web-app builder on a tiny local model.
+Where `Router` (engine/router.py) answers one coding *task* per call with a
+fresh workspace, `WebBuilder` is a **stateful session**: you describe a web app,
+the agent builds a self-contained `index.html`, and then you keep talking to it
+("make it dark mode", "add a reset button") and it edits the *same* workspace.
+First build uses the router's escalation idea — start small, and if the tiny
+model can't produce a usable app, retry on the next-bigger model — but once a
+tier succeeds we **lock onto that agent and its workspace** so every later turn
+is a cheap incremental edit rather than a from-scratch rebuild.
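+The build is verified in two steps. Structurally: did the agent leave a
+non-trivial HTML entrypoint behind? Then at runtime: browsercheck loads the app
+in a headless browser (falling back to jsdom), and any uncaught JavaScript error
+fails the build. Static apps have no `run_python` signal, so "it produced an app
+that previews without errors" is the success criterion the UI also relies on.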
+"""
+from __future__ import annotations
+from collections.abc import AsyncIterator
+from dataclasses import dataclass, field
+from .agent import SmallCodeAgent, Step
+from .config import Preset, Tier, load_preset
+from .live_run import LiveFrame
+from .preview import find_entry, inline_app, preview_iframe
+from .router import classify_tier
+from .sandbox import Workspace
+from .tools import build_web_registry
+from .trace_collector import TraceEvent
+from .ui_trace import merge_step_metadata
+from . import browsercheck
+BUILD_SYSTEM_PROMPT = """You are smolbuilder, a web app builder running on a small local model.
+You build small, self-contained web apps that run directly in a browser — like a tiny Lovable or Replit.
+Your workspace tools:
+- write_file(path, content): create or overwrite a file.
+- read_file(path): read a file back.
+- list_files(): see what already exists.
+- check_app(): run the current app in a headless browser — load index.html, execute its JavaScript, click every button — and report any errors.
+Hard rules:
+1. The app's entrypoint is ALWAYS a single file named index.html, and it must start with <!doctype html><html> and include <head> and <body>.
+2. Put the CSS in a <style> tag and the JavaScript in a <script> tag INSIDE index.html. Prefer one self-contained file — it must run with no build step and no server.
+3. Put the <script> tag at the very END of <body>, AFTER the elements it uses (or wrap your code in window.addEventListener('DOMContentLoaded', ...)). If a script runs before its elements exist, document.getElementById returns null and every button silently breaks.
+4. Every button or interactive control must have a working handler that you actually wire up. Define functions before they are referenced.
+5. Vanilla HTML/CSS/JS only. Do not require a framework, npm, or a backend. You may load a library from a CDN with a full https:// URL only if it is truly needed.
+6. Make it look good by default: sensible layout, spacing, a coherent color palette, readable type. Mobile-friendly.
+Method — follow it every time:
+1. Write a complete index.html in one write_file call.
+2. Call check_app() to test it.
+3. If check_app reports errors, read them, fix index.html (write the FULL file again), and call check_app again. Repeat until it reports ok.
+4. To CHANGE an existing app, write the FULL updated index.html (never a partial file — keep everything that already worked), then check_app again.
+Only finish once check_app reports the app works. Then reply with one short sentence describing what the app does. Do not paste the code in your reply.
+"""
+# Minimum entrypoint size (chars) to count as "a real app" and not a stub.
+_MIN_APP_CHARS = 60
+@dataclass
+class BuildResult:
+    final: str
+    steps: list[Step]
+    files: dict[str, str]
+    preview_html: str
+    entry: str | None
+    tier_name: str
+    tier_model: str
+    start_tier: str
+    escalations: int
+    verified: bool
+    turn: int = 0
+    trace_events: list[TraceEvent] = field(default_factory=list)
+    agent: SmallCodeAgent | None = None
+    @property
+    def app_html(self) -> str:
+        """The self-contained document — for the 'download app' button."""
+        return inline_app(self.files)
+def _evaluate(agent: SmallCodeAgent) -> tuple[bool, str | None, dict[str, str]]:
+    """Did the agent leave a *working* app behind? Drives the verified badge and
+    escalation. Structural first (is there a real HTML entrypoint), then a
+    runtime check — a broken app (JS errors) counts as a failure so the router
+    escalates to a bigger model. An unverifiable check (no browser and no jsdom,
+    so check_html returns None) doesn't fail.
+    """
+    files = agent.files()
+    entry = find_entry(files)
+    if entry is None or len(files[entry].strip()) < _MIN_APP_CHARS:
+        return False, entry, files
+    if entry.lower().endswith((".html", ".htm")):
+        ok, _errors = browsercheck.check_html(inline_app(files))
+        if ok is False:
+            return False, entry, files
+    return True, entry, files
+class WebBuilder:
+    """A persistent build session. One instance per browser session (gr.State)."""
+    def __init__(self, preset: Preset | None = None, max_steps: int = 16,
+                 preview_height: int = 540) -> None:
+        self.preset = preset or load_preset()
+        self.tiers: list[Tier] = self.preset.tiers
+        self.max_steps = max_steps
+        self.preview_height = preview_height
+        # The workspace (the built app on disk) persists across turns; the tier
+        # that built it is remembered so edits stay on the same model. A spent
+        # LiteForge agent can't be re-run, so each turn gets a fresh agent over
+        # this same workspace.
+        self.workspace: Workspace | None = None
+        self.tier_idx = 0
+        self.turn = 0
+        self.think = "off"
+        self.yolo = False
+    @property
+    def has_app(self) -> bool:
+        """True once a first build has produced a workspace to iterate on."""
+        return self.workspace is not None
+    # --- public API ------------------------------------------------------
+    async def send(self, message: str) -> BuildResult:
+        """Build (first turn) or edit (later turns) and return a BuildResult."""
+        result: BuildResult | None = None
+        async for frame in self.send_live(message):
+            if frame.done and isinstance(frame.result, BuildResult):
+                result = frame.result
+        assert result is not None
+        return result
+    async def send_live(self, message: str) -> AsyncIterator[LiveFrame]:
+        """Yield live frames while building or editing."""
+        self.turn += 1
+        if self.workspace is None:
+            async for frame in self._first_build_live(message):
+                yield frame
+        else:
+            async for frame in self._iterate_live(message):
+                yield frame
+    def reset(self) -> None:
+        """Drop the current app and start a fresh session."""
+        self.cleanup()
+        self.workspace = None
+        self.tier_idx = 0
+        self.turn = 0
+    def cleanup(self) -> None:
+        if self.workspace is not None:
+            self.workspace.cleanup()
+    def empty_preview(self) -> str:
+        return preview_iframe({}, height=self.preview_height)
+    # --- internals -------------------------------------------------------
+    def _new_agent(self, tier: Tier, workspace: Workspace | None = None) -> SmallCodeAgent:
+        return SmallCodeAgent(
+            preset=self.preset, model=tier.model, max_steps=self.max_steps,
+            system_prompt=BUILD_SYSTEM_PROMPT, registry_builder=build_web_registry,
+            workspace=workspace, name="smolbuilder",
+            agent="build", profile="web",
+        )
+    async def _first_build_live(self, message: str) -> AsyncIterator[LiveFrame]:
+        """Escalate the model ladder until one produces a previewable app."""
+        start = classify_tier(message, len(self.tiers))
+        task = (f"Build this web app as a self-contained index.html:\n\n{message}")
+        escalations = 0
+        last: BuildResult | None = None
+        prev_tier_name: str | None = None
+        for idx in range(start, len(self.tiers)):
+            tier = self.tiers[idx]
+            if prev_tier_name is not None:
+                yield LiveFrame(events=[
+                    TraceEvent(kind="tier_escalation", name=tier.name,
+                               detail=f"escalated from {prev_tier_name}"),
+                ])
+            agent = self._new_agent(tier)
+            async for frame in agent.run_live_turn(
+                task, think=self.think, yolo=self.yolo,
+            ):
+                if not frame.done:
+                    yield frame
+                    continue
+                final, steps = frame.result
+                ok, entry, files = _evaluate(agent)
+                ok = ok and not (agent.hit_max_steps or agent.errored)
+                last = self._result(agent, final, steps, files, entry, tier,
+                                    self.tiers[start].name, escalations, ok)
+                is_last_tier = idx == len(self.tiers) - 1
+                if ok or is_last_tier:
+                    self.workspace = agent.workspace
+                    self.tier_idx = idx
+                    yield LiveFrame(
+                        steps=steps,
+                        events=last.trace_events,
+                        files=last.files,
+                        done=True,
+                        result=last,
+                    )
+                    return
+                if idx < len(self.tiers) - 1:
+                    agent.trace_collector.record_escalation(tier.name, self.tiers[idx + 1].name)
+                agent.cleanup()
+                escalations += 1
+                prev_tier_name = tier.name
+        if last is not None:
+            yield LiveFrame(
+                steps=last.steps,
+                events=last.trace_events,
+                files=last.files,
+                done=True,
+                result=last,
+            )
+    async def _iterate_live(self, message: str) -> AsyncIterator[LiveFrame]:
+        tier = self.tiers[self.tier_idx]
+        agent = self._new_agent(tier, self.workspace)
+        cur = self.workspace.read_file("index.html")
+        body = cur["content"] if cur.get("ok") else ""
+        task = (
+            "You are editing an existing web app. Here is the current "
+            "index.html:\n\n```html\n" + body + "\n```\n\n"
+            "Apply the change below, then save the COMPLETE updated file with a "
+            "single write_file(\"index.html\", <full new contents>). Keep "
+            "everything that already works and output the whole file, never a "
+            "fragment.\n\nChange to make: " + message
+        )
+        async for frame in agent.run_live_turn(
+            task, think=self.think, yolo=self.yolo,
+        ):
+            if not frame.done:
+                yield frame
+                continue
+            final, steps = frame.result
+            ok, entry, files = _evaluate(agent)
+            ok = ok and not (agent.hit_max_steps or agent.errored)
+            result = self._result(agent, final, steps, files, entry, tier, tier.name, 0, ok)
+            yield LiveFrame(
+                steps=steps,
+                events=result.trace_events,
+                files=result.files,
+                done=True,
+                result=result,
+            )
+    def _result(self, agent: SmallCodeAgent, final, steps, files, entry, tier, start_name,
+                escalations, verified) -> BuildResult:
+        # Small models sometimes write the file but return an empty answer; give
+        # the chat something sensible rather than a blank bubble.
+        if not (final or "").strip():
+            final = "✅ Done: check the live preview." if verified else \
+                "I made an attempt; have a look and tell me what to fix."
+        events = merge_step_metadata(agent.trace_collector.snapshot(), agent.raw_history())
+        return BuildResult(
+            final=final, steps=steps, files=files,
+            preview_html=preview_iframe(files, height=self.preview_height),
+            entry=entry, tier_name=tier.name, tier_model=tier.model,
+            start_tier=start_name, escalations=escalations,
+            verified=bool(verified), turn=self.turn,
+            trace_events=events, agent=agent,
+        )
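The first-build escalation in `_first_build_live` is a small loop. It starts at the tier chosen by `classify_tier`, moves up one rung per failed build, and locks onto the first tier that verifies. If no tier verifies, it keeps the top tier's attempt. The sketch below is a pure-function version that assumes a callable `attempt(tier)` reporting whether `_evaluate` passed. `first_build` and `Outcome` are hypothetical names. Raising on an out-of-range `start` is a choice made for this sketch; the method above simply yields nothing in that case:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    tier_idx: int      # tier whose workspace the session locks onto
    escalations: int   # how many tiers failed before it
    verified: bool     # whether the locked tier's build passed verification


def first_build(tiers: list[str], start: int, attempt: Callable[[str], bool]) -> Outcome:
    """Hypothetical distillation of WebBuilder._first_build_live's ladder walk."""
    escalations = 0
    for idx in range(start, len(tiers)):
        ok = attempt(tiers[idx])
        # Stop on success, or keep the last tier's attempt even if it failed.
        if ok or idx == len(tiers) - 1:
            return Outcome(idx, escalations, ok)
        escalations += 1
    raise ValueError("start is past the end of the tier ladder")


print(first_build(["3B", "8B", "30B"], 0, lambda tier: tier == "8B"))
```

Later turns then reuse `tiers[tier_idx]` and the same workspace, which is why `_iterate_live` always reports zero escalations.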

engine/config.py ADDED

	@@ -0,0 +1,290 @@

+"""Backend presets for smolcode.
+smolcode always talks to ONE OpenAI-compatible endpoint. A "preset" just
+selects the base_url and the model *tiers* the router may escalate through.
+Everything is overridable by environment variables so the same code runs on a
+laptop, inside an HF Space, or against the hal-9000 "home supercomputer".
+Env overrides (highest priority):
+  SMALLCODE_PRESET     space | laptop | hal | hal-smol | hal-matrix
+                       (default: hal-matrix; unknown values fall back to laptop)
+  SMALLCODE_BASE_URL   OpenAI-compatible /v1 URL
+  SMALLCODE_API_KEY    bearer token (most local servers ignore it)
+  SMALLCODE_MODEL      force a single model (disables tiering)
+"""
+from __future__ import annotations
+import os
+import re
+from dataclasses import dataclass, field
+@dataclass(frozen=True)
+class Tier:
+    """One rung of the model ladder. `name` is what the router shows in the UI."""
+    name: str
+    model: str
+@dataclass(frozen=True)
+class Preset:
+    key: str
+    base_url: str
+    api_key: str
+    # Ordered cheap -> expensive. The router starts at tiers[0] and escalates.
+    tiers: list[Tier] = field(default_factory=list)
+    @property
+    def default_model(self) -> str:
+        return self.tiers[0].model
+@dataclass(frozen=True)
+class SpecialistLadder:
+    """One specialist family's size ladder (cheap -> expensive), reusing Tier."""
+    specialty: str
+    tiers: list[Tier] = field(default_factory=list)
+@dataclass(frozen=True)
+class SpecialistPreset(Preset):
+    """A Preset whose escalation space is 2D: specialty -> size ladder.
+    Subclasses Preset so every existing reader of .base_url/.api_key/.tiers/
+    .default_model (bench, builder, agent) keeps working: the inherited `tiers` is
+    the GENERIC fallback ladder, and `ladders` holds the per-specialty rungs.
+    """
+    ladders: dict[str, SpecialistLadder] = field(default_factory=dict)
+    def ladder_for(self, specialty: str) -> SpecialistLadder:
+        """The specialist ladder for a key, or the generic ladder as a fallback."""
+        lad = self.ladders.get(specialty)
+        if lad and lad.tiers:
+            return lad
+        return SpecialistLadder(specialty="general", tiers=self.tiers)
+# Local Ollama on the workstation exposes an OpenAI-compatible API at :11435/v1.
+# NOTE: the default model is a tool-TUNED 3B (granite4.1:3b), not a coder model.
+# Tiny coder models (qwen2.5-coder:3b) text-emit ```json instead of native
+# `tool_calls`, which LiteForge's agent loop can't execute. Granite-3B (also
+# <=4B, Tiny-Titan-eligible) emits native tool_calls. The dual-mode parser
+# (P1) will let qwen-coder back in for code quality.
+_LAPTOP = Preset(
+    key="laptop",
+    base_url="http://localhost:11435/v1",
+    api_key="ollama",
+    tiers=[Tier("3B", "granite4.1:3b")],
+)
+# The submission Space: a single tiny model served by llama.cpp's llama-server.
+# Kept to one <=4B model so the Tiny Titan claim is unambiguous.
+# Port is configurable: 8080 inside the Space, but on the workstation 8080 is
+# taken by Guacamole/Tomcat so local dev uses SMALLCODE_LLAMA_PORT=8088.
+# llama-server ignores the model name and serves whatever GGUF was loaded.
+_LLAMA_PORT = os.environ.get("SMALLCODE_LLAMA_PORT", "8080")
+_SPACE = Preset(
+    key="space",
+    base_url=f"http://127.0.0.1:{_LLAMA_PORT}/v1",
+    api_key="local",
+    tiers=[Tier("3B", "qwen2.5-coder-3b-instruct-q4_k_m.gguf")],
+)
+# hal-9000 (DGX Spark): full tiered router. Points straight at hal's Ollama
+# (:11434/v1), which serves every pulled model over one OpenAI-compatible
+# endpoint with native tool_calls — simpler than LiteLLM (whose :4000 exposed no
+# models). Tiny tier is a TOOL-TUNED model (granite4.1:3b) that reliably drives
+# the loop; escalate to bigger Qwen *coder* models for hard codegen. (Tiny coder
+# models can't native-tool-call; see the _LAPTOP note above.)
+_HAL = Preset(
+    key="hal",
+    base_url="http://10.8.0.6:11434/v1",
+    api_key=os.environ.get("SMALLCODE_API_KEY", "ollama"),
+    # All-Granite ladder: every tier emits native tool_calls on Ollama (verified
+    # on hal), all <=32B. NOTE: qwen2.5-coder does NOT native-tool-call on Ollama
+    # at ANY size (3b/14b text-emit the call) — bringing the Qwen *coder* models
+    # in (for the benchmark story) requires the dual-mode parser (see task 6).
+    tiers=[
+        Tier("3B", "granite4.1:3b"),
+        Tier("8B", "granite4.1:8b"),
+        Tier("30B", "granite4.1:30b"),
+    ],
+)
+# hal-9000 with the fine-tuned coder as the entry tier. The finetune/ pipeline
+# trains Qwen2.5-Coder-1.5B to emit native <tool_call> (see finetune/README.md),
+# so once it's served on hal's Ollama it can be the cheap first rung and we only
+# escalate to Granite on verification failure. The served tag is configurable via
+# SMALLCODE_SMOL_MODEL (default matches the published model name); import the GGUF
+# into Ollama under that tag, or point SMALLCODE_BASE_URL at a llama-server.
+_SMOL_MODEL = os.environ.get("SMALLCODE_SMOL_MODEL", "smolcode-coder-1.5b:tools")
+_HAL_SMOL = Preset(
+    key="hal-smol",
+    base_url="http://10.8.0.6:11434/v1",
+    api_key=os.environ.get("SMALLCODE_API_KEY", "ollama"),
+    tiers=[
+        Tier("1.5B-tuned", _SMOL_MODEL),
+        Tier("8B", "granite4.1:8b"),
+        Tier("30B", "granite4.1:30b"),
+    ],
+)
+# --- the 2D specialist matrix (hal-matrix preset) ----------------------------
+# A model per language/function (smolcode-coder-{specialty}-{size}:tools), served
+# on hal's Ollama. The router classifies the task's specialty, picks that family's
+# size ladder, and escalates within it — then into the generic Granite ladder at
+# the top. Tags are derived by CONVENTION + served-tag discovery, so adding a
+# specialist is a serving action, not a code edit.
+_SPECIALIST_SIZES = ("1.5b", "3b", "7b")   # 7b deferred but recognized if served.
+_SPECIALTIES = ("py", "js", "bash", "git", "dotnet", "csharp", "java",
+                "powershell", "rust", "docker", "bsd", "go", "sql", "cpp", "terraform",
+                "orchestrate")   # task_batch / parallel fan-out specialist
+# Pattern is overridable so one env var can repoint the whole matrix. Back-compat:
+# a value WITHOUT a "{specialty}" placeholder is treated as a legacy single tag.
+_SMOL_PATTERN = os.environ.get("SMALLCODE_SMOL_MODEL",
+                               "smolcode-coder-{specialty}-{size}:tools")
+# Size parsing + specialty detection — shared by the model picker (Tiny-Titan <=32B
+# display filter, collapsing the 16-per-size specialty fine-tunes to one "Auto" entry
+# per size). Mirrors smolcode-cli/src/router.rs parse_size_b and the size_b() regex in
+# tests/test_matrix_routing.py.
+_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)b\b", re.I)
+def parse_size_b(model: str) -> float:
+    """Parameter count in billions from a model tag (last '<n>b' group), else 0.0.
+    'granite4.1:30b' -> 30.0, 'smolcode-coder-py-1.5b:tools' -> 1.5. Unknown -> 0.0
+    (so size-unknown models pass a '<=32B' filter rather than being hidden)."""
+    found = _SIZE_RE.findall(model or "")
+    return float(found[-1]) if found else 0.0
+def is_specialty_model(model: str) -> bool:
+    """True if the tag is a per-specialty fine-tune (smolcode-coder-<specialty>-...)."""
+    m = (model or "").lower()
+    return any(m.startswith(f"smolcode-coder-{s}-") for s in _SPECIALTIES)
+def specialist_sizes(preset: "Preset") -> list[str]:
+    """Distinct specialist sizes (<=32B) present in a matrix preset's ladders,
+    smallest first (e.g. ['1.5b', '3b']). Empty for non-matrix presets."""
+    sizes: dict[float, str] = {}
+    for lad in (getattr(preset, "ladders", {}) or {}).values():
+        for t in lad.tiers:
+            if is_specialty_model(t.model):
+                sb = parse_size_b(t.model)
+                if 0 < sb <= 32:
+                    sizes.setdefault(sb, f"{_SIZE_RE.findall(t.model)[-1]}b")
+    return [sizes[k] for k in sorted(sizes)]
+# Generic Granite ladder every specialist escalates INTO at its top rung (all <=32B).
+_GENERIC_TIERS = [Tier("8B", "granite4.1:8b"), Tier("30B", "granite4.1:30b")]
+# Static fallback set of served tags when /v1/models discovery is unavailable.
+# Keep in sync with what's pulled on hal; discovery (below) supersedes it.
+_HAL_SERVED: set[str] = {f"smolcode-coder-{s}-1.5b:tools" for s in _SPECIALTIES} | \
+                        {f"smolcode-coder-{s}-3b:tools" for s in _SPECIALTIES}
+_DISCOVERY_CACHE: dict[str, set[str]] = {}
+def _discover_served(base_url: str, api_key: str) -> set[str]:
+    """GET the OpenAI-compatible /v1/models once (cached per base_url); the set of
+    served model tags. Any failure -> empty set, which is cached just like a
+    successful lookup (the caller falls back to _HAL_SERVED)."""
+    if base_url in _DISCOVERY_CACHE:
+        return _DISCOVERY_CACHE[base_url]
+    served: set[str] = set()
+    try:
+        import json
+        import urllib.request
+        req = urllib.request.Request(base_url.rstrip("/") + "/models",
+                                     headers={"Authorization": f"Bearer {api_key}"})
+        with urllib.request.urlopen(req, timeout=2) as r:
+            data = json.loads(r.read())
+        served = {m["id"] for m in data.get("data", []) if "id" in m}
+    except Exception:
+        served = set()
+    _DISCOVERY_CACHE[base_url] = served
+    return served
+def _build_ladder(specialty: str, served: set[str]) -> SpecialistLadder:
+    """One specialist ladder: served specialist sizes (smallest first), then the
+    generic Granite tiers. Missing sizes are skipped; a wholly-missing specialist
+    yields just the generic tiers (ladder_for also guards this)."""
+    tiers: list[Tier] = []
+    if "{specialty}" in _SMOL_PATTERN:
+        for size in _SPECIALIST_SIZES:
+            tag = _SMOL_PATTERN.format(specialty=specialty, size=size)
+            if tag in served:
+                tiers.append(Tier(f"{size}-{specialty}", tag))
+    tiers.extend(_GENERIC_TIERS)
+    return SpecialistLadder(specialty=specialty, tiers=tiers)
+_HAL_MATRIX = SpecialistPreset(
+    key="hal-matrix",
+    base_url="http://10.8.0.6:11434/v1",
+    api_key=os.environ.get("SMALLCODE_API_KEY", "ollama"),
+    tiers=_GENERIC_TIERS,    # generic fallback ladder (inherited Preset.tiers)
+    ladders={},              # built lazily in load_preset (needs the resolved base_url)
+)
+_PRESETS = {p.key: p for p in (_LAPTOP, _SPACE, _HAL, _HAL_SMOL, _HAL_MATRIX)}
+def default_ui_model(preset: Preset, cfg: dict) -> str:
+    """Resolve the default model for the web UI from config and preset tiers."""
+    if cfg.get("model"):
+        return str(cfg["model"])
+    if preset.tiers:
+        return preset.default_model
+    return ""
+def load_preset() -> Preset:
+    """Resolve the active preset, applying env overrides and Rust config.toml."""
+    # Default to the 2D specialist matrix so "Auto" routes by specialty out of the box;
+    # it auto-detects served specialists and falls back to the generic Granite ladder
+    # (per-specialty: ladder_for(); whole matrix: _discover_served -> _HAL_SERVED).
+    key = os.environ.get("SMALLCODE_PRESET", "hal-matrix").lower()
+    base = _PRESETS.get(key, _LAPTOP)
+    rust_cfg: dict = {}
+    try:
+        from .rust_session import load_rust_config
+        rust_cfg = load_rust_config()
+    except Exception:
+        pass
+    base_url = os.environ.get("SMALLCODE_BASE_URL", rust_cfg.get("base_url", base.base_url))
+    api_key = os.environ.get("SMALLCODE_API_KEY", base.api_key)
+    # An explicit env SMALLCODE_MODEL is a hard single-model override and wins over
+    # everything (including the matrix). A `model` in config.toml is only a *default*
+    # — it must NOT silently disable the matrix when the user explicitly asked for it
+    # via SMALLCODE_PRESET=hal-matrix.
+    env_model = os.environ.get("SMALLCODE_MODEL")
+    if env_model:
+        return Preset(key=base.key, base_url=base_url, api_key=api_key,
+                      tiers=[Tier("custom", env_model)])
+    if isinstance(base, SpecialistPreset):
+        served = _discover_served(base_url, api_key) or _HAL_SERVED
+        ladders = {s: _build_ladder(s, served) for s in _SPECIALTIES}
+        return SpecialistPreset(key=base.key, base_url=base_url, api_key=api_key,
+                                tiers=_GENERIC_TIERS, ladders=ladders)
+    # A config.toml `model` is a DEFAULT, not a hard override (that's SMALLCODE_MODEL,
+    # handled above). If it just names this preset's entry tier — the common case, e.g.
+    # the CLI default == hal-smol's 1.5B entry — keep the full escalation LADDER (so the
+    # router + judge still work). Only a model that ISN'T the preset entry is treated as
+    # a deliberate single-model choice.
+    forced = rust_cfg.get("model")
+    if forced and base.tiers and forced != base.default_model:
+        return Preset(key=base.key, base_url=base_url, api_key=api_key,
+                      tiers=[Tier("custom", forced)])
+    return Preset(key=base.key, base_url=base_url, api_key=api_key, tiers=base.tiers)
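
Below is a minimal sketch of the ladder assembly that `_build_ladder` and `load_preset` perform: filter each specialist size against the served tags, then append the generic Granite tail. It makes several assumptions. The tag pattern `smolcode-coder-{specialty}-{size}:tools` and the sizes `1.5b`/`3b` are inferred from the static `_HAL_SERVED` set, since the real `_SMOL_PATTERN` and `_SPECIALIST_SIZES` are not shown here. `Tier` is a stand-in dataclass, and `python`/`rust` are hypothetical specialty names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    # Stand-in for engine.config.Tier: a display name plus a served model tag.
    name: str
    model: str


# Assumed values; the engine's _SMOL_PATTERN / _SPECIALIST_SIZES may differ.
SMOL_PATTERN = "smolcode-coder-{specialty}-{size}:tools"
SPECIALIST_SIZES = ["1.5b", "3b"]  # smallest first
GENERIC_TIERS = [Tier("8B", "granite4.1:8b"), Tier("30B", "granite4.1:30b")]


def resolve_served(discovered: set[str], static_fallback: set[str]) -> set[str]:
    """An empty discovery result (request failed or nothing listed) falls back."""
    return discovered or static_fallback


def build_ladder(specialty: str, served: set[str]) -> list[Tier]:
    """Served specialist sizes (smallest first), then the generic Granite tail."""
    tiers: list[Tier] = []
    for size in SPECIALIST_SIZES:
        tag = SMOL_PATTERN.format(specialty=specialty, size=size)
        if tag in served:  # skip sizes the server does not actually host
            tiers.append(Tier(f"{size}-{specialty}", tag))
    return tiers + GENERIC_TIERS
```

With `served = {"smolcode-coder-python-3b:tools"}`, the `python` ladder starts at the 3B specialist, while `rust` degrades to just the two Granite tiers. This is the per-specialty fallback the comment in `load_preset` describes.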

engine/fanout.py ADDED

	@@ -0,0 +1,128 @@

+"""Parallel sub-agent fan-out for the Python engine (mirror of the Rust CLI's
+`task_batch`).
+Where the Router runs ONE task through a tier ladder, fan-out runs MANY independent
+tasks at once: each gets its own SmallCodeAgent + fresh Workspace and they run
+concurrently as asyncio tasks, bounded by a semaphore so local inference isn't oversubscribed.
+Use it for independent work — exploring/solving several things in parallel — when
+each subtask doesn't depend on the others' output.
+Cheap when each agent is a small local model (e.g. the fine-tuned 1.5B): as long as
+len(tasks) <= concurrency, wall-clock is roughly the slowest job, not the sum.
+"""
+from __future__ import annotations
+import asyncio
+from collections.abc import AsyncIterator
+from dataclasses import dataclass, field
+from .agent import SmallCodeAgent, Step
+from .config import Preset, load_preset
+from .live_run import LiveFrame
+from .router import _verify
+from .trace_collector import TraceEvent
+from .ui_trace import merge_step_metadata
+MAX_CONCURRENCY = 4
+@dataclass
+class FanoutResult:
+    index: int
+    task: str
+    final: str
+    steps: list[Step]
+    model: str
+    verified: bool
+    files: dict[str, str] = field(default_factory=dict)
+    error: str | None = None
+    trace_events: list[TraceEvent] = field(default_factory=list)
+    agent: SmallCodeAgent | None = None
+async def fan_out(tasks: list[str], preset: Preset | None = None,
+                  model: str | None = None, max_steps: int = 12,
+                  concurrency: int = MAX_CONCURRENCY) -> list[FanoutResult]:
+    """Run `tasks` concurrently, each in its own agent/workspace.
+    `model` defaults to the preset's entry tier (the cheap small model — the
+    natural choice for fanning out). Results are returned in input order.
+    """
+    results: list[FanoutResult] = []
+    async for frame in fan_out_live(tasks, preset=preset, model=model,
+                                    max_steps=max_steps, concurrency=concurrency):
+        if frame.done and isinstance(frame.result, list):
+            results = frame.result
+    return results
+async def fan_out_live(
+    tasks: list[str],
+    preset: Preset | None = None,
+    model: str | None = None,
+    max_steps: int = 12,
+    concurrency: int = MAX_CONCURRENCY,
+    poll_interval: float = 0.35,
+) -> AsyncIterator[LiveFrame]:
+    """Yield aggregate live frames while fan-out jobs run."""
+    if not tasks:
+        yield LiveFrame(done=True, result=[])
+        return
+    preset = preset or load_preset()
+    model = model or preset.default_model
+    sem = asyncio.Semaphore(max(1, concurrency))
+    agents: list[SmallCodeAgent] = [
+        SmallCodeAgent(preset=preset, model=model, max_steps=max_steps) for _ in tasks
+    ]
+    async def _job(index: int, task: str, agent: SmallCodeAgent) -> FanoutResult:
+        async with sem:
+            try:
+                final, steps = await agent.run(task)
+                ok = False if (agent.hit_max_steps or agent.errored) else _verify(agent)
+                events = merge_step_metadata(agent.trace_collector.snapshot(), agent.raw_history())
+                return FanoutResult(
+                    index=index, task=task, final=final, steps=steps, model=model,
+                    verified=bool(ok), files=agent.files(), trace_events=events, agent=agent,
+                )
+            except Exception as e:
+                return FanoutResult(index=index, task=task, final="", steps=[],
+                                    model=model, verified=False, error=str(e))
+            finally:
+                agent.cleanup()
+    job_tasks = [
+        asyncio.create_task(_job(i, t, agents[i]))
+        for i, t in enumerate(tasks)
+    ]
+    try:
+        while not all(j.done() for j in job_tasks):
+            # Mid-run we must NOT call current_steps()/history() on a live agent
+            # (the Rust agent isn't reentrant and would deadlock). Read only the
+            # trace collectors (plain lists) and workspace files (disk).
+            events: list[TraceEvent] = []
+            all_files: dict[str, str] = {}
+            for i, agent in enumerate(agents):
+                events.extend(agent.trace_collector.snapshot())
+                for path, content in agent.files().items():
+                    all_files[f"[{i + 1}] {path}"] = content
+            yield LiveFrame(steps=[], events=events, files=all_files)
+            await asyncio.sleep(poll_interval)
+        results = [await j for j in job_tasks]
+        results.sort(key=lambda r: r.index)
+        yield LiveFrame(done=True, result=results)
+    finally:
+        for j in job_tasks:
+            if not j.done():
+                j.cancel()
+def summarize(results: list[FanoutResult]) -> str:
+    """Aggregate fan-out results into one labeled summary (mirrors the Rust output)."""
+    out = [f"Ran {len(results)} subagents in parallel. Results:\n"]
+    for r in results:
+        head = f"=== [{r.index + 1}] {r.model} {'OK' if r.verified else 'unverified'} ==="
+        body = f"error: {r.error}" if r.error else r.final.strip()
+        out.append(f"{head}\n{body}\n")
+    return "\n".join(out).rstrip()
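
The concurrency bound in `fan_out_live` comes from an `asyncio.Semaphore` held around each job. Below is a stripped-down sketch of that pattern. It assumes a hypothetical async `worker` callable in place of `SmallCodeAgent.run`, uses a hypothetical `JobResult` type instead of `FanoutResult`, and drops the live polling. Unlike the engine, which creates tasks and polls them, it awaits `asyncio.gather`, which already returns results in input order.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class JobResult:
    index: int
    output: str | None
    error: str | None = None


async def bounded_fan_out(
    tasks: list[str],
    worker: Callable[[str], Awaitable[str]],
    concurrency: int = 4,
) -> list[JobResult]:
    """Run `worker` over `tasks` with at most `concurrency` jobs in flight."""
    sem = asyncio.Semaphore(max(1, concurrency))

    async def _job(index: int, task: str) -> JobResult:
        async with sem:  # excess jobs wait here until a slot frees up
            try:
                return JobResult(index, await worker(task))
            except Exception as e:  # report the failure instead of raising
                return JobResult(index, None, error=str(e))

    # gather returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(_job(i, t) for i, t in enumerate(tasks))))
```

Returning errors as values mirrors how `_job` in the engine turns an exception into a `FanoutResult` with `error` set, so one bad subtask does not abort the whole batch.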

engine/file_tree.py ADDED

	@@ -0,0 +1,92 @@

+"""Workspace file tree with git status (Rust-backed)."""
+from __future__ import annotations
+import re
+from dataclasses import dataclass
+from .rust_session import (
+    git_status,
+    rust_available,
+    workspace_files,
+    workspace_tree,
+)
+_GIT_LINE = re.compile(r"^([ MADRCU?!]{1,2})\s+(.+)$")
+@dataclass
+class WorkspacePanel:
+    tree_md: str
+    git_md: str
+    file_choices: list[str]
+    preview_md: str
+def parse_git_dirty(git_status_text: str) -> dict[str, str]:
+    """Map repo-relative path to a one-character git status marker."""
+    markers: dict[str, str] = {}
+    for line in git_status_text.splitlines():
+        m = _GIT_LINE.match(line.strip())
+        if not m:
+            continue
+        status, path = m.group(1).strip(), m.group(2).strip()
+        if " -> " in path:
+            path = path.split(" -> ")[-1].strip()
+        mark = status.replace(" ", "")
+        markers[path] = mark[-1] if mark else "?"
+    return markers
+def _preview_md(path: str, content: str) -> str:
+    lang = "python" if path.endswith(".py") else ""
+    return f"**`{path}`**\n```{lang}\n{content}\n```"
+def build_workspace_panel(
+    workspace: str,
+    selected: str | None = None,
+    *,
+    depth: int = 3,
+    files: dict[str, str] | None = None,
+) -> WorkspacePanel:
+    """Build git header, ASCII tree, file picker choices, and file preview."""
+    if not rust_available():
+        return WorkspacePanel(
+            tree_md="_smolcode_core not installed_",
+            git_md="",
+            file_choices=[],
+            preview_md="",
+        )
+    git_text = git_status(workspace)
+    git_lines = git_text.splitlines()
+    git_md = "\n".join(git_lines[:6]) if git_lines else "_not a git repository_"
+    tree_body = workspace_tree(workspace, depth=depth)
+    tree_md = f"```\n{tree_body}\n```"
+    if files is None:
+        files = workspace_files(workspace)
+    dirty = parse_git_dirty(git_text)
+    file_choices: list[str] = []
+    for path in sorted(files):
+        mark = dirty.get(path, "")
+        label = f"{mark} {path}" if mark else path
+        file_choices.append(label)
+    preview_md = ""
+    if selected:
+        clean = selected
+        if len(selected) > 2 and selected[1] == " " and selected[0] in "MADRCU?!":
+            clean = selected[2:]
+        content = files.get(clean, "")
+        if content:
+            preview_md = _preview_md(clean, content)
+    return WorkspacePanel(
+        tree_md=tree_md,
+        git_md=git_md,
+        file_choices=file_choices,
+        preview_md=preview_md,
+    )
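
`parse_git_dirty` strips each line and matches a regex, so it tolerates whatever `git_status` returns. If the Rust helper emits standard `git status --porcelain` (v1) output, which is an assumption, the fixed `XY PATH` columns can be read directly instead. The sketch below shows that alternative with a hypothetical `parse_porcelain_v1`, plus a `strip_marker` helper reproducing the label stripping `build_workspace_panel` applies to picker choices.

```python
from __future__ import annotations

_MARKERS = "MADRCU?!"


def parse_porcelain_v1(text: str) -> dict[str, str]:
    """Map path -> one-char marker from `git status --porcelain` (v1) output."""
    markers: dict[str, str] = {}
    for line in text.splitlines():
        if len(line) < 4 or line.startswith("##"):
            continue  # skip blank lines and the optional "## branch" header
        xy, path = line[:2], line[3:]  # fixed columns: X, Y, space, path
        if " -> " in path:  # rename/copy: keep the destination path
            path = path.split(" -> ", 1)[1]
        status = xy.replace(" ", "")
        # Last non-space code: Y (worktree) when set, else X (index).
        markers[path] = status[-1] if status else "?"
    return markers


def strip_marker(choice: str) -> str:
    """Undo the "M path" labelling used for the file picker."""
    if len(choice) > 2 and choice[1] == " " and choice[0] in _MARKERS:
        return choice[2:]
    return choice
```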

engine/gradio_shell.py ADDED

	@@ -0,0 +1,425 @@

+"""Shared Gradio UI helpers for app.py and smolbuilder.py (CLI parity)."""
+from __future__ import annotations
+import asyncio
+import os
+from dataclasses import dataclass, field
+import gradio as gr
+from .rust_session import (
+    RustSession,
+    expand_command,
+    expand_skill,
+    export_transcript,
+    list_background_jobs,
+    list_commands,
+    list_mcp,
+    list_rules,
+    list_skills,
+    render_config,
+    session_timeline,
+    write_agents_md,
+)
+@dataclass
+class UiSettings:
+    workspace: str = "."
+    model: str = ""
+    agent: str = "build"
+    mode: str = "normal"  # normal | auto | plan
+    think: str = "off"
+    yolo: bool = False
+    fan_out: bool = False
+@dataclass
+class ApprovalState:
+    pending_desc: str | None = None
+    result: bool | None = None
+    async def ask(self, desc: str) -> bool:
+        self.pending_desc = desc
+        self.result = None
+        while self.result is None:
+            await asyncio.sleep(0.15)
+        approved = bool(self.result)
+        self.pending_desc = None
+        self.result = None
+        return approved
+    def approve(self, yes: bool = True) -> None:
+        self.result = yes
+    def deny(self) -> None:
+        self.approve(False)
+@dataclass
+class AppSessionState:
+    """Gradio gr.State payload for session + settings."""
+    rust: RustSession | None = None
+    settings: UiSettings = field(default_factory=UiSettings)
+    approval: ApprovalState = field(default_factory=ApprovalState)
+    status_msg: str = ""
+    bg_jobs: str = ""
+@dataclass
+class SlashResult:
+    reply: str = ""
+    queued_task: str | None = None
+    clear_chat: bool = False
+    download_path: str | None = None
+    toggle_sidebar: bool = False
+    toggle_sidebar_view: bool = False
+    open_picker: str | None = None
+    cycle_mode: bool = False
+    cycle_think: bool = False
+    set_think: str | None = None
+    show_help: bool = False
+    show_whichkey: bool = False
+_BUILTIN_SLASH = {
+    "/help", "/new", "/sessions", "/fork", "/rename", "/export", "/stats",
+    "/mcp", "/rules", "/skills", "/skill", "/commit", "/init", "/bg", "/clear",
+    "/delete", "/timeline", "/mode", "/think", "/config", "/search",
+    "/agents", "/models", "/themes", "/files", "/quit",
+}
+_ATTACH_MAX = 8192
+def parse_input(
+    text: str,
+    *,
+    workspace_files: list[str] | None = None,
+    workspace: str | None = None,
+    rust: RustSession | None = None,
+) -> tuple[str, str | None, str | None]:
+    """Parse user input. Returns (task, slash_command_result, shell_output).
+    - `!cmd` runs shell directly
+    - `/cmd args` returns command to dispatch
+    - `@file` inlines file content into task
+    """
+    stripped = (text or "").strip()
+    if not stripped:
+        return "", None, None
+    if stripped.startswith("!"):
+        return "", None, stripped[1:].strip()
+    if stripped.startswith("/"):
+        return "", stripped, None
+    task = stripped
+    if "@" in task and (workspace_files or workspace):
+        import re
+        from .rust_session import read_workspace_file
+        paths = list(workspace_files or [])
+        for match in re.finditer(r"@(\S+)", task):
+            path = match.group(1)
+            if paths and path not in paths:
+                # Resolve a bare name (e.g. `@app.py`) only when exactly one path matches.
+                candidates = [p for p in paths if p.endswith(path)]
+                if len(candidates) != 1:
+                    continue
+                path = candidates[0]
+            ws = workspace or (rust.workspace_path if rust else ".")
+            content = read_workspace_file(ws, path, max_bytes=_ATTACH_MAX, rust=rust)
+            if content is not None:
+                block = f"[attached: {path}]\n```\n{content}\n```"
+                task = task.replace(f"@{match.group(1)}", block, 1)
+    return task, None, None
+def _workspace(session: AppSessionState) -> str:
+    return session.settings.workspace or "."
+def dispatch_slash(cmd_line: str, session: AppSessionState) -> SlashResult:
+    """Handle a slash command; mirrors CLI TUI handle_slash."""
+    parts = cmd_line.strip().split(maxsplit=1)
+    cmd = parts[0].lower()
+    args = parts[1] if len(parts) > 1 else ""
+    ws = _workspace(session)
+    if cmd == "/help":
+        custom = list_commands(ws)
+        extra = ""
+        if custom:
+            extra = "\n\n**Custom commands:** " + ", ".join(f"`/{n}`" for n in custom)
+        return SlashResult(
+            reply=(
+                "**Slash commands:** `/new`, `/sessions`, `/fork`, `/rename <title>`, "
+                "`/stats`, `/export [file]`, `/timeline`, `/delete`, `/mcp`, `/rules`, "
+                "`/skills`, `/skill <name>`, `/commit [msg]`, `/init`, `/bg`, `/clear`, "
+                "`/mode`, `/think`, `/config`, `/search`, `/files`"
+                f"{extra}\n\n"
+                "**Input:** `!cmd` runs shell without LLM; `@file` attaches workspace files."
+            )
+        )
+    if cmd == "/new":
+        session.rust = None
+        return SlashResult(reply="Started a new session.", clear_chat=True)
+    if cmd == "/sessions":
+        rows = RustSession.list_sessions()
+        if not rows:
+            return SlashResult(reply="_No saved sessions._")
+        lines = [f"- **{r['title']}** (`{r['id']}`)" for r in rows[:20]]
+        return SlashResult(reply="**Sessions:**\n" + "\n".join(lines))
+    if cmd == "/fork":
+        if session.rust and (nid := session.rust.fork()):
+            return SlashResult(reply=f"Forked session → `{nid}`")
+        return SlashResult(reply="Nothing to fork yet.")
+    if cmd == "/rename":
+        if session.rust and args and session.rust.rename(args):
+            return SlashResult(reply=f"Renamed session to **{args}**")
+        return SlashResult(reply="Usage: `/rename <title>`")
+    if cmd == "/stats":
+        nfiles = len(session.rust.files()) if session.rust else 0
+        sid = session.rust.session_id if session.rust else "(none)"
+        return SlashResult(
+            reply=(
+                f"session `{sid}` · workspace: `{ws}` · files: {nfiles} · "
+                f"agent: {session.settings.agent}"
+            )
+        )
+    if cmd == "/export":
+        sid = session.rust.session_id if session.rust else ""
+        if not sid:
+            return SlashResult(reply="No session to export yet.")
+        try:
+            path = export_transcript(sid, args or None)
+            return SlashResult(
+                reply=f"Exported transcript to `{path}`",
+                download_path=path,
+            )
+        except Exception as e:
+            return SlashResult(reply=f"/export failed: {e}")
+    if cmd == "/mcp":
+        if session.rust is None:
+            return SlashResult(
+                reply="_Start a task first so MCP servers are connected._"
+            )
+        servers = list_mcp(session.rust)
+        if not servers:
+            return SlashResult(
+                reply=(
+                    "no MCP servers connected — add `[[mcp]]` entries to "
+                    "`~/.config/smolcode/config.toml` or `.smolcode/config.toml`"
+                )
+            )
+        lines = [f"**MCP servers ({len(servers)}):**"]
+        for row in servers:
+            tools = row.get("tools", [])
+            tlist = ", ".join(tools[:8]) if tools else "(no tools)"
+            if len(tools) > 8:
+                tlist += "…"
+            lines.append(f"- **{row.get('server', '?')}** ({len(tools)}): {tlist}")
+        return SlashResult(reply="\n".join(lines))
+    if cmd == "/rules":
+        rules = list_rules(ws)
+        if not rules:
+            return SlashResult(
+                reply="no rules — add `*.md` to `.smolcode/rules/` or `~/.config/smolcode/rules/`"
+            )
+        lines = [f"**active rules ({len(rules)}):**"]
+        for r in rules:
+            desc = r.get("description", "")
+            tail = f" — {desc}" if desc else ""
+            lines.append(f"- `{r.get('name', '?')}` [{r.get('scope', '?')}]{tail}")
+        return SlashResult(reply="\n".join(lines))
+    if cmd == "/skills":
+        skills = list_skills(ws)
+        if not skills:
+            return SlashResult(
+                reply="no skills — add `<name>/SKILL.md` to `.smolcode/skills/`"
+            )
+        lines = [f"**skills ({len(skills)})** — run with `/skill <name>`:"]
+        for s in skills:
+            desc = s.get("description", "")
+            tail = f" — {desc}" if desc else ""
+            lines.append(f"- `{s.get('name', '?')}`{tail}")
+        return SlashResult(reply="\n".join(lines))
+    if cmd == "/skill":
+        if not args:
+            return SlashResult(reply="Usage: `/skill <name> [args]` (see `/skills`)")
+        sname, _, sargs = args.partition(" ")
+        sname = sname.strip()
+        sargs = sargs.strip()
+        expanded = expand_skill(ws, sname, sargs)
+        if expanded is None:
+            return SlashResult(reply=f"no skill named `{sname}` (see `/skills`)")
+        return SlashResult(reply=f"Running skill **{sname}**…", queued_task=expanded)
+    if cmd == "/commit":
+        if args:
+            task = f"Commit all current changes with git_commit using this message: {args}"
+        else:
+            task = (
+                "Review the staged/unstaged changes with git_diff, then commit them "
+                "with git_commit using a concise, descriptive message."
+            )
+        return SlashResult(reply="Queued git commit task…", queued_task=task)
+    if cmd == "/init":
+        try:
+            path = write_agents_md(ws)
+            return SlashResult(reply=f"wrote `{path}` (project guide for agents)")
+        except Exception as e:
+            return SlashResult(reply=f"/init: {e}")
+    if cmd == "/bg":
+        session.bg_jobs = list_background_jobs()
+        return SlashResult(reply=session.bg_jobs or "_No background jobs._")
+    if cmd == "/timeline":
+        sid = session.rust.session_id if session.rust else ""
+        if not sid:
+            return SlashResult(reply="no saved session yet")
+        lines = session_timeline(sid)
+        return SlashResult(reply="**Timeline:**\n" + "\n".join(f"- {ln}" for ln in lines))
+    if cmd == "/delete":
+        removed = session.rust.delete() if session.rust else False
+        session.rust = None
+        msg = "deleted session; started a new one" if removed else "started a new session"
+        return SlashResult(reply=msg, clear_chat=True)
+    if cmd == "/clear":
+        return SlashResult(reply="_Transcript cleared._", clear_chat=True)
+    if cmd == "/mode":
+        return SlashResult(reply="Cycling mode…", cycle_mode=True)
+    if cmd == "/think":
+        if args:
+            return SlashResult(reply=f"think → {args}", set_think=args.split()[0].lower())
+        return SlashResult(reply="Cycling think level…", cycle_think=True)
+    if cmd == "/config":
+        if session.rust is None:
+            return SlashResult(reply="_Start a task first to view config._")
+        return SlashResult(reply=f"```\n{render_config(session.rust)}\n```")
+    if cmd == "/search":
+        if not args:
+            return SlashResult(reply="Usage: `/search <text>`")
+        return SlashResult(reply=f"_Search for `{args}` runs in transcript handler._")
+    if cmd == "/agents":
+        return SlashResult(reply="Opening agent picker…", open_picker="agents")
+    if cmd == "/models":
+        return SlashResult(reply="Opening model picker…", open_picker="models")
+    if cmd == "/themes":
+        return SlashResult(reply="Opening theme picker…", open_picker="themes")
+    if cmd == "/files":
+        return SlashResult(reply="Toggling sidebar…", toggle_sidebar=True)
+    if cmd == "/quit":
+        return SlashResult(reply="_Use browser close to exit the web UI._")
+    if cmd not in _BUILTIN_SLASH:
+        name = cmd.lstrip("/")
+        expanded = expand_command(ws, name, args)
+        if expanded is not None:
+            return SlashResult(
+                reply=f"Running custom command `/{name}`…",
+                queued_task=expanded,
+            )
+    return SlashResult(reply=f"Unknown command `{cmd}`. Try `/help`.")
+def settings_from_ui(
+    workspace: str,
+    model: str,
+    agent: str,
+    mode: str,
+    think: str,
+    yolo: bool,
+) -> UiSettings:
+    y = yolo or mode == "auto"
+    ag = "plan" if mode == "plan" else agent
+    return UiSettings(
+        workspace=workspace or ".",
+        model=model or "",
+        agent=ag,
+        mode=mode,
+        think=think,
+        yolo=y,
+    )
+def build_settings_panel(preset_models: list[str]) -> dict:
+    """Return Gradio components for the settings sidebar."""
+    with gr.Accordion("⚙️ settings", open=False):
+        workspace = gr.Textbox(
+            value=os.environ.get("SMALLCODE_WORKSPACE", "."),
+            label="workspace directory",
+        )
+        model = gr.Dropdown(
+            choices=preset_models,
+            value=preset_models[0] if preset_models else "",
+            label="model",
+            allow_custom_value=True,
+        )
+        agent = gr.Dropdown(
+            choices=["build", "plan"],
+            value="build",
+            label="agent",
+        )
+        mode = gr.Radio(
+            choices=["normal", "auto", "plan"],
+            value="normal",
+            label="mode",
+        )
+        think = gr.Dropdown(
+            choices=["off", "low", "high", "xtra"],
+            value="off",
+            label="think level",
+        )
+        yolo = gr.Checkbox(value=False, label="yolo (auto-approve tools)")
+    return {
+        "workspace": workspace,
+        "model": model,
+        "agent": agent,
+        "mode": mode,
+        "think": think,
+        "yolo": yolo,
+    }
+def file_tree_md(files: dict[str, str], selected: str | None = None) -> str:
+    """Legacy flat file list (prefer engine.file_tree.build_workspace_panel)."""
+    if not files:
+        return "_workspace is empty_"
+    lines = []
+    for path in sorted(files):
+        mark = " →" if path == selected else ""
+        lines.append(f"- `{path}`{mark}")
+    body = files.get(selected, "") if selected else ""
+    if body:
+        lang = "python" if selected.endswith(".py") else ""
+        return "\n".join(lines) + f"\n\n**`{selected}`**\n```{lang}\n{body}\n```"
+    return "\n".join(lines)
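
The `@file` handling in `parse_input` can be exercised without a workspace on disk. This sketch substitutes a plain `files` dict for `read_workspace_file` (an assumption made for illustration; `expand_mentions` is a hypothetical name). It resolves a bare name only when exactly one path ends with it, leaves unknown or ambiguous mentions as typed, and truncates each attachment to `_ATTACH_MAX` bytes.

```python
from __future__ import annotations

import re

ATTACH_MAX = 8192  # mirrors _ATTACH_MAX above
FENCE = "`" * 3  # a literal triple backtick, spelled this way to keep this block intact


def expand_mentions(task: str, files: dict[str, str], max_bytes: int = ATTACH_MAX) -> str:
    """Inline `@path` mentions as fenced attachments; unresolved mentions stay as typed."""
    for match in re.finditer(r"@(\S+)", task):  # iterates over the original string
        token = match.group(1)
        path = token
        if path not in files:
            candidates = [p for p in files if p.endswith(token)]
            if len(candidates) != 1:
                continue  # unknown or ambiguous name: leave the mention alone
            path = candidates[0]
        # Truncate by bytes, dropping any multi-byte character cut in half.
        content = files[path].encode()[:max_bytes].decode(errors="ignore")
        block = f"[attached: {path}]\n{FENCE}\n{content}\n{FENCE}"
        task = task.replace(f"@{token}", block, 1)
    return task
```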

engine/judge.py ADDED

	@@ -0,0 +1,90 @@

+"""LLM-judge correctness gate for the router.
+`router._verify()` only proves the produced code RUNS (clean exit / tests it wrote
+itself), not that it's actually CORRECT — so a small model can ship a clean-but-wrong
+solution and the router accepts it instead of escalating (exactly how the bench's
+roman_to_int slipped through: ran fine, wrong output).
+This judge asks a more capable model whether the solution truly satisfies the task; a
+concrete "no" is turned into an escalation by the router. Mirrors
+smolcode-cli/src/judge.rs (JSON-only reply, temperature 0, lenient parse), but the
+verdict drives ESCALATION rather than stop/continue.
+Conservative by design: only a clear defect escalates. On judge error / timeout /
+unparseable reply we ACCEPT — the judge is a net to catch obvious wrongness, not a
+hard gate, and we don't want to over-escalate (and lose the small-model win).
+"""
+from __future__ import annotations
+import json
+import os
+import re
+import liteforge as lf
+_SYSTEM = (
+    "You are a strict senior code reviewer. You are given a coding TASK and the FILES "
+    "an agent produced. The code already runs without crashing — your job is to judge "
+    "whether it is actually CORRECT and COMPLETE for the task: check the exact "
+    "requirements, edge cases, and obvious logic bugs.\n"
+    "Reply with ONLY a JSON object: {\"correct\": true|false, \"reason\": \"<one short sentence>\"}.\n"
+    "Set \"correct\": false if you find ANY bug, wrong/missing edge case, or unmet "
+    "requirement. Ignore style. Do not write code."
+)
+def judge_enabled() -> bool:
+    """Judge is on by default; SMALLCODE_JUDGE set to 0, false, no, or empty disables it."""
+    return os.environ.get("SMALLCODE_JUDGE", "1").lower() not in ("0", "false", "no", "")
+def _files_block(files: dict[str, str], cap: int = 6000) -> str:
+    blob = "\n\n".join(f"### {path}\n{content}" for path, content in files.items())
+    return blob[:cap]
+def _parse(text: str) -> bool | None:
+    """True (correct), False (defect found), or None (couldn't tell)."""
+    m = re.search(r"\{.*\}", text, re.DOTALL)
+    if m:
+        try:
+            obj = json.loads(m.group(0))
+            if isinstance(obj.get("correct"), bool):
+                return obj["correct"]
+        except Exception:
+            pass
+    low = text.lower()
+    if "correct\": false" in low or "correct: false" in low or "incorrect" in low:
+        return False
+    if "correct\": true" in low or "correct: true" in low:
+        return True
+    return None
+async def judge_correct(preset, judge_model: str, task: str,
+                        files: dict[str, str], final: str) -> bool:
+    """Return True if the solution likely satisfies the task, False on a clear defect.
+    Accepts (True) on empty files, judge error, or unparseable reply.
+    """
+    if not files:
+        return True
+    user = (
+        f"TASK:\n{task}\n\nFILES:\n{_files_block(files)}\n\n"
+        f"AGENT'S FINAL CLAIM:\n{(final or '')[:500]}\n\n"
+        "Is the solution correct and complete for the task? Reply with JSON only."
+    )
+    try:
+        client = lf.AsyncForgeClient(
+            base_url=preset.base_url, api_key=preset.api_key, default_model=judge_model,
+        )
+        resp = await client.complete(
+            messages=[{"role": "system", "content": _SYSTEM},
+                      {"role": "user", "content": user}],
+            model=judge_model, temperature=0.0,
+        )
+        content = resp["choices"][0]["message"].get("content", "") or ""
+    except Exception:
+        return True  # judge unavailable -> don't block the accept
+    verdict = _parse(content)
+    return True if verdict is None else verdict
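
The accept-on-doubt policy in the module docstring comes down to two pieces: lenient parsing (JSON first, then substring fallbacks) and escalating only on a definite `False`. The sketch below reproduces `_parse` as `parse_verdict` and adds a hypothetical `should_escalate` helper. Narrowing the `except` to `json.JSONDecodeError` and checking that the decoded value is a dict are illustrative choices, not the engine's.

```python
from __future__ import annotations

import json
import re


def parse_verdict(text: str) -> bool | None:
    """True (correct), False (clear defect), None (could not tell)."""
    m = re.search(r"\{.*\}", text, re.DOTALL)  # outermost {...}, ignoring chatter around it
    if m:
        try:
            obj = json.loads(m.group(0))
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("correct"), bool):
            return obj["correct"]
    low = text.lower()
    if '"correct": false' in low or "correct: false" in low or "incorrect" in low:
        return False
    if '"correct": true' in low or "correct: true" in low:
        return True
    return None


def should_escalate(reply: str) -> bool:
    """Only a definite "no" escalates; unparseable replies are accepted."""
    return parse_verdict(reply) is False
```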

engine/live_run.py ADDED

	@@ -0,0 +1,93 @@

+"""Live polling helper for Gradio streaming updates."""
+from __future__ import annotations
+import asyncio
+from collections.abc import AsyncIterator, Awaitable, Callable
+from dataclasses import dataclass, field
+from typing import Any, TypeVar
+from .agent import SmallCodeAgent, Step
+from .trace_collector import TraceEvent
+T = TypeVar("T")
+@dataclass
+class LiveFrame:
+    steps: list[Step] = field(default_factory=list)
+    events: list[TraceEvent] = field(default_factory=list)
+    files: dict[str, str] = field(default_factory=dict)
+    done: bool = False
+    result: Any = None
+    raw_event: dict | None = None
+async def run_with_live_updates(
+    coro: Awaitable[T],
+    agent: SmallCodeAgent,
+    *,
+    poll_interval: float = 0.35,
+) -> AsyncIterator[LiveFrame]:
+    """Yield snapshots while `coro` runs, then a final frame with the result."""
+    task = asyncio.create_task(coro)
+    try:
+        while not task.done():
+            yield _live_snapshot(agent)
+            await asyncio.sleep(poll_interval)
+        result = await task
+        yield _final_snapshot(agent, result=result)
+    finally:
+        # Also covers an early aclose() by the consumer, not just CancelledError.
+        if not task.done():
+            task.cancel()
+async def stream_live(
+    make_coro: Callable[[], Awaitable[T]],
+    get_agent: Callable[[], SmallCodeAgent | None],
+    *,
+    poll_interval: float = 0.35,
+) -> AsyncIterator[LiveFrame]:
+    """Like run_with_live_updates but agent may appear only after coro starts."""
+    task = asyncio.create_task(make_coro())
+    try:
+        while not task.done():
+            agent = get_agent()
+            yield _live_snapshot(agent) if agent is not None else LiveFrame()
+            await asyncio.sleep(poll_interval)
+        result = await task
+        agent = get_agent()
+        if agent is not None:
+            yield _final_snapshot(agent, result=result)
+        else:
+            yield LiveFrame(done=True, result=result)
+    finally:
+        # Also covers an early aclose() by the consumer, not just CancelledError.
+        if not task.done():
+            task.cancel()
+def _live_snapshot(agent: SmallCodeAgent) -> LiveFrame:
+    """A mid-run snapshot.
+    IMPORTANT: never touch the LiteForge agent object (history/state) while a run
+    is in flight — the Rust ToolCallingAgent is not reentrant and `run()` holds an
+    internal lock for its whole duration, so `current_steps()` would deadlock. We
+    read only the trace collector (a plain Python list the wrapped tools append to)
+    and the workspace files (plain disk reads).
+    """
+    return LiveFrame(
+        steps=[],
+        events=agent.trace_collector.snapshot(),
+        files=agent.files(),
+        done=False,
+    )
+def _final_snapshot(agent: SmallCodeAgent, *, result: Any = None) -> LiveFrame:
+    """A post-run snapshot — safe to read the agent now that `run()` has returned."""
+    return LiveFrame(
+        steps=agent.current_steps(),
+        events=agent.trace_collector.snapshot(),
+        files=agent.files(),
+        done=True,
+        result=result,
+    )
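
A minimal, self-contained sketch of the polling pattern `run_with_live_updates` uses: start the run as a task, yield a snapshot of an append-only event list every `interval` seconds, then yield one final frame carrying the result. `Frame`, `poll_while_running` and `fake_run` are hypothetical names, and a plain list stands in for `agent.trace_collector`. The real module also reads the workspace files and, only after the run, `agent.current_steps()`.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Coroutine
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Frame:
    events: list[str] = field(default_factory=list)
    done: bool = False
    result: Any = None


async def poll_while_running(coro: Coroutine[Any, Any, Any], log: list[str],
                             interval: float = 0.01) -> AsyncIterator[Frame]:
    """Yield a copy of `log` every `interval` s, then a final frame with the result."""
    task = asyncio.create_task(coro)
    try:
        while not task.done():
            # Read only the shared log, never the busy worker's own state.
            yield Frame(events=list(log))
            await asyncio.sleep(interval)
        yield Frame(events=list(log), done=True, result=await task)
    finally:
        # Covers CancelledError and an early aclose() (GeneratorExit at the yield).
        if not task.done():
            task.cancel()


async def fake_run(log: list[str]) -> str:
    """Hypothetical stand-in for an agent run that appends trace events."""
    for event in ("plan", "write index.html", "browser check"):
        log.append(event)
        await asyncio.sleep(0.02)
    return "ok"


async def main() -> list[Frame]:
    log: list[str] = []
    return [frame async for frame in poll_while_running(fake_run(log), log)]


frames = asyncio.run(main())
print(len(frames) >= 2, frames[-1].done, frames[-1].result)  # True True ok
```

One deliberate difference: the sketch cancels the worker in a `finally` block. `run_with_live_updates` only catches `asyncio.CancelledError`; a consumer that stops iterating early and calls `aclose()` raises `GeneratorExit` at the paused `yield` instead, which that handler does not see, so the task would keep running.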

engine/playwright_runner.py ADDED

	@@ -0,0 +1,132 @@

+"""Subprocess runner: check a model-built web app in headless Chromium.
+A Playwright/Chromium sibling of engine/browser_runner.py (Firefox/Selenium),
+with the IDENTICAL JSON contract so engine/browsercheck.py can try whichever
+real browser is installed. Invoked as `python engine/playwright_runner.py
+<app.html>` — never imported (keeps Playwright out of the Gradio process and
+isolates a browser crash).
+It loads the app in the EXACT same `srcdoc` + `sandbox` wrapper as the live
+preview (engine/preview.py), injects an error collector before the app's own
+scripts, clicks every button, exercises the keyboard, and reports uncaught JS
+errors — the hard failure signal that lets the router escalate a broken build.
+Output: one JSON line {ok, errors, buttons, clicked}. Exit 3 only when Chromium
+itself can't run (Playwright missing or the browser binary not downloaded), so
+the caller falls back to Firefox, then jsdom.
+"""
+import json
+import os
+import re
+import sys
+import tempfile
+PREVIEW_SANDBOX = "allow-scripts allow-same-origin allow-modals allow-popups allow-forms"
+# Same collector browser_runner.py injects: catches errors thrown during load
+# (the "script ran before its element / undefined function" class).
+_CAPTURE = ("<script>(function(){window.__errs=[];"
+            "window.addEventListener('error',function(e){try{__errs.push('uncaught: '+"
+            "((e.error&&e.error.message)||e.message||String(e)))}catch(_){}} ,true);"
+            "window.addEventListener('unhandledrejection',function(e){try{__errs.push("
+            "'rejection: '+((e.reason&&e.reason.message)||e.reason))}catch(_){}});})();</script>")
+_CLICK_SELECTOR = "button, [onclick], input[type=button], input[type=submit]"
+_KEYBOARD_JS = (
+    "['ArrowUp','ArrowDown','ArrowLeft','ArrowRight',' '].forEach(function(k){"
+    "var c={key:k,keyCode:k===' '?32:({ArrowUp:38,ArrowDown:40,ArrowLeft:37,ArrowRight:39}[k]),bubbles:true};"
+    "document.dispatchEvent(new KeyboardEvent('keydown',c));"
+    "window.dispatchEvent(new KeyboardEvent('keydown',c));});")
+def _escape_srcdoc(doc: str) -> str:
+    return doc.replace("&", "&amp;").replace('"', "&quot;")
+def _inject_capture(app_html: str) -> str:
+    m = re.search(r"<head[^>]*>", app_html, re.I)
+    if m:
+        return app_html[:m.end()] + _CAPTURE + app_html[m.end():]
+    m = re.search(r"<html[^>]*>", app_html, re.I)
+    if m:
+        return app_html[:m.end()] + _CAPTURE + app_html[m.end():]
+    return _CAPTURE + app_html
+def _emit(obj: dict) -> None:
+    sys.stdout.write(json.dumps(obj) + "\n")
+def main(path: str) -> int:
+    try:
+        from playwright.sync_api import sync_playwright
+    except Exception as e:                                       # noqa: BLE001
+        _emit({"ok": None, "infra": f"playwright import failed: {e}"})
+        return 3
+    with open(path, encoding="utf-8") as f:
+        app_html = f.read()
+    host = ('<!doctype html><meta charset="utf-8"><body style="margin:0">'
+            f'<iframe id="app" style="width:100%;height:600px;border:0" '
+            f'sandbox="{PREVIEW_SANDBOX}" '
+            f'srcdoc="{_escape_srcdoc(_inject_capture(app_html))}"></iframe>')
+    host_path = os.path.join(tempfile.mkdtemp(prefix="pwhost-"), "host.html")
+    with open(host_path, "w", encoding="utf-8") as f:
+        f.write(host)
+    errors: list[str] = []
+    buttons = clicked = 0
+    try:
+        with sync_playwright() as p:
+            try:
+                browser = p.chromium.launch(
+                    headless=True,
+                    args=["--allow-file-access-from-files", "--no-sandbox"])
+            except Exception as e:                               # noqa: BLE001
+                _emit({"ok": None, "infra": f"chromium launch failed: {str(e)[:200]}"})
+                return 3
+            try:
+                page = browser.new_page()
+                page.set_default_timeout(4000)
+                page.goto("file://" + host_path, timeout=20000)
+                handle = page.wait_for_selector("#app", timeout=5000)
+                frame = handle.content_frame()
+                if frame is None:
+                    _emit({"ok": None, "infra": "could not enter app iframe"})
+                    return 3
+                page.wait_for_timeout(300)                       # let scripts settle
+                els = frame.query_selector_all(_CLICK_SELECTOR)
+                buttons = len(els)
+                for el in els[:25]:
+                    try:
+                        el.evaluate("e => { e.disabled = false; }")
+                        el.click(force=True, timeout=1000)
+                        clicked += 1
+                    except Exception:
+                        pass                                      # handler errors land in __errs
+                try:
+                    frame.evaluate(_KEYBOARD_JS)
+                except Exception:
+                    pass
+                page.wait_for_timeout(300)                       # surface late/timer errors
+                try:
+                    errors = frame.evaluate("() => window.__errs || []") or []
+                except Exception:
+                    errors = []
+            finally:
+                try:
+                    browser.close()
+                except Exception:
+                    pass
+    except Exception as e:                                       # noqa: BLE001
+        _emit({"ok": None, "infra": f"playwright run failed: {str(e)[:200]}"})
+        return 3
+    errors = [str(e)[:400] for e in errors][:20]
+    _emit({"ok": len(errors) == 0, "errors": errors, "buttons": buttons, "clicked": clicked})
+    return 0
+if __name__ == "__main__":
+    sys.exit(main(sys.argv[1]))
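
The runner's contract (one JSON line on stdout, exit code 3 only when the browser itself cannot run) is what makes a Chromium, then Firefox, then jsdom fallback chain possible. `engine/browsercheck.py` is not shown in this section, so the caller below is a hypothetical sketch of how such a contract can be consumed; `run_checker`, `check_with_fallback` and the two fake runner commands are illustrative, not the project's API.

```python
from __future__ import annotations

import json
import subprocess
import sys

INFRA_EXIT = 3  # per the runner docstring: the browser itself could not run


def run_checker(cmd: list[str], timeout: float = 60.0) -> tuple[int, dict | None]:
    """Run one checker subprocess; parse its last non-empty stdout line as JSON."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    try:
        report = json.loads(lines[-1]) if lines else None
    except json.JSONDecodeError:
        report = None
    if not isinstance(report, dict):
        report = None
    return proc.returncode, report


def check_with_fallback(commands: list[list[str]]) -> dict:
    """Try each checker in order; skip any that report an infra failure."""
    for cmd in commands:
        try:
            code, report = run_checker(cmd)
        except (OSError, subprocess.TimeoutExpired):
            continue  # runner could not start or hung: treat as infra failure
        if code == INFRA_EXIT or report is None or report.get("ok") is None:
            continue  # browser missing or crashed: fall through to the next one
        return report  # ok True/False is a real verdict about the app
    return {"ok": None, "infra": "no browser checker could run"}


# Hypothetical runners: one without a browser, one that finds a JS error.
no_chromium = [sys.executable, "-c",
               "import json, sys; print(json.dumps({'ok': None, 'infra': 'no chromium'})); sys.exit(3)"]
firefox = [sys.executable, "-c",
           "import json; print(json.dumps({'ok': False, 'errors': ['uncaught: x is not defined'],"
           " 'buttons': 3, 'clicked': 3}))"]

report = check_with_fallback([no_chromium, firefox])
print(report["ok"], report["errors"])  # False ['uncaught: x is not defined']
```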

engine/preflight.py ADDED

	@@ -0,0 +1,116 @@

+"""Startup reachability check for the active backend.
+The whole point of smolcode is that one OpenAI-compatible endpoint (chosen by
+the preset) serves the model ladder. If that endpoint is unreachable — hal is
+off the VPN, the laptop Ollama isn't running — the agent loop will hang or fail
+deep inside a request with no obvious cause. Worse, a silent default to the
+wrong preset (the historical "it's using my laptop, not hal" bug) looks fine
+until you notice the weak single-tier model.
+`preflight()` makes that visible: it prints which preset/endpoint is active and
+probes `{base_url}/models` once at startup. On success it prints a one-line
+banner with the model count; on failure it prints a loud warning naming the dead
+URL and which *other* presets are reachable right now, so the fix is obvious.
+It never raises and never blocks the app from starting — it only informs.
+"""
+from __future__ import annotations
+import json
+import sys
+import urllib.error
+import urllib.request
+from .config import Preset, _PRESETS, load_preset
+_TIMEOUT = 4.0
+# ANSI: bold, green ok, red warn — degrade to plain text when not a TTY.
+_BOLD, _GREEN, _RED, _DIM, _RESET = "\033[1m", "\033[32m", "\033[31m", "\033[2m", "\033[0m"
+def _color(s: str, code: str) -> str:
+    return f"{code}{s}{_RESET}" if sys.stderr.isatty() else s
+def list_models(base_url: str, timeout: float = _TIMEOUT) -> list[str]:
+    """Fetch model IDs from {base_url}/models. Returns [] on failure."""
+    url = base_url.rstrip("/") + "/models"
+    try:
+        with urllib.request.urlopen(url, timeout=timeout) as resp:
+            if resp.status != 200:
+                return []
+            data = json.loads(resp.read().decode("utf-8", "replace"))
+        models = data.get("data") if isinstance(data, dict) else None
+        if not isinstance(models, list):
+            return []
+        ids: list[str] = []
+        for m in models:
+            if isinstance(m, dict) and m.get("id"):
+                ids.append(str(m["id"]))
+        return sorted(ids)
+    except (urllib.error.URLError, TimeoutError, OSError, ValueError, json.JSONDecodeError):
+        return []
+def probe(base_url: str, timeout: float = _TIMEOUT,
+          api_key: str | None = None) -> tuple[bool, int | None, str | None]:
+    """Return (reachable, model_count, error). Never raises.
+    Sends the bearer token so endpoints that require auth (e.g. a vLLM server
+    started with --api-key) report reachable instead of a spurious 401."""
+    url = base_url.rstrip("/") + "/models"
+    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
+    try:
+        req = urllib.request.Request(url, headers=headers)
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            if resp.status != 200:
+                return False, None, f"HTTP {resp.status}"
+            data = json.loads(resp.read().decode("utf-8", "replace"))
+        models = data.get("data") if isinstance(data, dict) else None
+        count = len(models) if isinstance(models, list) else None
+        return True, count, None
+    except urllib.error.URLError as e:
+        return False, None, str(getattr(e, "reason", e))
+    except (TimeoutError, OSError, ValueError, json.JSONDecodeError) as e:
+        return False, None, str(e)
+def _reachable_alternatives(active_key: str) -> list[str]:
+    """Which *other* known presets answer right now — points at the easy fix."""
+    out = []
+    for key, preset in _PRESETS.items():
+        if key == active_key:
+            continue
+        ok, _count, _err = probe(preset.base_url, timeout=2.0, api_key=preset.api_key)
+        if ok:
+            out.append(f"{key} ({preset.base_url})")
+    return out
+def preflight(preset: Preset | None = None) -> bool:
+    """Print a startup banner for the active backend. Returns True if reachable."""
+    preset = preset or load_preset()
+    tiers = " · ".join(f"{t.name}:{t.model}" for t in preset.tiers)
+    ok, count, err = probe(preset.base_url, api_key=preset.api_key)
+    if ok:
+        models = f"{count} models" if count is not None else "reachable"
+        banner = (f"smolcode backend: preset={preset.key} · {preset.base_url} "
+                  f"· {models}\n  tiers: {tiers}")
+        print(_color(banner, _BOLD + _GREEN), file=sys.stderr)
+        return True
+    lines = [
+        _color("⚠ smolcode backend UNREACHABLE", _BOLD + _RED),
+        f"  preset={preset.key} · {preset.base_url} · {err}",
+        f"  tiers: {tiers}",
+    ]
+    alts = _reachable_alternatives(preset.key)
+    if alts:
+        lines.append("  reachable instead: " + ", ".join(alts))
+        lines.append(_color("  → set SMALLCODE_PRESET to one of the above, "
+                            "or fix the endpoint.", _DIM))
+    else:
+        lines.append(_color("  → no known preset endpoint is answering right now.", _DIM))
+    print("\n".join(lines), file=sys.stderr)
+    return False
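
A self-contained sketch of the same `/models` reachability probe, exercised against a throwaway local server that demands a bearer token. `probe_models` and `FakeBackend` are hypothetical names and the model IDs are made up. One detail worth knowing: `urllib.request.urlopen` raises `HTTPError` (a `URLError` subclass) for 4xx/5xx responses instead of returning them, so this sketch catches it first to report the status code; in `probe()` above a 401 lands in the `URLError` branch and is reported by its reason text (e.g. `Unauthorized`).

```python
from __future__ import annotations

import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe_models(base_url: str, api_key: str | None = None,
                 timeout: float = 2.0) -> tuple[bool, int | None, str | None]:
    """(reachable, model_count, error) for an OpenAI-style GET {base_url}/models."""
    url = base_url.rstrip("/") + "/models"
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    try:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as e:  # 4xx/5xx, e.g. 401 for a missing key
        return False, None, f"HTTP {e.code}"
    except urllib.error.URLError as e:   # refused, DNS failure, connect timeout
        return False, None, str(e.reason)
    except (OSError, ValueError) as e:   # read timeout, malformed JSON
        return False, None, str(e)
    models = data.get("data") if isinstance(data, dict) else None
    return True, (len(models) if isinstance(models, list) else None), None


class FakeBackend(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an OpenAI-compatible server started with a key."""

    def do_GET(self) -> None:
        if self.headers.get("Authorization") != "Bearer sk-test":
            self.send_error(401)
            return
        body = json.dumps({"object": "list",
                           "data": [{"id": "small-3b"}, {"id": "big-32b"}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), FakeBackend)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}/v1"

with_key = probe_models(base, api_key="sk-test")  # (True, 2, None)
without_key = probe_models(base)                   # (False, None, 'HTTP 401')
server.shutdown()
server.server_close()
after_close = probe_models(base, timeout=1.0)      # reachable is False; message is OS-specific
```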

engine/preview.py ADDED

	@@ -0,0 +1,161 @@

+"""Live-preview rendering for smolbuilder.
+Turns the agent's workspace (a `path -> content` dict of a small static web app)
+into a single self-contained HTML document, then into a sandboxed iframe that
+Gradio can drop straight into a `gr.HTML`. This is the "Replit/Lovable" preview:
+what the tiny model just built, running live in the browser.
+Deliberately dependency-free (stdlib only) so it can be unit-tested without
+Gradio or the Rust engine, and so the rendering logic stays trivially auditable.
+Design choices:
+- We inline locally-referenced `<link rel=stylesheet>` and `<script src=...>`
+  from sibling files, so a model that splits style.css / script.js out of
+  index.html still previews correctly — but we never touch absolute/CDN URLs.
+- The iframe is loaded via `srcdoc=` (not a `data:` URI). A `data:` URL has an
+  *opaque origin*, where `localStorage`/`sessionStorage` throw `SecurityError` —
+  so any app that persists state (a notepad, a to-do list) dies on load before it
+  can wire up its buttons. A `srcdoc` frame inherits the embedder's (Gradio's)
+  origin, so storage and scripts work the way the model expects.
+- SECURITY TRADE-OFF: `sandbox="allow-scripts allow-same-origin ..."` is required
+  for storage to work, but that combination also lets the framed (model-written)
+  code reach the parent page. This is acceptable for a *local, single-user*
+  builder — the framed code is the same user's own request, on a page holding no
+  one else's secrets. Do NOT reuse this wrapper to embed untrusted third-party
+  apps on an origin that holds other users' data; the isolation-preserving fix is
+  to serve the preview from a separate origin (out of scope here).
+- The same wrapper (`PREVIEW_SANDBOX`/`_escape_srcdoc`) is reused by the headless
+  verification check (engine/browsercheck.py) so the agent tests *exactly* what
+  the user sees.
+"""
+from __future__ import annotations
+import html
+import re
+# Sandbox flags shared by the live preview and the verification check.
+# allow-same-origin is required so srcdoc inherits the parent origin and web
+# storage works; combined with allow-scripts it weakens isolation (see docstring).
+PREVIEW_SANDBOX = "allow-scripts allow-same-origin allow-modals allow-popups allow-forms"
+# Files we know how to treat as the app entrypoint, best first.
+_ENTRY_CANDIDATES = ("index.html", "main.html", "app.html")
+_LINK_RE = re.compile(
+    r"""<link\b[^>]*?\brel\s*=\s*['"]?stylesheet['"]?[^>]*?>""", re.I | re.S)
+_SCRIPT_SRC_RE = re.compile(
+    r"""<script\b[^>]*?\bsrc\s*=\s*['"]([^'"]+)['"][^>]*?>\s*</script>""", re.I | re.S)
+_HREF_RE = re.compile(r"""\bhref\s*=\s*['"]([^'"]+)['"]""", re.I)
+def find_entry(files: dict[str, str]) -> str | None:
+    """Pick the HTML entrypoint to preview, or None if there's nothing webby."""
+    lower = {p.lower(): p for p in files}
+    for cand in _ENTRY_CANDIDATES:
+        if cand in lower:
+            return lower[cand]
+    # Fall back to any .html file (shallowest path wins for determinism).
+    htmls = sorted((p for p in files if p.lower().endswith(".html")),
+                   key=lambda p: (p.count("/"), p))
+    return htmls[0] if htmls else None
+def _is_local(url: str) -> bool:
+    """True for a same-app relative reference we can inline (not a CDN/data URI)."""
+    u = url.strip()
+    if not u:
+        return False
+    return not re.match(r"^(?:[a-z]+:)?//|^https?:|^data:|^mailto:|^#", u, re.I)
+def _lookup(files: dict[str, str], ref: str) -> str | None:
+    """Resolve a relative href/src against the workspace file map."""
+    ref = ref.split("?", 1)[0].split("#", 1)[0].lstrip("./").lstrip("/")
+    if ref in files:
+        return files[ref]
+    # Case-insensitive / basename fallback so '/style.css' finds 'style.css'.
+    base = ref.rsplit("/", 1)[-1].lower()
+    for path, content in files.items():
+        if path.lower() == ref.lower() or path.rsplit("/", 1)[-1].lower() == base:
+            return content
+    return None
+def inline_app(files: dict[str, str]) -> str:
+    """Return one self-contained HTML document for the app in `files`.
+    If there's no HTML entrypoint, render a friendly placeholder (e.g. the model
+    has only written notes or a not-yet-web file).
+    """
+    entry = find_entry(files)
+    if entry is None:
+        return _placeholder(files)
+    doc = files[entry]
+    def _inline_css(match: re.Match) -> str:
+        tag = match.group(0)
+        href_m = _HREF_RE.search(tag)
+        if not href_m or not _is_local(href_m.group(1)):
+            return tag
+        css = _lookup(files, href_m.group(1))
+        if css is None:
+            return tag
+        return f"<style>\n{css}\n</style>"
+    def _inline_js(match: re.Match) -> str:
+        src = match.group(1)
+        if not _is_local(src):
+            return match.group(0)
+        js = _lookup(files, src)
+        if js is None:
+            return match.group(0)
+        # Guard against the inlined body prematurely closing the script element.
+        safe = js.replace("</script>", "<\\/script>")
+        return f"<script>\n{safe}\n</script>"
+    doc = _LINK_RE.sub(_inline_css, doc)
+    doc = _SCRIPT_SRC_RE.sub(_inline_js, doc)
+    return doc
+def _escape_srcdoc(doc: str) -> str:
+    """Escape an HTML document for a double-quoted `srcdoc="..."` attribute.
+    Only `&` and `"` are significant inside a double-quoted attribute value, and
+    `&` must go first (so the `&` we introduce for `"` isn't re-escaped). `<`,
+    `>` and even a literal `</script>` are FINE here — the parser is in
+    attribute-value state, not script-data state — so we must NOT touch them
+    (html.escape would corrupt the rendered document).
+    """
+    return doc.replace("&", "&amp;").replace('"', "&quot;")
+def preview_iframe(files: dict[str, str], *, height: int = 540) -> str:
+    """Render the app as a sandboxed `srcdoc` iframe ready for `gr.HTML`."""
+    srcdoc = _escape_srcdoc(inline_app(files))
+    return (
+        f'<iframe title="smolbuilder preview" '
+        f'style="width:100%;height:{height}px;border:0;border-radius:12px;'
+        f'background:#fff;box-shadow:0 1px 0 rgba(0,0,0,.06)" '
+        f'sandbox="{PREVIEW_SANDBOX}" '
+        f'srcdoc="{srcdoc}"></iframe>'
+    )
+def _placeholder(files: dict[str, str]) -> str:
+    listing = "".join(
+        f"<li><code>{html.escape(p)}</code></li>" for p in sorted(files)
+    ) or "<li><em>workspace is empty</em></li>"
+    return (
+        "<!doctype html><html><head><meta charset='utf-8'>"
+        "<style>body{font:15px/1.5 system-ui,sans-serif;color:#475569;"
+        "background:#f8fafc;padding:2rem}h2{color:#7c3aed;margin:.2rem 0 1rem}"
+        "code{background:#ede9fe;color:#5b21b6;padding:1px 6px;border-radius:6px}"
+        "</style></head><body>"
+        "<h2>No preview yet</h2>"
+        "<p>smolbuilder previews the app's <code>index.html</code>. "
+        "Describe a web app on the left and it'll appear here, live.</p>"
+        f"<p>Files in the workspace:</p><ul>{listing}</ul>"
+        "</body></html>"
+    )
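
A reduced sketch of the two steps `preview_iframe` chains: inline a locally referenced script (guarding a literal `</script>` in its body), then escape the whole document for a double-quoted `srcdoc` attribute. `inline_scripts` and `escape_srcdoc` are simplified, hypothetical stand-ins: CSS inlining, the basename lookup and the entry-point search are left out. The last line checks that decoding the escaped value gives back the original markup, which is roughly what the HTML parser does with an attribute value before the frame parses it.

```python
import html
import re

# Same shape as _SCRIPT_SRC_RE above: <script src="..."></script> with an empty body.
SCRIPT_SRC = re.compile(
    r"""<script\b[^>]*?\bsrc\s*=\s*['"]([^'"]+)['"][^>]*?>\s*</script>""", re.I | re.S)


def inline_scripts(files: dict[str, str], entry: str = "index.html") -> str:
    """Swap local <script src> tags for inline scripts; leave remote URLs untouched."""
    def repl(m: re.Match) -> str:
        src = m.group(1)
        if re.match(r"^(?:[a-z]+:)?//", src, re.I):  # https://cdn..., //cdn...
            return m.group(0)
        js = files.get(src.removeprefix("./"))
        if js is None:
            return m.group(0)
        # A literal "</script>" inside the JS would close the inline element early.
        return "<script>\n" + js.replace("</script>", "<\\/script>") + "\n</script>"
    return SCRIPT_SRC.sub(repl, files[entry])


def escape_srcdoc(doc: str) -> str:
    """Escape for a double-quoted srcdoc attribute: '&' first, then '"'."""
    return doc.replace("&", "&amp;").replace('"', "&quot;")


files = {
    "index.html": '<button id="b">go</button>'
                  '<script src="./app.js"></script>'
                  '<script src="https://cdn.example/lib.js"></script>',
    "app.js": 'document.getElementById("b").onclick = () => alert("a & b");',
}
doc = inline_scripts(files)
escaped = escape_srcdoc(doc)  # the value that goes inside srcdoc="..."
print(html.unescape(escaped) == doc)  # True
```

Escaping `"` before `&` would turn each `&quot;` into `&amp;quot;`, so the frame would receive the literal text `&quot;` instead of a quote; that is why the order in `_escape_srcdoc` matters.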

engine/route_clf.py ADDED

	@@ -0,0 +1,243 @@

+"""Learned routing classifier — the confidence-gated upgrade to the regex router.
+smolcode's router historically guesses two things from cheap regex
+([router.classify_specialty][engine.router.classify_specialty] and
+[router.classify_tier][engine.router.classify_tier]). This module adds tiny
+learned classifiers (SetFit backbone + light head, exported to int8 ONNX) that
+predict, per task:
+  - **specialty**  — which fine-tune family (16-way)
+  - **tier**       — a difficulty bucket -> the *starting* rung in the ladder
+  - **escalate**   — whether the task will likely need a bigger model
+Thinking level (off/low/high/xtra) is *derived* from (tier, escalate), not a
+separate model.
+The design is deliberately "pure upside": every prediction is gated by a
+calibrated confidence threshold. Below threshold — or if onnxruntime / the model
+artifacts aren't present at all — the field **falls back to the existing regex**,
+so we can never route worse than the status quo and rules-confident cases stay
+100% deterministic.
+Heavy deps (onnxruntime, tokenizers, numpy) are imported lazily; if any is
+missing the classifier simply abstains everywhere and the regex drives routing.
+"""
+from __future__ import annotations
+import functools
+import json
+import os
+from pathlib import Path
+from pydantic import BaseModel, Field
+from .router import classify_specialty, classify_tier
+# Difficulty buckets the tier head predicts; mapped onto the ladder by
+# start = min(bucket, n_tiers - 1) — exactly classify_tier's clamping contract,
+# so the head stays ladder-length-agnostic.
+TIER_BUCKETS = 3
+# Ordered thinking levels (matches smolcode-cli/src/router.rs Think enum).
+THINK_LEVELS = ("off", "low", "high", "xtra")
+# Default per-head confidence thresholds; overridden by router_clf.json's
+# "thresholds" map written at export/calibration time.
+_DEFAULT_TAU = {"specialty": 0.60, "tier": 0.55, "escalate": 0.65}
+_DEFAULT_DIR = Path(__file__).resolve().parent.parent / "finetune" / "router_clf" / "onnx"
+class RouteDecision(BaseModel):
+    """The typed routing decision. `tier` is a start index into the active ladder."""
+    specialty: str
+    tier: int
+    escalate: bool
+    think: str
+    # Per-field model confidence (0.0 when the field came from regex/default).
+    confidences: dict[str, float] = Field(default_factory=dict)
+    # Per-field provenance: "model" | "regex" | "default" — for telemetry/debugging.
+    sources: dict[str, str] = Field(default_factory=dict)
+def _softmax(row):  # row: 1-D numpy array
+    import numpy as np
+    # If the ONNX head already emits a probability distribution, don't re-normalize
+    # (argmax is unaffected either way, but confidence should stay honest).
+    if row.min() >= 0.0 and abs(float(row.sum()) - 1.0) < 1e-3:
+        return row
+    e = np.exp(row - row.max())
+    return e / e.sum()
+class _OnnxHead:
+    """A single ONNX sequence-classification head + its tokenizer and label map."""
+    def __init__(self, session, tokenizer, labels: list[str], input_names: set[str],
+                 max_len: int = 128) -> None:
+        self.session = session
+        self.tokenizer = tokenizer
+        self.labels = labels
+        self.input_names = input_names
+        self.max_len = max_len
+    @classmethod
+    def try_load(cls, dpath: Path) -> "_OnnxHead | None":
+        """Load model.onnx + tokenizer.json + labels.json from a dir, or None."""
+        model_file, tok_file, labels_file = (
+            dpath / "model.onnx", dpath / "tokenizer.json", dpath / "labels.json",
+        )
+        if not (model_file.exists() and tok_file.exists() and labels_file.exists()):
+            return None
+        import onnxruntime as ort
+        from tokenizers import Tokenizer
+        sess = ort.InferenceSession(
+            str(model_file), providers=["CPUExecutionProvider"],
+        )
+        tok = Tokenizer.from_file(str(tok_file))
+        meta = json.loads(labels_file.read_text())
+        labels = meta["labels"] if isinstance(meta, dict) else list(meta)
+        max_len = int(meta.get("max_len", 128)) if isinstance(meta, dict) else 128
+        input_names = {i.name for i in sess.get_inputs()}
+        return cls(sess, tok, labels, input_names, max_len=max_len)
+    def predict(self, text: str) -> tuple[str, float]:
+        """(label, confidence) for the argmax class."""
+        import numpy as np
+        enc = self.tokenizer.encode(text)
+        ids = enc.ids[: self.max_len]
+        mask = [1] * len(ids)
+        feed = {
+            "input_ids": np.asarray([ids], dtype=np.int64),
+            "attention_mask": np.asarray([mask], dtype=np.int64),
+        }
+        if "token_type_ids" in self.input_names:
+            feed["token_type_ids"] = np.zeros((1, len(ids)), dtype=np.int64)
+        out = self.session.run(None, feed)[0]
+        probs = _softmax(np.asarray(out)[0])
+        idx = int(probs.argmax())
+        return self.labels[idx], float(probs[idx])
+class RouteClassifier:
+    """Loads the (optional) ONNX heads and turns a task string into a RouteDecision.
+    Always safe to construct: missing deps or artifacts -> empty `heads`, and every
+    prediction abstains to the regex baseline.
+    """
+    def __init__(self, model_dir: str | os.PathLike | None = None) -> None:
+        self.model_dir = Path(
+            model_dir or os.environ.get("SMALLCODE_ROUTER_CLF_DIR", _DEFAULT_DIR)
+        )
+        self.heads: dict[str, _OnnxHead] = {}
+        self.thresholds = dict(_DEFAULT_TAU)
+        self.think_map: dict | None = None
+        self._load()
+    def _load(self) -> None:
+        try:  # the heavy trio — absent in a bare runtime, which is fine.
+            import numpy  # noqa: F401
+            import onnxruntime  # noqa: F401
+            import tokenizers  # noqa: F401
+        except Exception:
+            return
+        cfg_path = self.model_dir / "router_clf.json"
+        if cfg_path.exists():
+            try:
+                cfg = json.loads(cfg_path.read_text())
+                self.thresholds.update(cfg.get("thresholds", {}))
+                self.think_map = cfg.get("think_map")
+            except Exception:
+                pass
+        for name in ("specialty", "tier", "escalate"):
+            try:
+                head = _OnnxHead.try_load(self.model_dir / name)
+            except Exception:
+                head = None
+            if head is not None:
+                self.heads[name] = head
+    @property
+    def available(self) -> bool:
+        return bool(self.heads)
+    # --- per-decision helpers (model if confident, else regex/default) --------
+    def pick_specialty(self, task: str, specialties=None) -> tuple[str, float, str]:
+        head = self.heads.get("specialty")
+        if head is not None:
+            label, conf = head.predict(task)
+            ok = conf >= self.thresholds["specialty"]
+            if ok and (specialties is None or label in specialties):
+                return label, conf, "model"
+        return classify_specialty(task), 0.0, "regex"
+    def pick_tier(self, task: str, n_tiers: int) -> tuple[int, float, str]:
+        head = self.heads.get("tier")
+        if head is not None:
+            label, conf = head.predict(task)
+            if conf >= self.thresholds["tier"]:
+                try:
+                    bucket = int(label)
+                except ValueError:
+                    bucket = 0
+                return min(bucket, max(n_tiers - 1, 0)), conf, "model"
+        return classify_tier(task, n_tiers), 0.0, "regex"
+    def pick_escalate(self, task: str) -> tuple[bool, float, str]:
+        head = self.heads.get("escalate")
+        if head is not None:
+            label, conf = head.predict(task)
+            if conf >= self.thresholds["escalate"]:
+                return label in ("1", "true", "yes", "escalate"), conf, "model"
+        # No regex equivalent — default to "no escalation predicted".
+        return False, 0.0, "default"
+    def think_for(self, tier: int, n_tiers: int, escalate: bool) -> str:
+        if self.think_map:
+            key = f"{min(tier, n_tiers - 1)}:{int(escalate)}"
+            lvl = self.think_map.get(key) or self.think_map.get(str(tier))
+            if lvl in THINK_LEVELS:
+                return lvl
+        return default_think(tier, n_tiers, escalate)
+    def decide(self, task: str, *, specialties=None, n_tiers: int = 1) -> RouteDecision:
+        sp, sp_c, sp_s = self.pick_specialty(task, specialties)
+        tier, t_c, t_s = self.pick_tier(task, n_tiers)
+        esc, e_c, e_s = self.pick_escalate(task)
+        return RouteDecision(
+            specialty=sp,
+            tier=tier,
+            escalate=esc,
+            think=self.think_for(tier, n_tiers, esc),
+            confidences={"specialty": sp_c, "tier": t_c, "escalate": e_c},
+            sources={"specialty": sp_s, "tier": t_s, "escalate": e_s},
+        )
+def default_think(tier: int, n_tiers: int, escalate: bool) -> str:
+    """Monotone map: a higher start rung / predicted escalation -> more thinking."""
+    if n_tiers <= 1:
+        return "high" if escalate else "off"
+    frac = tier / (n_tiers - 1)
+    if frac >= 0.999:
+        return "xtra" if escalate else "high"
+    if frac >= 0.5:
+        return "high" if escalate else "low"
+    return "low" if escalate else "off"
+@functools.lru_cache(maxsize=1)
+def get_classifier() -> RouteClassifier:
+    """Process-wide singleton (loads ONNX sessions once)."""
+    return RouteClassifier()
+def classify_route(task: str, *, specialties=None, n_tiers: int = 1) -> RouteDecision:
+    """Public entry: a typed, confidence-gated routing decision for `task`."""
+    return get_classifier().decide(task, specialties=specialties, n_tiers=n_tiers)
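
The confidence gate and the thinking map are easy to check in isolation. The sketch below replaces the ONNX heads with plain functions returning `(label, confidence)`; `gated_pick`, `toy_rule` and the two fake heads are hypothetical, while `default_think` copies the function above. The real `pick_specialty` additionally rejects a confident label that is not in the allowed `specialties`.

```python
from __future__ import annotations

import re
from collections.abc import Callable

# A "head" maps task text to (label, confidence); the real ones are ONNX models.
Predictor = Callable[[str], tuple[str, float]]


def gated_pick(task: str, head: Predictor | None, tau: float,
               rule: Callable[[str], str]) -> tuple[str, float, str]:
    """(label, confidence, source): trust the head only at or above `tau`."""
    if head is not None:
        label, conf = head(task)
        if conf >= tau:
            return label, conf, "model"
    return rule(task), 0.0, "regex"


def default_think(tier: int, n_tiers: int, escalate: bool) -> str:
    """Copy of route_clf.default_think: higher rung or escalation -> more thinking."""
    if n_tiers <= 1:
        return "high" if escalate else "off"
    frac = tier / (n_tiers - 1)
    if frac >= 0.999:
        return "xtra" if escalate else "high"
    if frac >= 0.5:
        return "high" if escalate else "low"
    return "low" if escalate else "off"


def toy_rule(task: str) -> str:
    """Hypothetical stand-in for router.classify_specialty."""
    return "rust" if re.search(r"\brust\b|\.rs\b", task, re.I) else "py"


def confident_head(task: str) -> tuple[str, float]:
    return "go", 0.91  # hypothetical head output


def unsure_head(task: str) -> tuple[str, float]:
    return "go", 0.41  # below tau, so the rule decides


print(gated_pick("write a CLI in Go", confident_head, 0.60, toy_rule))  # ('go', 0.91, 'model')
print(gated_pick("write a CLI in Go", unsure_head, 0.60, toy_rule))     # ('py', 0.0, 'regex')
print(gated_pick("fix the bug in main.rs", None, 0.60, toy_rule))       # ('rust', 0.0, 'regex')
print([default_think(t, 3, esc) for t in range(3) for esc in (False, True)])
# ['off', 'low', 'low', 'high', 'high', 'xtra']
```

With three tiers the thinking level rises with the start rung, and a predicted escalation bumps it one step: `off`/`low` at tier 0, `low`/`high` at tier 1, `high`/`xtra` at tier 2.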

engine/router.py ADDED

	@@ -0,0 +1,455 @@

+"""Tiered model router — the "forge-router" pattern.
+The point of smolcode: don't burn a 32B model on a one-line helper, and don't
+fail a hard task on a 3B. The router picks a *starting* tier from a cheap
+complexity heuristic, runs the agent, then **escalates on failure**: if the
+produced code doesn't actually pass when re-run, it retries the whole task on the
+next-bigger model. The tier that ultimately solved it is surfaced for the UI badge.
+Each tier is an independent SmallCodeAgent (its own model + fresh workspace), so
+every model in the ladder uses LiteForge's native tool-calling loop — no parsing
+hacks. All tiers are <=32B to stay hackathon-eligible.
+"""
+from __future__ import annotations
+import os
+import re
+from collections.abc import AsyncIterator
+from dataclasses import dataclass, field
+from . import browsercheck
+from .agent import SmallCodeAgent, Step
+from .config import Preset, SpecialistLadder, SpecialistPreset, Tier, load_preset
+from .judge import judge_correct, judge_enabled
+from .live_run import LiveFrame
+from .preview import find_entry, inline_app
+from .trace_collector import TraceEvent
+from .ui_trace import merge_step_metadata
+# Signals that a task is non-trivial and worth starting higher up the ladder.
+# Leading \b + trailing \w* so stems match their word family
+# (recursi -> recursive, optimi -> optimize, concurren -> concurrency).
+_HARD_HINTS = re.compile(
+    r"\b(class|async|thread|concurren|regex|pars|algorithm|optimi|recursi|"
+    r"benchmark|refactor|multiple files|api|server|database|sql|decorator|"
+    r"generator|data ?structure|graph|tree|dynamic programming)\w*",
+    re.I,
+)
+def _route_classifier():
+    """The learned routing classifier singleton, or None if unavailable.
+    Importing route_clf pulls in pydantic (and lazily onnxruntime); any failure
+    here just means we route with the regex baseline below.
+    """
+    try:
+        from .route_clf import get_classifier
+        return get_classifier()
+    except Exception:
+        return None
+def classify_tier(task: str, n_tiers: int) -> int:
+    """Pick a starting tier index (0 = smallest). Cheap, transparent heuristic."""
+    if n_tiers <= 1:
+        return 0
+    score = 0
+    if len(task) > 280:
+        score += 1
+    hits = len(_HARD_HINTS.findall(task))
+    if hits >= 1:
+        score += 1
+    if hits >= 3:
+        score += 1
+    return min(score, n_tiers - 1)
+# --- specialty (language/function) classifier --------------------------------
+# Picks the specialist *family* for a task; classify_tier then picks the size
+# within it. Same cheap, transparent, ordered-regex style as classify_tier.
+# Priority on ties (earlier wins); 'py' is last because it's the safe default.
+# `orchestrate` is first: explicit fan-out language is a strong, specific signal
+# that should win over an incidental language mention.
+_SPECIALTY_ORDER = ("orchestrate", "git", "terraform", "docker", "sql", "powershell",
+                    "bsd", "rust", "go", "cpp", "java", "dotnet", "csharp", "bash",
+                    "js", "py")
+_FENCE_LANG = re.compile(r"```([a-z0-9+#.]+)", re.I)
+_FENCE_TO_SPECIALTY = {
+    "python": "py", "py": "py", "pytest": "py",
+    "bash": "bash", "sh": "bash", "shell": "bash", "zsh": "bash", "console": "bash",
+    "powershell": "powershell", "ps1": "powershell", "pwsh": "powershell",
+    "sql": "sql", "psql": "sql", "sqlite": "sql",
+    "javascript": "js", "js": "js", "ts": "js", "typescript": "js",
+    "jsx": "js", "tsx": "js", "node": "js",
+    "go": "go", "golang": "go",
+    "rust": "rust", "rs": "rust",
+    "cpp": "cpp", "c++": "cpp", "cc": "cpp", "c": "cpp",
+    "java": "java",
+    "csharp": "csharp", "cs": "csharp",
+    "dockerfile": "docker", "docker": "docker",
+    "hcl": "terraform", "terraform": "terraform", "tf": "terraform",
+}
+_EXT_RE = re.compile(r"\.(py|sh|bash|ps1|sql|js|mjs|cjs|ts|tsx|jsx|go|rs|cpp|cc|cxx|"
+                     r"hpp|java|cs|csproj|tf|dockerfile)\b", re.I)
+_EXT_TO_SPECIALTY = {
+    "py": "py", "sh": "bash", "bash": "bash", "ps1": "powershell", "sql": "sql",
+    "js": "js", "mjs": "js", "cjs": "js", "ts": "js", "tsx": "js", "jsx": "js",
+    "go": "go", "rs": "rust", "cpp": "cpp", "cc": "cpp", "cxx": "cpp", "hpp": "cpp",
+    "java": "java", "cs": "csharp", "csproj": "dotnet", "tf": "terraform",
+    "dockerfile": "docker",
+}
+_SPECIALTY_HINTS = {
+    # Fan-out / parallel delegation work -> the task_batch specialist.
+    "orchestrate": re.compile(r"\b(in parallel|fan ?out|concurrently|task_batch|"
+                              r"orchestrat|several independent|multiple independent|"
+                              r"simultaneously|batch of (tasks|jobs))\w*", re.I),
+    # NOTE: `staged` requires the trailing 'd' so it does NOT match "stage" inside
+    # "multi-stage" (a docker term) — that false-positive misrouted Docker tasks.
+    "git": re.compile(r"\b(git|commit|rebase|cherry-?pick|merge conflict|stash|"
+                      r"\bbranch\b|pull request|\bPR\b|revert|bisect|staged)\w*", re.I),
+    "terraform": re.compile(r"\b(terraform|\bhcl\b|\.tf\b|provider|resource block|"
+                            r"infrastructure as code|\biac\b|tfstate)\w*", re.I),
+    "docker": re.compile(r"\b(docker|dockerfile|docker-?compose|container image|"
+                         r"\bimage\b|\bbuild -t\b|entrypoint)\w*", re.I),
+    "sql": re.compile(r"\b(sql|select |insert |update |delete |join|schema|"
+                      r"\btable\b|\bindex\b|migration|postgres|sqlite|mysql|query)\w*", re.I),
+    "powershell": re.compile(r"\b(powershell|pwsh|\.ps1|cmdlet|get-|set-|write-output)\w*", re.I),
+    "bsd": re.compile(r"\b(freebsd|openbsd|netbsd|\bbsd\b|pf\.conf|rc\.d|pkg_add)\w*", re.I),
+    "rust": re.compile(r"\b(rust|cargo|crate|rustc|\.rs\b|borrow checker|tokio)\w*", re.I),
+    "go": re.compile(r"\b(golang|\bgo\b|goroutine|go mod|go test|\.go\b)\w*", re.I),
+    "cpp": re.compile(r"\b(c\+\+|cpp|g\+\+|clang|std::|cmake|\.cpp\b|template)\w*", re.I),
+    "java": re.compile(r"\b(java|maven|gradle|\bjvm\b|junit|\.java\b)\w*", re.I),
+    "dotnet": re.compile(r"\b(\.net|dotnet|nuget|asp\.net|\.csproj|msbuild)\w*", re.I),
+    "csharp": re.compile(r"\b(c#|csharp|\blinq\b|\.cs\b|\bxunit\b)\w*", re.I),
+    "bash": re.compile(r"\b(shell script|\bbash\b|\bzsh\b|chmod|grep|sed|awk|"
+                       r"\bpipe\b|cron|stdout|stderr|\$PATH)\w*", re.I),
+    "js": re.compile(r"\b(javascript|typescript|node|npm|react|vue|jsx|tsx|"
+                     r"webpack|vite|eslint|package\.json)\w*", re.I),
+    "py": re.compile(r"\b(python|pytest|pandas|numpy|django|flask|pip|venv|"
+                     r"def |async def|decorator)\w*", re.I),
+}
+def classify_specialty(task: str, *, default: str = "py") -> str:
+    """Pick the specialist family key for a task. Cheap, transparent, deterministic.
+    Precedence (most explicit signal first): SMALLCODE_SPECIALTY env override ->
+    code-fence language tag -> keyword cues and file-extension mentions, scored
+    together (ties broken by _SPECIALTY_ORDER) -> default. Mirrors classify_tier's
+    style; pairs with it for 2D routing.
+    """
+    forced = os.environ.get("SMALLCODE_SPECIALTY")
+    if forced:
+        return forced.strip().lower()
+    # A fenced code block (```lang) is the single most explicit signal -> hard win.
+    for lang in _FENCE_LANG.findall(task):
+        s = _FENCE_TO_SPECIALTY.get(lang.lower())
+        if s:
+            return s
+    # Otherwise SCORE keyword cues AND file-extension mentions together, so a strong
+    # action signal (e.g. "rebase ... merge conflict") beats an incidental ".py"
+    # filename. Ties broken by _SPECIALTY_ORDER (earlier = higher priority).
+    scores = {s: len(rx.findall(task)) for s, rx in _SPECIALTY_HINTS.items()}
+    for e in _EXT_RE.findall(task):
+        s = _EXT_TO_SPECIALTY.get(e.lower())
+        if s:
+            scores[s] = scores.get(s, 0) + 1
+    best = max(scores, key=lambda s: (scores[s], -_SPECIALTY_ORDER.index(s)))
+    if scores[best] > 0:
+        return best
+    return default
+@dataclass
+class RouteResult:
+    final: str
+    steps: list[Step]
+    tier_name: str
+    tier_model: str
+    start_tier: str
+    escalations: int
+    verified: bool
+    specialty: str = "general"
+    files: dict[str, str] = field(default_factory=dict)
+    trace_events: list[TraceEvent] = field(default_factory=list)
+    agent: SmallCodeAgent | None = None
+def _smoke_command(files: list[str]) -> str | None:
+    """A best-effort 'does it build/run (and pass any tests)?' shell command for a
+    NON-Python solution, or None if the language isn't recognized. Mirrors the
+    per-specialty run commands (finetune/specialties.py) so the router can escalate
+    on go/rust/js/sql/… exactly like it does on Python via run_python."""
+    def ext(e: str) -> list[str]:
+        return [f for f in files if f.endswith(e)]
+    if ext(".go"):
+        if any(f.endswith("_test.go") for f in files):
+            return "go test ./... 2>&1"
+        return "go run . 2>&1 || go run *.go 2>&1"
+    if "Cargo.toml" in files:
+        return "cargo test -q 2>&1 || cargo build -q 2>&1"
+    if ext(".rs"):
+        return f"rustc {ext('.rs')[0]} -o /tmp/_smv 2>&1 && /tmp/_smv"
+    js = ext(".js") + ext(".mjs") + ext(".cjs") + ext(".ts")
+    if "package.json" in files:
+        return "npm test --silent 2>&1 || node --test 2>&1"
+    if js:
+        if any(".test." in f or ".spec." in f for f in js):
+            return "node --test 2>&1"
+        entry = next((f for f in js if f in ("index.js", "main.js")), js[0])
+        return f"node {entry} 2>&1"
+    if ext(".sql"):
+        return f"sqlite3 :memory: < {ext('.sql')[0]} 2>&1"
+    if ext(".cpp") or ext(".cc"):
+        srcs = " ".join(ext(".cpp") + ext(".cc"))
+        return f"g++ -std=c++17 {srcs} -o /tmp/_smv 2>&1 && /tmp/_smv"
+    if ext(".java"):
+        main = "Main" if "Main.java" in files else ext(".java")[0][:-5]
+        return f"javac *.java 2>&1 && java {main} 2>&1"
+    if ext(".sh"):
+        return f"bash {ext('.sh')[0]} 2>&1"
+    if ext(".tf"):
+        return "terraform init -backend=false 2>&1 && terraform validate 2>&1"
+    if "Program.cs" in files or ext(".cs"):
+        return "dotnet run 2>&1"
+    return None
+def _verify(agent: SmallCodeAgent) -> bool | None:
+    """Independently check the agent's output actually works.
+    Returns True/False if there's something runnable to check, else None
+    (unverifiable — don't escalate purely on a missing signal). Python uses the
+    pytest/run_python fast paths; other languages smoke-run via run_shell so the
+    specialist router escalates on a broken go/rust/sql/… solution instead of
+    silently accepting the smallest tier.
+    """
+    ws = agent.workspace
+    files = ws.list_files()
+    pys = [f for f in files if f.endswith(".py")]
+    if pys:
+        if any("test" in f.lower() for f in pys):
+            return ws.run_tests().ok
+        entry = next((f for f in pys if f in ("main.py", "solution.py")), None) or pys[0]
+        return ws.run_python(path=entry).ok
+    # Web app (index.html + browser JS): render it in a real browser — must come
+    # BEFORE the shell smoke-run so we don't `node` browser-side JS. Same signal
+    # smolbuilder's WebBuilder uses (engine/builder._evaluate).
+    web_files = agent.files()
+    if find_entry(web_files) is not None:
+        ok, _errors = browsercheck.check_html(inline_app(web_files))
+        return ok
+    cmd = _smoke_command(files)
+    if cmd is not None:
+        return ws.run_shell(cmd, timeout=90).ok
+    return None
+def _build_result(agent: SmallCodeAgent, final: str, steps: list[Step], tier: Tier,
+                  start_name: str, escalations: int, verified: bool,
+                  specialty: str = "general") -> RouteResult:
+    events = merge_step_metadata(agent.trace_collector.snapshot(), agent.raw_history())
+    return RouteResult(
+        final=final, steps=steps, tier_name=tier.name, tier_model=tier.model,
+        start_tier=start_name, escalations=escalations, verified=verified,
+        specialty=specialty, files=agent.files(), trace_events=events, agent=agent,
+    )
+# Difficulty buckets the tier head predicts (matches route_clf.TIER_BUCKETS). Kept as
+# a local constant so router.py imports even when route_clf's deps (pydantic) are
+# absent. The bucket drives BOTH the thinking level and the start-tier clamp, so it's
+# decoupled from the ladder length — think stays meaningful even for a pinned 1-tier
+# preset.
+_THINK_BUCKETS = 3
+class Router:
+    def __init__(
+        self,
+        preset: Preset | None = None,
+        max_steps: int = 12,
+        approval_handler=None,
+        workspace_dir: str | None = None,
+        think: str = "off",
+        yolo: bool = False,
+        agent: str = "build",
+        size_floor: str | None = None,
+    ) -> None:
+        self.preset = preset or load_preset()
+        self.tiers: list[Tier] = self.preset.tiers
+        self.max_steps = max_steps
+        self.approval_handler = approval_handler
+        self.workspace_dir = workspace_dir
+        self.think = think
+        self.yolo = yolo
+        self.agent_name = agent
+        # "Auto · <size>" pins the START rung to this specialist size (e.g. "3b") while
+        # the router still picks the specialty and escalation still climbs the ladder.
+        self.size_floor = size_floor
+    async def run(self, task: str) -> RouteResult:
+        result: RouteResult | None = None
+        async for frame in self.run_live(task):
+            if frame.done and isinstance(frame.result, RouteResult):
+                result = frame.result
+        assert result is not None
+        return result
+    def _ladder_for(self, task: str, specialty: str | None = None) -> SpecialistLadder:
+        """The size ladder for this task's specialty (generic if not a matrix preset).
+        `specialty` may be supplied by the learned classifier; falls back to the
+        regex classify_specialty when not given.
+        """
+        if isinstance(self.preset, SpecialistPreset):
+            if specialty is None:
+                specialty = classify_specialty(task)
+            return self.preset.ladder_for(specialty)
+        return SpecialistLadder(specialty="general", tiers=self.preset.tiers)
+    def _size_floor_index(self, tiers: list[Tier], size_floor: str) -> int:
+        """Start-rung index for an 'Auto · <size>' pin: the first ladder tier whose
+        size is >= the floor (escalation still climbs from there). Returns 0 if the
+        floor does not parse to a positive size, or the top rung if no tier is that large."""
+        from .config import parse_size_b
+        target = parse_size_b(size_floor if str(size_floor).lower().endswith("b")
+                              else f"{size_floor}b")
+        if target <= 0:
+            return 0
+        for i, t in enumerate(tiers):
+            if parse_size_b(t.model) >= target:
+                return i
+        return max(len(tiers) - 1, 0)
+    def _route(self, task: str) -> tuple[SpecialistLadder, int, str]:
+        """Pick (ladder, start-tier index, thinking level) for a task.
+        Uses the learned RouteClassifier when it's confident; otherwise the regex
+        baseline. A difficulty bucket (decoupled from ladder length) drives both the
+        start rung and the thinking level. `size_floor` (Auto · <size>) overrides the
+        start rung; an explicit user `/think` (anything but the default "off") wins.
+        """
+        clf = _route_classifier()
+        has_clf = clf is not None and clf.available
+        # 1. specialty -> size ladder
+        if has_clf and isinstance(self.preset, SpecialistPreset):
+            specialty = clf.pick_specialty(task, list(self.preset.ladders))[0]
+            ladder = self._ladder_for(task, specialty=specialty)
+        else:
+            ladder = self._ladder_for(task)
+        tiers = ladder.tiers
+        # 2. difficulty bucket (0.._THINK_BUCKETS-1) + escalation hint (classifier only)
+        if has_clf:
+            bucket = clf.pick_tier(task, _THINK_BUCKETS)[0]
+            esc = clf.pick_escalate(task)[0]
+        else:
+            bucket = classify_tier(task, _THINK_BUCKETS)
+            esc = False
+        # 3. start rung: an explicit size floor wins; else the difficulty bucket
+        if self.size_floor:
+            start = self._size_floor_index(tiers, self.size_floor)
+        else:
+            start = min(bucket, max(len(tiers) - 1, 0))
+        # 4. thinking level: explicit /think wins; else router-derived (clf only)
+        if self.think != "off":
+            think = self.think
+        elif has_clf:
+            think = clf.think_for(bucket, _THINK_BUCKETS, esc)
+        else:
+            think = "off"
+        return ladder, start, think
+    async def run_live(
+        self,
+        task: str,
+        *,
+        rust_session=None,
+    ) -> AsyncIterator[LiveFrame]:
+        """Yield live frames while routing; final frame carries RouteResult."""
+        ladder, start, think = self._route(task)
+        specialty = ladder.specialty
+        tiers = ladder.tiers
+        escalations = 0
+        last: RouteResult | None = None
+        prev_tier_name: str | None = None
+        for idx in range(start, len(tiers)):
+            tier = tiers[idx]
+            if prev_tier_name is not None:
+                yield LiveFrame(events=[
+                    TraceEvent(kind="tier_escalation", name=tier.name,
+                               detail=f"escalated from {prev_tier_name}"),
+                ])
+            # The start tier reuses the caller's session; make it run the ROUTED model
+            # (not whatever the UI last pinned), so "Auto" honors the router's pick and
+            # a concrete pin (single-tier ladder) runs exactly that model.
+            if idx == start and rust_session is not None:
+                try:
+                    rust_session.set_model(tier.model)
+                except Exception:
+                    pass
+            agent = SmallCodeAgent(
+                preset=self.preset,
+                model=tier.model,
+                max_steps=self.max_steps,
+                approval_handler=self.approval_handler,
+                workspace_dir=self.workspace_dir,
+                agent=self.agent_name,
+                yolo=self.yolo,
+                rust_session=rust_session if idx == start else None,
+            )
+            async for frame in agent.run_live_turn(
+                task, think=think, yolo=self.yolo,
+            ):
+                if not frame.done:
+                    yield frame
+                    continue
+                final, steps = frame.result
+                ok = False if (agent.hit_max_steps or agent.errored) else _verify(agent)
+                # _verify only proves the code RAN, not that it's correct. If it ran
+                # clean (ok is True) but a bigger tier exists, ask a judge whether the
+                # solution actually satisfies the task; a concrete "no" -> escalate.
+                if ok is True and idx < len(tiers) - 1 and judge_enabled():
+                    correct = await judge_correct(
+                        self.preset, tiers[idx + 1].model, task, agent.files(), final,
+                    )
+                    if not correct:
+                        ok = False
+                last = _build_result(
+                    agent, final, steps, tier, tiers[start].name,
+                    escalations, bool(ok), specialty=specialty,
+                )
+                if ok is not False:
+                    yield LiveFrame(
+                        steps=steps,
+                        events=last.trace_events,
+                        files=last.files,
+                        done=True,
+                        result=last,
+                    )
+                    return
+                if idx < len(tiers) - 1:
+                    agent.trace_collector.record_escalation(tier.name, tiers[idx + 1].name)
+                agent.cleanup()
+                escalations += 1
+                prev_tier_name = tier.name
+                break
+        if last is not None:
+            yield LiveFrame(
+                steps=last.steps,
+                events=last.trace_events,
+                files=last.files,
+                done=True,
+                result=last,
+            )
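The escalation loop in `Router.run_live` reduces to a tri-state verify-and-escalate ladder: a `True` or `None` verdict stops the climb, and only a concrete `False` moves on to the next rung. The following is a minimal synchronous sketch of that idea, not the router itself. `climb_ladder`, `Attempt`, `solve`, and `verify` are hypothetical stand-ins (for running a `SmallCodeAgent` on a tier and for `_verify`), and the sketch leaves out the judge, live frames, and agent cleanup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    tier: str          # rung that produced the accepted (or last) result
    verified: bool     # True only when verify() returned True
    escalations: int   # failed rungs before this one


def climb_ladder(
    tiers: list[str],
    start: int,
    solve: Callable[[str], str],
    verify: Callable[[str], Optional[bool]],
) -> Attempt:
    """Hypothetical sketch: try rungs from `start` upward and stop at the first
    result that is not definitively broken. verify() is tri-state like _verify:
    True = works, False = broken, None = nothing runnable to check."""
    if not tiers:
        raise ValueError("empty ladder")
    start = min(max(start, 0), len(tiers) - 1)  # clamp like the router's start rung
    escalations = 0
    last: Optional[Attempt] = None
    for idx in range(start, len(tiers)):
        output = solve(tiers[idx])
        ok = verify(output)
        last = Attempt(tier=tiers[idx], verified=bool(ok), escalations=escalations)
        if ok is not False:  # True or None: accept; only a concrete failure escalates
            return last
        escalations += 1
    assert last is not None  # the loop ran at least once because start was clamped
    return last
```

With tiers `["1.5b", "3b", "7b"]` and a `verify` that only passes on `"7b"`, the ladder ends on `7b` after two escalations. A `verify` that always returns `None` is accepted at the start rung unverified, which matches the "don't escalate purely on a missing signal" rule in `_verify`.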

engine/rust_session.py ADDED

	@@ -0,0 +1,425 @@

+"""Python facade over the Rust smolcode agent engine (smolcode_core)."""
+from __future__ import annotations
+import asyncio
+import json
+import os
+import tempfile
+from collections.abc import Awaitable, Callable
+from dataclasses import dataclass, field
+from typing import Any
+from .trace_collector import TraceCollector, TraceEvent
+try:
+    import smolcode_core as _rust
+except ImportError:
+    _rust = None  # type: ignore
+def rust_available() -> bool:
+    return _rust is not None
+ApprovalHandler = Callable[[str], Awaitable[bool]]
+@dataclass
+class RustRunResult:
+    final: str
+    hit_max_steps: bool = False
+    errored: bool = False
+class RustSession:
+    """Thin wrapper around smolcode_core.Session."""
+    def __init__(
+        self,
+        *,
+        workspace: str | None = None,
+        agent: str = "build",
+        yolo: bool = False,
+        model: str | None = None,
+        base_url: str | None = None,
+        api_key: str | None = None,
+        profile: str = "full",
+        approval_handler: ApprovalHandler | None = None,
+    ) -> None:
+        if _rust is None:
+            raise RuntimeError(
+                "smolcode_core is not installed; build with "
+                "`maturin develop --release` in smolcode-cli/crates/smolcode-py"
+            )
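+        if workspace is None:
+            # Only create a temp dir when the env var is unset; passing mkdtemp() as the
+            # default to os.environ.get would create (and leak) a directory every time.
+            workspace = os.environ.get("SMALLCODE_WORKSPACE") or tempfile.mkdtemp(
+                prefix="smolcode-"
+            )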
+        self._session = _rust.Session(
+            workspace=workspace,
+            agent=agent,
+            yolo=yolo,
+            model=model,
+            base_url=base_url,
+            api_key=api_key,
+            profile=profile,
+        )
+        self.trace_collector = TraceCollector()
+        self.approval_handler = approval_handler
+        self.hit_max_steps = False
+        self.errored = False
+        self._steps: list[dict[str, Any]] = []
+        self._final: str = ""
+        self._cancelled = False
+    def request_cancel(self) -> None:
+        self._cancelled = True
+        self.cancel_turn()
+    @property
+    def cancelled(self) -> bool:
+        return self._cancelled
+    def clear_cancel(self) -> None:
+        self._cancelled = False
+    @property
+    def session_id(self) -> str:
+        return self._session.session_id
+    @property
+    def workspace_path(self) -> str:
+        return self._session.workspace()
+    def set_model(self, model: str) -> None:
+        self._session.set_model(model)
+    def set_agent(self, agent: str) -> None:
+        self._session.set_agent(agent)
+    def set_think(self, level: str) -> None:
+        self._session.set_think(level)
+    def register_tool(self, name: str, fn: Callable[[dict], dict]) -> None:
+        self._session.register_tool(name, fn)
+    def files(self) -> dict[str, str]:
+        out: dict[str, str] = {}
+        for path in self._session.workspace_files():
+            content = self._session.read_file(path)
+            if content is not None:
+                out[path] = content
+        return out
+    def run_shell(self, command: str) -> str:
+        return self._session.run_shell(command)
+    async def run(
+        self,
+        task: str,
+        *,
+        think: str | None = None,
+        yolo: bool | None = None,
+    ) -> RustRunResult:
+        """Run one agent turn to completion."""
+        self.hit_max_steps = False
+        self.errored = False
+        self._final = ""
+        self.clear_cancel()
+        self._session.start_turn(task, think=think, yolo=yolo)
+        final_text = ""
+        while True:
+            if self._cancelled:
+                break
+            ev = await asyncio.to_thread(self._session.poll_event)
+            if ev is None:
+                await asyncio.sleep(0.05)
+                continue
+            kind = ev.get("kind")
+            if kind == "approval":
+                approved = True
+                if self.approval_handler is not None:
+                    approved = await self.approval_handler(ev.get("desc", ""))
+                elif not yolo:  # no handler: only yolo mode auto-approves
+                    approved = False
+                self._session.approve(approved)
+                continue
+            self._ingest_event(ev)
+            if kind == "final":
+                final_text = ev.get("text", "")
+            if kind == "done":
+                break
+            if kind == "error":
+                self.errored = True
+        self._final = final_text
+        if "step" in self._final.lower() and "without finishing" in self._final.lower():
+            self.hit_max_steps = True
+        self._session.record_turn(task, final_text)
+        return RustRunResult(
+            final=final_text,
+            hit_max_steps=self.hit_max_steps,
+            errored=self.errored,
+        )
+    async def poll_events_once(self) -> list[dict[str, Any]]:
+        """Non-blocking poll for live UI updates during a turn."""
+        events: list[dict[str, Any]] = []
+        while True:
+            ev = await asyncio.to_thread(self._session.poll_event)
+            if ev is None:
+                break
+            kind = ev.get("kind")
+            if kind == "approval":
+                approved = True
+                if self.approval_handler is not None:
+                    approved = await self.approval_handler(ev.get("desc", ""))
+                self._session.approve(approved)
+                continue
+            self._ingest_event(ev)
+            events.append(ev)
+            if kind == "done":
+                break
+        return events
+    def _ingest_event(self, ev: dict[str, Any]) -> None:
+        kind = ev.get("kind")
+        if kind == "tool_call":
+            args_raw = ev.get("args", "{}")
+            try:
+                args = json.loads(args_raw) if isinstance(args_raw, str) else args_raw
+            except json.JSONDecodeError:
+                args = {"raw": args_raw}
+            self.trace_collector.record_tool_call(ev.get("name", ""), args)
+        elif kind == "tool_result":
+            text = ev.get("text", "")
+            try:
+                result = json.loads(text)
+            except json.JSONDecodeError:
+                result = {"output": text}
+            self.trace_collector.record_tool_result(ev.get("name", ""), result)
+        elif kind == "final":
+            self.trace_collector.record_final(ev.get("text", ""))
+        elif kind == "error":
+            self.trace_collector.record_error(ev.get("text", ""))
+            self.errored = True
+    def save(self) -> None:
+        self._session.save()
+    @staticmethod
+    def list_sessions() -> list[dict[str, Any]]:
+        if _rust is None:
+            return []
+        return _rust.Session.list_sessions()
+    def load_session(self, session_id: str) -> bool:
+        return self._session.load_session(session_id)
+    def fork(self) -> str | None:
+        return self._session.fork()
+    def rename(self, title: str) -> bool:
+        return self._session.rename(title)
+    def delete(self) -> bool:
+        return self._session.delete()
+    def cancel_turn(self) -> None:
+        self._session.cancel_turn()
+    def render_config(self) -> str:
+        return self._session.render_config()
+def render_config(session: RustSession) -> str:
+    return session.render_config()
+def apply_settings(session: RustSession, settings: Any) -> None:
+    """Apply UI settings to a live Rust session before each agent turn.
+    The "auto" / "auto:<size>" pseudo-selections are NOT real model tags — the Router
+    picks the model and sets it on the session (see router.run_live), so we must not
+    push them via set_model. Only concrete pins are applied here.
+    """
+    session.set_think(settings.think)
+    model = settings.model or ""
+    if model and model != "auto" and not model.startswith("auto:"):
+        session.set_model(model)
+    session.set_agent(settings.agent)
+def list_commands(workspace: str) -> list[str]:
+    if _rust is None:
+        return []
+    return _rust.list_commands(workspace)
+def expand_command(workspace: str, name: str, args: str = "") -> str | None:
+    if _rust is None:
+        return None
+    return _rust.expand_command(workspace, name, args)
+def list_rules(workspace: str) -> list[dict[str, Any]]:
+    if _rust is None:
+        return []
+    return _rust.list_rules(workspace)
+def list_skills(workspace: str) -> list[dict[str, Any]]:
+    if _rust is None:
+        return []
+    return _rust.list_skills(workspace)
+def expand_skill(workspace: str, name: str, args: str = "") -> str | None:
+    if _rust is None:
+        return None
+    return _rust.expand_skill(workspace, name, args)
+def list_mcp(session: RustSession) -> list[dict[str, Any]]:
+    return session._session.list_mcp()
+def list_background_jobs() -> str:
+    if _rust is None:
+        return ""
+    return _rust.list_background_jobs()
+def write_agents_md(workspace: str) -> str:
+    if _rust is None:
+        raise RuntimeError("smolcode_core not installed")
+    return _rust.write_agents_md(workspace)
+def git_status(workspace: str) -> str:
+    if _rust is None:
+        return ""
+    return _rust.git_status(workspace)
+def workspace_tree(workspace: str, depth: int = 3) -> str:
+    if _rust is None:
+        return ""
+    return _rust.workspace_tree(workspace, depth=depth)
+UI_FILE_LIMIT = 1500
+AUTOCOMPLETE_FILE_LIMIT = 200
+ATTACH_FILE_MAX_BYTES = 8192
+def read_workspace_file(
+    workspace: str,
+    path: str,
+    *,
+    max_bytes: int = ATTACH_FILE_MAX_BYTES,
+    rust: RustSession | None = None,
+) -> str | None:
+    """Read a workspace file for @-attachment inlining. Returns None if missing."""
+    if _rust is None:
+        return None
+    try:
+        session = rust if rust is not None else RustSession(workspace=workspace, yolo=True)
+        content = session._session.read_file(path)
+        if content is None:
+            return None
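+        # max_bytes is a UTF-8 byte budget: cut on bytes and drop any split character.
+        encoded = content.encode("utf-8")
+        if len(encoded) > max_bytes:
+            return encoded[:max_bytes].decode("utf-8", errors="ignore") + "\n… (truncated)"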
+        return content
+    except Exception:
+        return None
+def workspace_paths(workspace: str, *, limit: int = UI_FILE_LIMIT) -> tuple[list[str], int]:
+    """Workspace paths for UI sidebars (no file reads). Returns (paths, total_count)."""
+    if _rust is None:
+        return [], 0
+    session = RustSession(workspace=workspace, yolo=True)
+    paths = sorted(session._session.workspace_files())
+    total = len(paths)
+    if total > limit:
+        paths = paths[:limit]
+    return paths, total
+def workspace_files(workspace: str) -> dict[str, str]:
+    if _rust is None:
+        return {}
+    session = RustSession(workspace=workspace, yolo=True)
+    return session.files()
+def export_transcript(session_id: str, path: str | None = None) -> str:
+    if _rust is None:
+        raise RuntimeError("smolcode_core not installed")
+    return _rust.export_transcript(session_id, path)
+def session_timeline(session_id: str) -> list[str]:
+    if _rust is None:
+        return []
+    return _rust.session_timeline(session_id)
+def get_session_chat(session_id: str) -> list[dict[str, str]]:
+    if _rust is None:
+        return []
+    return _rust.get_session_chat(session_id)
+def chat_from_stored(lines: list[dict[str, str]]) -> list[dict[str, str]]:
+    """Convert stored session lines to Gradio chat messages."""
+    out: list[dict[str, str]] = []
+    for m in lines:
+        role = m.get("role", "assistant")
+        text = m.get("text", "")
+        if role == "user":
+            out.append({"role": "user", "content": text})
+        else:
+            out.append({"role": "assistant", "content": text})
+    return out
+def session_choices() -> list[str]:
+    """Dropdown labels: `title (id)`."""
+    return [
+        f"{r['title']} ({r['id']})"
+        for r in RustSession.list_sessions()
+    ]
+def parse_session_label(label: str) -> str | None:
+    if not label or "(" not in label:
+        return None
+    return label.rsplit("(", 1)[-1].rstrip(")")
+def load_rust_config(
+    *,
+    model: str | None = None,
+    base_url: str | None = None,
+    api_key: str | None = None,
+    agent: str | None = None,
+    yolo: bool = False,
+) -> dict[str, Any]:
+    """Load layered config.toml via Rust Config."""
+    if _rust is None:
+        return {}
+    cfg = _rust.Config.load(
+        model=model,
+        base_url=base_url,
+        api_key=api_key,
+        agent=agent,
+        yolo=yolo,
+    )
+    return {
+        "model": cfg.model,
+        "base_url": cfg.base_url,
+        "agent": cfg.agent,
+        "yolo": cfg.yolo,
+    }
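`RustSession.run` drives the native engine by polling: each `poll_event` call runs in a worker thread via `asyncio.to_thread` so the event loop stays responsive, and an empty poll backs off with a short `asyncio.sleep`. The following is a minimal sketch of that pattern, assuming (as the `ev is None` branch suggests) that `poll_event` returns `None` when no event is pending. `FakeSession` and `drain_turn` are hypothetical, and the `max_idle` cutoff exists only in this sketch; the real loop waits for a `done` event or the cancel flag.

```python
import asyncio
import queue
from typing import Any, Optional


class FakeSession:
    """Hypothetical stand-in for smolcode_core.Session: a queue of pre-baked events."""

    def __init__(self, events: list[dict[str, Any]]) -> None:
        self._q: queue.Queue = queue.Queue()
        for ev in events:
            self._q.put(ev)

    def poll_event(self) -> Optional[dict[str, Any]]:
        # Assumed contract: return the next event, or None if nothing is pending.
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


async def drain_turn(
    session: FakeSession, idle_sleep: float = 0.01, max_idle: int = 100
) -> str:
    """Poll until a "done" event; return the text of the last "final" event."""
    final = ""
    idle = 0
    while True:
        # Keep the (potentially blocking) native poll off the event-loop thread.
        ev = await asyncio.to_thread(session.poll_event)
        if ev is None:
            idle += 1
            if idle >= max_idle:  # sketch-only safety valve
                break
            await asyncio.sleep(idle_sleep)
            continue
        idle = 0
        kind = ev.get("kind")
        if kind == "final":
            final = ev.get("text", "")
        elif kind == "done":
            break
    return final


if __name__ == "__main__":
    s = FakeSession([{"kind": "tool_call"}, {"kind": "final", "text": "ok"}, {"kind": "done"}])
    print(asyncio.run(drain_turn(s)))
```

Pushing the poll into a thread trades a little scheduling overhead for never stalling other coroutines (UI streaming, approval prompts) while the native side is busy.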

engine/sandbox.py ADDED

	@@ -0,0 +1,141 @@

+"""Execution sandbox for model-generated code.
+This is the agentic core's "hands": it runs code the model writes and reports
+back stdout/stderr/exit so the agent can iterate to green.
+SECURITY: model-generated code is untrusted. The default here is a *soft*
+sandbox — a subprocess with a wall-clock timeout, a scratch working directory,
+and output caps. It is adequate for local/laptop use. Before exposing a public
+HF Space, wrap `_run` with a real isolator (nsjail/firejail/bubblewrap or an
+e2b/Docker microVM); the interface below does not change.
+"""
+from __future__ import annotations
+import os
+import shutil
+import subprocess
+import tempfile
+from dataclasses import dataclass
+from pathlib import Path
+DEFAULT_TIMEOUT = 20  # seconds
+MAX_OUTPUT = 20_000   # chars per stream, to keep the LLM context bounded
+@dataclass
+class RunResult:
+    ok: bool
+    stdout: str
+    stderr: str
+    exit_code: int
+    timed_out: bool = False
+    def as_tool_payload(self) -> dict:
+        """Compact dict handed back to the LLM as the tool result."""
+        return {
+            "ok": self.ok,
+            "exit_code": self.exit_code,
+            "timed_out": self.timed_out,
+            "stdout": _clip(self.stdout),
+            "stderr": _clip(self.stderr),
+        }
+def _clip(s: str, limit: int = MAX_OUTPUT) -> str:
+    if len(s) <= limit:
+        return s
+    return s[:limit] + f"\n...[truncated {len(s) - limit} chars]"
+class Workspace:
+    """A scratch directory the agent reads/writes/executes within.
+    All file tools are confined to this directory; paths are resolved and
+    checked so the model cannot escape via `..` or absolute paths.
+    """
+    def __init__(self, root: str | None = None) -> None:
+        self.root = Path(root) if root else Path(tempfile.mkdtemp(prefix="smallcode-"))
+        self.root.mkdir(parents=True, exist_ok=True)
+    # --- path safety -----------------------------------------------------
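+    def _resolve(self, rel: str) -> Path:
+        p = (self.root / rel).resolve()
+        # Compare path components, not string prefixes: "/tmp/ws-abcd" starts
+        # with "/tmp/ws-abc" but is not inside it.
+        if not p.is_relative_to(self.root.resolve()):
+            raise ValueError(f"path escapes workspace: {rel!r}")
+        return p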
+    # --- file ops --------------------------------------------------------
+    def write_file(self, path: str, content: str) -> dict:
+        p = self._resolve(path)
+        p.parent.mkdir(parents=True, exist_ok=True)
+        p.write_text(content, encoding="utf-8")
+        return {"ok": True, "path": path, "bytes": len(content.encode())}
+    def read_file(self, path: str) -> dict:
+        p = self._resolve(path)
+        if not p.is_file():
+            return {"ok": False, "error": "not found", "path": path}
+        # errors="replace" keeps a stray binary file from crashing the tool call.
+        return {"ok": True, "path": path,
+                "content": _clip(p.read_text(encoding="utf-8", errors="replace"))}
+    def list_files(self) -> list[str]:
+        return sorted(
+            str(p.relative_to(self.root))
+            for p in self.root.rglob("*")
+            if p.is_file()
+        )
+    # --- execution -------------------------------------------------------
+    def run_python(self, code: str | None = None, path: str | None = None,
+                   timeout: int = DEFAULT_TIMEOUT) -> RunResult:
+        if path:
+            target = self._resolve(path)
+            argv = ["python3", str(target)]
+        else:
+            f = self._resolve("_snippet.py")
+            f.write_text(code or "")
+            argv = ["python3", str(f)]
+        return self._run(argv, timeout)
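+    def run_tests(self, timeout: int = DEFAULT_TIMEOUT) -> RunResult:
+        # pytest if installed; otherwise fall back to unittest discovery.
+        result = self._run(["python3", "-m", "pytest", "-q"], timeout)
+        if not result.ok and "No module named pytest" in result.stderr:
+            result = self._run(["python3", "-m", "unittest", "discover", "-q"], timeout)
+        return result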
+    def run_shell(self, command: str, timeout: int = DEFAULT_TIMEOUT) -> RunResult:
+        """Run a shell command in the workspace (login shell for full PATH).
+        Lets the router smoke-run non-Python solutions (go/rust/node/sqlite/…) the
+        same way run_python checks Python. Mirrors the Rust agent's run_shell and the
+        eval grader (smolcode-cli/src/eval.rs:check_cmd_ok), which also use `bash -lc`.
+        """
+        return self._run(["bash", "-lc", command], timeout)
+    def _run(self, argv: list[str], timeout: int) -> RunResult:
+        env = {**os.environ, "PYTHONDONTWRITEBYTECODE": "1"}
+        try:
+            proc = subprocess.run(
+                argv,
+                cwd=self.root,
+                env=env,
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+            )
+            return RunResult(
+                ok=proc.returncode == 0,
+                stdout=proc.stdout,
+                stderr=proc.stderr,
+                exit_code=proc.returncode,
+            )
+        except subprocess.TimeoutExpired as e:
+            return RunResult(
+                ok=False,
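+                # TimeoutExpired.stdout is bytes even with text=True, and partial
+                # output may end mid-character, hence errors="replace".
+                stdout=e.stdout.decode(errors="replace") if isinstance(e.stdout, bytes) else (e.stdout or ""),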
+                stderr=f"timed out after {timeout}s",
+                exit_code=124,
+                timed_out=True,
+            )
+    def cleanup(self) -> None:
+        shutil.rmtree(self.root, ignore_errors=True)
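
The containment check in `Workspace._resolve` is easy to get subtly wrong. The standalone sketch below compares a component-wise check with the naive string-prefix comparison. `is_inside` is a hypothetical helper, not part of smolcode, and the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def is_inside(root: Path, candidate: Path) -> bool:
    """Return True if candidate lies within root after resolving both paths."""
    # resolve() collapses ".." segments and follows symlinks; with the default
    # strict=False it also works for paths that do not exist yet.
    return candidate.resolve().is_relative_to(root.resolve())


root = Path("/tmp/ws-abc")  # hypothetical workspace root
print(is_inside(root, root / "src" / "main.py"))          # True
print(is_inside(root, root / ".." / "ws-abcd" / "x.py"))  # False
print(is_inside(root, root / "/etc/passwd"))              # False: an absolute path replaces root
# The pitfall: a sibling directory passes a naive string-prefix test.
print(str(Path("/tmp/ws-abcd")).startswith(str(root)))    # True
```

The same check drops straight into `_resolve`. On Pythons older than 3.9, `os.path.commonpath([root, candidate]) == str(root)` is the usual equivalent.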

engine/themes.py ADDED

	@@ -0,0 +1,60 @@

+"""Web UI color themes aligned with the CLI TUI palettes."""
+from __future__ import annotations
+from dataclasses import dataclass
+@dataclass(frozen=True)
+class WebTheme:
+    name: str
+    bg: str
+    panel: str
+    bg_alt: str
+    accent: str
+    fg: str
+    dim: str
+    ok: str
+    tool: str
+    border: str
+    hf_yellow: str = "#FFD21E"
+WEB_THEMES: list[WebTheme] = [
+    WebTheme("smol-dark", "#0b1020", "#111827", "#1e293b", "#7c3aed", "#e2e8f0", "#64748b", "#34d399", "#a78bfa", "#334155"),
+    WebTheme("tokyo", "#1a1b26", "#24283b", "#1f2335", "#7dcfff", "#c0caf5", "#565f89", "#bb9af7", "#7dcfff", "#414868"),
+    WebTheme("gruvbox", "#282828", "#32302f", "#3c3836", "#fe8019", "#ebdbb2", "#928374", "#b8bb26", "#83a598", "#504945"),
+    WebTheme("mono", "#161616", "#1e1e1e", "#222222", "#e0e0e0", "#c0c0c0", "#707070", "#ffffff", "#a0a0a0", "#404040"),
+    WebTheme("catppuccin", "#1e1e2e", "#313244", "#313244", "#cba6f7", "#cdd6f4", "#6c7086", "#a6e3a1", "#89b4fa", "#45475a"),
+    WebTheme("nord", "#2e3440", "#3b4252", "#3b4252", "#88c0d0", "#eceff4", "#4c566a", "#a3be8c", "#81a1c1", "#3b4252"),
+    WebTheme("dracula", "#282a36", "#44475a", "#282a36", "#bd93f9", "#f8f8f2", "#6272a4", "#50fa7b", "#8be9fd", "#44475a"),
+    WebTheme("solarized", "#002b36", "#073642", "#073642", "#268bd2", "#839496", "#586e75", "#859900", "#2aa198", "#073642"),
+]
+def theme_names() -> list[str]:
+    return [t.name for t in WEB_THEMES]
+def theme_by_name(name: str) -> WebTheme:
+    for t in WEB_THEMES:
+        if t.name == name:
+            return t
+    return WEB_THEMES[0]
+def theme_at(index: int) -> WebTheme:
+    return WEB_THEMES[index % len(WEB_THEMES)]
+def theme_css_vars() -> str:
+    """Per-theme CSS variable overrides for .sc-tui-shell[data-theme=...]."""
+    blocks: list[str] = []
+    for t in WEB_THEMES:
+        blocks.append(
+            f'.sc-tui-shell[data-theme="{t.name}"] {{'
+            f" --sc-bg:{t.bg}; --sc-panel:{t.panel}; --sc-bg-alt:{t.bg_alt};"
+            f" --sc-accent:{t.accent}; --sc-fg:{t.fg}; --sc-dim:{t.dim};"
+            f" --sc-ok:{t.ok}; --sc-tool:{t.tool}; --sc-border:{t.border};"
+            f" --hf-yellow:{t.hf_yellow}; }}"
+        )
+    return "\n".join(blocks)
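
`theme_css_vars` emits one attribute-scoped CSS rule per theme, so switching themes only means changing the shell's `data-theme` attribute. The sketch below is a simplified, hypothetical stand-in for `WebTheme` with only a few fields. It derives each `--sc-*` variable from the dataclass fields instead of listing them by hand, which is an alternative design and not what `themes.py` does (that module also maps `hf_yellow` to `--hf-yellow`, not `--sc-*`).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Theme:
    name: str
    bg: str
    bg_alt: str
    accent: str


THEMES = [
    Theme("smol-dark", "#0b1020", "#1e293b", "#7c3aed"),
    Theme("nord", "#2e3440", "#3b4252", "#88c0d0"),
]


def css_block(t: Theme) -> str:
    # Every field except `name` becomes --sc-<field>, with "_" mapped to "-".
    decls = " ".join(
        f"--sc-{fld.name.replace('_', '-')}:{getattr(t, fld.name)};"
        for fld in fields(t)
        if fld.name != "name"
    )
    return f'.sc-tui-shell[data-theme="{t.name}"] {{ {decls} }}'


def theme_at(index: int) -> Theme:
    # Modulo wraparound, so cycling past the end (or below zero) stays valid.
    return THEMES[index % len(THEMES)]


print(css_block(THEMES[1]))
# .sc-tui-shell[data-theme="nord"] { --sc-bg:#2e3440; --sc-bg-alt:#3b4252; --sc-accent:#88c0d0; }
```

Deriving declarations from `fields()` means a newly added palette slot cannot be forgotten in the CSS. The cost is that the variable names become coupled to the Python attribute names.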

engine/tools.py ADDED

	@@ -0,0 +1,174 @@

+"""Coding tools exposed to the LiteForge agent.
+Each tool is a Python callable registered via `liteforge.create_tool`. The agent
+(running in Rust) decides when to call them; LiteForge invokes the callable with
+a single `dict` of arguments and feeds the returned JSON-able dict back to the
+model. All file/exec tools are confined to one `Workspace`.
+Tool surface (kept deliberately small so a 3B model can use it reliably):
+  write_file(path, content)  -> create/overwrite a file
+  read_file(path)            -> read a file back
+  list_files()               -> list workspace files
+  run_python(path)           -> execute a file, return stdout/stderr/exit
+  run_tests()                -> run pytest in the workspace
+  check_app()                -> (web registry only) render index.html headlessly, report JS errors
+"""
+from __future__ import annotations
+import liteforge as lf
+from . import browsercheck
+from .preview import inline_app
+from .sandbox import Workspace
+from .trace_collector import TraceCollector
+def _wrap(name: str, fn, collector: TraceCollector | None):
+    if collector is None:
+        return fn
+    def wrapped(args: dict):
+        collector.record_tool_call(name, args)
+        result = fn(args)
+        collector.record_tool_result(name, result)
+        return result
+    return wrapped
+# Tool names in the order _tools() returns them — lets a registry select a
+# subset by name without relying on attributes of the opaque lf tool object.
+_TOOL_ORDER = ("write_file", "read_file", "list_files", "run_python", "run_tests")
+# Tools the web builder needs. Static apps are "verified" by rendering, not by
+# running Python, so we drop run_python/run_tests — a smaller, less confusing
+# surface for a 3B model that should be writing HTML, not spawning processes.
+_WEB_TOOLS = ("write_file", "read_file", "list_files")
+def _registry(workspace: Workspace, names, collector: TraceCollector | None = None) -> lf.ToolRegistry:
+    reg = lf.ToolRegistry()
+    for name, tool in zip(_TOOL_ORDER, _tools(workspace, collector)):
+        if name in names:
+            reg.register(tool)
+    return reg
+def build_registry(workspace: Workspace, collector: TraceCollector | None = None) -> lf.ToolRegistry:
+    """Return a ToolRegistry of all coding tools bound to `workspace`."""
+    return _registry(workspace, _TOOL_ORDER, collector)
+def build_web_registry(workspace: Workspace, collector: TraceCollector | None = None) -> lf.ToolRegistry:
+    """Return the smolbuilder web agent's tools: file ops + a headless app check."""
+    reg = _registry(workspace, _WEB_TOOLS, collector)
+    reg.register(_check_app_tool(workspace, collector))
+    return reg
+def check_app_impl(ws: Workspace, collector: TraceCollector | None, args: dict) -> dict:
+    """Run check_app logic (shared by LiteForge tool and Rust python callback)."""
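+    rels = ws.list_files()
+    if "index.html" not in rels:
+        return {"ok": False,
+                "errors": ["index.html not found: create it first with write_file."]}
+    # Read files directly: ws.read_file() clips content to MAX_OUTPUT chars, which
+    # would hand the browser check a truncated (and likely broken) app.
+    files = {
+        rel: (ws.root / rel).read_text(encoding="utf-8", errors="replace")
+        for rel in rels
+    }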
+    ok, errors = browsercheck.check_html(inline_app(files))
+    if ok is None:
+        return {"ok": True, "errors": [],
+                "note": "runtime check unavailable here; assuming ok"}
+    if ok:
+        return {"ok": True, "errors": [],
+                "message": "The app loads and every button works."}
+    return {"ok": False, "errors": errors,
+            "hint": "Fix these JavaScript errors in index.html, then call check_app again."}
+def _check_app_tool(ws: Workspace, collector: TraceCollector | None = None):
+    """A `check_app` tool: actually run the built app and report JS errors."""
+    def check_app(args: dict) -> dict:
+        return check_app_impl(ws, collector, args)
+    check_app = _wrap("check_app", check_app, collector)
+    return lf.create_tool(
+        "check_app",
+        "Run the current web app in a headless browser: load index.html, execute "
+        "its JavaScript, click every button, and report any errors. Use this to "
+        "verify the app actually works before finishing.",
+        {"type": "object", "properties": {}},
+        check_app,
+    )
+def _tools(ws: Workspace, collector: TraceCollector | None = None) -> list:
+    def write_file(args: dict) -> dict:
+        return ws.write_file(args["path"], args.get("content", ""))
+    def read_file(args: dict) -> dict:
+        return ws.read_file(args["path"])
+    def list_files(args: dict) -> dict:
+        return {"ok": True, "files": ws.list_files()}
+    def run_python(args: dict) -> dict:
+        return ws.run_python(path=args["path"]).as_tool_payload()
+    def run_tests(args: dict) -> dict:
+        return ws.run_tests().as_tool_payload()
+    write_file = _wrap("write_file", write_file, collector)
+    read_file = _wrap("read_file", read_file, collector)
+    list_files = _wrap("list_files", list_files, collector)
+    run_python = _wrap("run_python", run_python, collector)
+    run_tests = _wrap("run_tests", run_tests, collector)
+    return [
+        lf.create_tool(
+            "write_file",
+            "Create or overwrite a file in the workspace with the given text content.",
+            {
+                "type": "object",
+                "properties": {
+                    "path": {"type": "string", "description": "Relative path, e.g. main.py"},
+                    "content": {"type": "string", "description": "Full file contents"},
+                },
+                "required": ["path", "content"],
+            },
+            write_file,
+        ),
+        lf.create_tool(
+            "read_file",
+            "Read a file from the workspace and return its contents.",
+            {
+                "type": "object",
+                "properties": {"path": {"type": "string"}},
+                "required": ["path"],
+            },
+            read_file,
+        ),
+        lf.create_tool(
+            "list_files",
+            "List all files currently in the workspace.",
+            {"type": "object", "properties": {}},
+            list_files,
+        ),
+        lf.create_tool(
+            "run_python",
+            "Run a Python file in the workspace. Returns stdout, stderr and exit code.",
+            {
+                "type": "object",
+                "properties": {"path": {"type": "string", "description": "File to run, e.g. main.py"}},
+                "required": ["path"],
+            },
+            run_python,
+        ),
+        lf.create_tool(
+            "run_tests",
+            "Run the test suite (pytest) in the workspace. Returns pass/fail output.",
+            {"type": "object", "properties": {}},
+            run_tests,
+        ),
+    ]
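
The registry code above combines two ideas: `_wrap` decorates each tool callable so a trace collector sees every call and result, and `_registry` keeps only a named subset (`_WEB_TOOLS`). The sketch below shows both without LiteForge. Plain dicts stand in for `lf.ToolRegistry` and `lf.create_tool`, and `Recorder`, `wrap` and `select` are hypothetical names. Tools are keyed by name here rather than paired positionally with `zip`, so the subset no longer depends on `_TOOL_ORDER` matching the order `_tools()` returns.

```python
from typing import Any, Callable, Optional

ToolFn = Callable[[dict], dict]


class Recorder:
    """Stand-in for TraceCollector: remembers (event, tool, payload) tuples."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, Any]] = []


def wrap(name: str, fn: ToolFn, rec: Optional[Recorder]) -> ToolFn:
    """Return fn unchanged, or a wrapper that records the call and its result."""
    if rec is None:
        return fn

    def wrapped(args: dict) -> dict:
        rec.events.append(("call", name, args))
        result = fn(args)
        rec.events.append(("result", name, result))
        return result

    return wrapped


def select(tools: dict[str, ToolFn], names: tuple[str, ...]) -> dict[str, ToolFn]:
    """Keep only the named tools, in the order given by `names`."""
    return {n: tools[n] for n in names if n in tools}


# Toy in-memory "workspace" so the example runs without LiteForge.
store: dict[str, str] = {}


def write_file(args: dict) -> dict:
    store[args["path"]] = args["content"]
    return {"ok": True}


def read_file(args: dict) -> dict:
    return {"ok": args["path"] in store, "content": store.get(args["path"], "")}


rec = Recorder()
tools = {"write_file": wrap("write_file", write_file, rec),
         "read_file": wrap("read_file", read_file, rec)}
web = select(tools, ("write_file", "read_file", "list_files"))  # list_files absent: skipped
web["write_file"]({"path": "index.html", "content": "<h1>hi</h1>"})
print(list(web))                                 # ['write_file', 'read_file']
print(web["read_file"]({"path": "index.html"}))  # {'ok': True, 'content': '<h1>hi</h1>'}
print(len(rec.events))                           # 4
```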

engine/trace.py ADDED

	@@ -0,0 +1,73 @@

+"""Shareable agent traces (Build Small "Sharing is Caring" badge).
+Turns a completed smolcode run into an OpenTelemetry-style JSON trace: a root
+span minted by LiteForge's `Tracer` plus one child span per agent step, carrying
+the step kind, duration, and token counts read from `AgentStep`. Publish a trace
+file to the Hub so others can see exactly how the tiny model reasoned.
+"""
+from __future__ import annotations
+import json
+import time
+from pathlib import Path
+import liteforge as lf
+def build_trace(agent, task: str, final: str, *, preset: str, model: str) -> dict:
+    """Build an OTel-ish trace document from a finished agent run."""
+    tracer = lf.Tracer("smolcode")
+    root = tracer.start_span("coding_task")
+    root.set_attribute("preset", preset)
+    root.set_attribute("model", model)
+    root.set_attribute("task", task)
+    trace_id = root.context.trace_id
+    root_id = root.context.span_id
+    spans: list[dict] = []
+    total_tokens = 0
+    history = agent.raw_history() if hasattr(agent, "raw_history") else getattr(agent, "history", lambda: [])()
+    for i, s in enumerate(history):
+        dur = getattr(s, "duration_ms", None) or 0
+        tot = getattr(s, "total_tokens", None) or 0
+        step_no = getattr(s, "step_number", getattr(s, "number", i))
+        step_type = getattr(s, "step_type", getattr(s, "kind", "step"))
+        result_text = getattr(s, "result", getattr(s, "detail", ""))
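+        total_tokens += tot
+        # Overwrite the tail of the root id with a zero-padded step number so the
+        # child id keeps the root id's width (assumes an integer step number).
+        suffix = f"{int(step_no):02d}"
+        spans.append({
+            "trace_id": trace_id,
+            "span_id": f"{root_id[:-len(suffix)]}{suffix}",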
+            "parent_span_id": root_id,
+            "name": str(step_type),
+            "duration_ms": dur,
+            "attributes": {
+                "step_number": step_no,
+                "prompt_tokens": getattr(s, "prompt_tokens", None),
+                "completion_tokens": getattr(s, "completion_tokens", None),
+                "total_tokens": tot,
+                "result": str(result_text)[:200],
+            },
+        })
+    root.end()
+    return {
+        "trace_id": trace_id,
+        "service": "smolcode",
+        "preset": preset,
+        "model": model,
+        "task": task,
+        "final": final,
+        "n_steps": len(spans),
+        "total_tokens": total_tokens,
+        "root": {"span_id": root_id, "name": "coding_task"},
+        "spans": spans,
+    }
+def save_trace(trace: dict, out_dir: str | Path = "traces") -> Path:
+    d = Path(out_dir)
+    d.mkdir(parents=True, exist_ok=True)
+    stamp = time.strftime("%Y%m%d-%H%M%S")
+    path = d / f"trace-{stamp}.json"
+    path.write_text(json.dumps(trace, indent=2))
+    return path
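
`build_trace` derives each child span id from the root span id. The original slice removed `len(str(step_no)) + 1` characters but appended a two-or-more-digit suffix, so ids shrank by one character from step 10 onward. The sketch below shows a width-preserving derivation. It assumes the root id is a hex string (the exact format LiteForge's `Tracer` produces is not confirmed here), and `child_span_id` is a hypothetical helper.

```python
import secrets


def child_span_id(root_id: str, step_no: int) -> str:
    """Overwrite the tail of root_id with a zero-padded step number.

    The child id keeps the root id's width (16 hex chars for an OTel span id)
    as long as the step number has fewer digits than that.
    """
    suffix = f"{step_no:02d}"
    if len(suffix) >= len(root_id):
        raise ValueError(f"step {step_no} does not fit in a {len(root_id)}-char span id")
    return root_id[: len(root_id) - len(suffix)] + suffix


root = secrets.token_hex(8)  # random 8-byte id rendered as 16 hex chars
for n in (0, 9, 10, 123):
    sid = child_span_id(root, n)
    print(n, sid, len(sid) == len(root))  # width is preserved for every step
```

A deterministic id like this can still coincide with the root id when the root happens to end in the same digits. Minting a fresh `secrets.token_hex(8)` per span avoids that, at the cost of ids that no longer encode the step number.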

engine/trace_collector.py ADDED

	@@ -0,0 +1,128 @@

+"""Append-only trace event log for live UI updates.
+Tool-call arguments and results are captured by wrapping the LiteForge tool
+callables, because LiteForge's agent history exposes only step kinds, not tool I/O.
+"""
+from __future__ import annotations
+import json
+import re
+from dataclasses import dataclass, field
+from typing import Any, Literal
+TraceKind = Literal["tool_call", "tool_result", "tier_escalation", "final", "error"]
+_REDACTED = "[REDACTED]"
+_PREFIXES = ("sk-", "ghp_", "gho_", "ghs_", "github_pat_", "xoxb-", "xoxp-", "AKIA", "AIza", "glpat-")
+_SENSITIVE_KEYS = (
+    "api_key", "apikey", "token", "secret", "password", "passwd",
+    "access_key", "client_secret", "private_key",
+)
+@dataclass
+class TraceEvent:
+    kind: TraceKind
+    name: str
+    detail: str
+    step: int | None = None
+    duration_ms: int | None = None
+    tokens: int | None = None
+@dataclass
+class TraceCollector:
+    """Unlocked: safe for one asyncio task appending sequentially, not for concurrent threads."""
+    events: list[TraceEvent] = field(default_factory=list)
+    _tool_step: int = 0
+    def record(self, kind: TraceKind, name: str, detail: str, **meta) -> None:
+        self.events.append(TraceEvent(kind=kind, name=name, detail=detail, **meta))
+    def record_tool_call(self, name: str, args: dict[str, Any]) -> None:
+        self.record("tool_call", name, _format_payload(args), step=self._tool_step)
+    def record_tool_result(self, name: str, result: dict[str, Any]) -> None:
+        self.record("tool_result", name, _format_payload(result), step=self._tool_step)
+        self._tool_step += 1
+    def record_escalation(self, from_tier: str, to_tier: str) -> None:
+        self.record("tier_escalation", to_tier, f"escalated from {from_tier}")
+    def record_final(self, text: str) -> None:
+        self.record("final", "response", redact(text))
+    def record_error(self, text: str) -> None:
+        self.record("error", "error", redact(text))
+    def snapshot(self) -> list[TraceEvent]:
+        return list(self.events)
+def redact(text: str) -> str:
+    """Conservative secret redaction for UI display."""
+    lines = []
+    for line in text.splitlines(keepends=True):
+        content, nl = (line[:-1], "\n") if line.endswith("\n") else (line, "")
+        lines.append(_redact_line(content) + nl)
+    return "".join(lines)
+def _redact_line(line: str) -> str:
+    out: list[str] = []
+    i = 0
+    while i < len(line):
+        ch = line[i]
+        if ch in "\"'`" or ch.isalnum() or ch in "_-":
+            j = i
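+            # Stop at "=" too, so "password=..." yields the key as its own token
+            # and the key=value branch below gets a chance to redact the value.
+            while j < len(line) and not line[j].isspace() and line[j] not in ",;)]}=":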
+                j += 1
+            token = line[i:j]
+            if _looks_secret(token):
+                out.append(_REDACTED)
+            else:
+                out.append(token)
+            i = j
+            continue
+        if ch == "=" and i + 1 < len(line):
+            key_start = i
+            while key_start > 0 and (line[key_start - 1].isalnum() or line[key_start - 1] in "_-"):
+                key_start -= 1
+            key = line[key_start:i].lower()
+            if any(s in key for s in _SENSITIVE_KEYS):
+                out.append(line[i : i + 1])
+                i += 1
+                j = i
+                while j < len(line) and not line[j].isspace():
+                    j += 1
+                out.append(_REDACTED)
+                i = j
+                continue
+        out.append(ch)
+        i += 1
+    return "".join(out)
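+def _looks_secret(token: str) -> bool:
+    # Tokens can carry surrounding quotes (JSON payloads are always quoted), which
+    # would otherwise defeat both the prefix test and the character-class test.
+    token = token.strip("\"'`")
+    for prefix in _PREFIXES: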
+        if token.startswith(prefix) and len(token) >= len(prefix) + 8:
+            return True
+    if len(token) >= 32 and re.fullmatch(r"[A-Za-z0-9_\-+/=]+", token):
+        upper = sum(1 for c in token if c.isupper())
+        lower = sum(1 for c in token if c.islower())
+        digit = sum(1 for c in token if c.isdigit())
+        if upper >= 4 and lower >= 4 and digit >= 2:
+            return True
+    return False
+def _format_payload(data: dict[str, Any], *, max_content: int = 600) -> str:
+    """JSON-format tool args/results, truncating large file content."""
+    out = dict(data)
+    if "content" in out and isinstance(out["content"], str):
+        text = out["content"]
+        if len(text) > max_content:
+            out["content"] = text[:max_content] + f"\n… ({len(text)} chars total)"
+    raw = json.dumps(out, indent=2, ensure_ascii=False)
+    return redact(raw)
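
The character-by-character scanner above has two subtle failure modes. Tokens keep their surrounding quotes, and `key=value` is consumed as one token. The sketch below is an alternative, regex-based redactor that sidesteps both. It is not the module's implementation: the names are hypothetical, the prefix and key lists are trimmed, and the mixed-case entropy heuristic is omitted. Matching sensitive keys by substring also hits names like `max_tokens`, which errs on the conservative side.

```python
import re

# Trimmed-down lists; the module above carries fuller sets.
PREFIXES = ("sk-", "ghp_", "xoxb-", "AKIA")
SENSITIVE_KEYS = ("api_key", "token", "secret", "password")
REDACTED = "[REDACTED]"

# key=value pairs; the value stops at whitespace or closing punctuation.
KV = re.compile(r"([A-Za-z0-9_\-]+)(\s*=\s*)([^\s,;)\]}]+)")
# Candidate secret tokens. Quotes, braces and "=" are outside the class, so a
# JSON-quoted value is matched without its quotes and the quotes survive.
TOKEN = re.compile(r"[A-Za-z0-9_\-+/]+")


def looks_secret(tok: str) -> bool:
    return any(tok.startswith(p) and len(tok) >= len(p) + 8 for p in PREFIXES)


def redact_line(line: str) -> str:
    def scrub_kv(m: re.Match) -> str:
        key, sep, _value = m.groups()
        if any(s in key.lower() for s in SENSITIVE_KEYS):
            return f"{key}{sep}{REDACTED}"
        return m.group(0)

    line = KV.sub(scrub_kv, line)  # pass 1: values of sensitive keys
    # pass 2: anything that looks like a known credential format
    return TOKEN.sub(lambda m: REDACTED if looks_secret(m.group(0)) else m.group(0), line)


print(redact_line("password=hunter2 user=bob"))   # password=[REDACTED] user=bob
print(redact_line('{"api_key": "sk-abcdefghijkl"}'))  # {"api_key": "[REDACTED]"}
```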

engine/ui_trace.py ADDED

	@@ -0,0 +1,121 @@

+"""Trace rendering for the Gradio web UI."""
+from __future__ import annotations
+from .agent import Step
+from .trace_collector import TraceEvent
+_TOOL_ICON = {
+    "write_file": "✏️", "read_file": "📖", "list_files": "📂",
+    "run_python": "▶️", "run_tests": "🧪", "check_app": "🌐",
+}
+def merge_step_metadata(events: list[TraceEvent], raw_history: list) -> list[TraceEvent]:
+    """Attach LiteForge timing/token stats to tool_call events."""
+    if not raw_history:
+        return events
+    merged: list[TraceEvent] = []
+    call_idx = 0
+    for ev in events:
+        if ev.kind != "tool_call" or call_idx >= len(raw_history):
+            merged.append(ev)
+            continue
+        step = raw_history[call_idx]
+        call_idx += 1
+        merged.append(TraceEvent(
+            kind=ev.kind, name=ev.name, detail=ev.detail, step=ev.step,
+            duration_ms=getattr(step, "duration_ms", None),
+            tokens=getattr(step, "total_tokens", None),
+        ))
+    return merged
+def format_trace_md(
+    events: list[TraceEvent],
+    *,
+    steps: list[Step] | None = None,
+    max_detail: int = 500,
+    idle: str = "_waiting for the model…_",
+) -> str:
+    """Render trace events as markdown with expandable tool I/O."""
+    if not events and not steps:
+        return idle
+    if not events and steps:
+        return _steps_only_md(steps)
+    lines: list[str] = []
+    step_no = 0
+    i = 0
+    while i < len(events):
+        ev = events[i]
+        if ev.kind == "tool_call":
+            icon = _TOOL_ICON.get(ev.name, "🔧")
+            meta = _meta_badge(ev)
+            summary = f"`{step_no}` &nbsp; {icon} **{ev.name}**{meta}"
+            detail = _truncate(ev.detail, max_detail)
+            block = f"<details><summary>{summary}</summary>\n\n```json\n{detail}\n```\n</details>"
+            if i + 1 < len(events) and events[i + 1].kind == "tool_result":
+                result = _truncate(events[i + 1].detail, max_detail)
+                block += f"\n\n↳ result:\n\n```json\n{result}\n```"
+                i += 1
+            lines.append(block)
+            step_no += 1
+        elif ev.kind == "tier_escalation":
+            lines.append(f"⬆️ **escalated** → `{ev.name}`: {ev.detail}")
+        elif ev.kind == "final":
+            lines.append("✅ **final answer**")
+        elif ev.kind == "error":
+            lines.append(f"⚠️ **error**: {_truncate(ev.detail, max_detail)}")
+        i += 1
+    return "\n\n".join(lines) if lines else idle
+def format_fanout_trace_md(results) -> str:
+    """Per-subagent expandable traces for fan-out mode."""
+    if not results:
+        return "_no subagents_"
+    blocks = []
+    for r in results:
+        events = getattr(r, "trace_events", None) or []
+        inner = format_trace_md(events, steps=r.steps, idle="_no steps yet_")
+        verdict = "✓ verified" if r.verified else ("⚠️ error" if r.error else "· unverified")
+        blocks.append(
+            f"<details><summary>`{r.index + 1}` **subagent** ({r.model}): "
+            f"{len(r.steps)} steps · {verdict}</summary>\n\n{inner}\n</details>"
+        )
+    return "\n\n".join(blocks)
+def _steps_only_md(steps: list[Step]) -> str:
+    lines = []
+    for s in steps:
+        kind = s.kind
+        if kind.startswith("tool_call:"):
+            tool = kind.split(":", 1)[1]
+            icon = _TOOL_ICON.get(tool, "🔧")
+            meta = ""
+            if s.total_tokens:
+                meta = f" · {s.total_tokens} tok"
+            lines.append(f"`{s.number}` &nbsp; {icon} **{tool}**{meta}")
+        elif kind == "response":
+            lines.append("✅ **final answer**")
+        else:
+            lines.append(f"• {kind}")
+    return "\n\n".join(lines) if lines else "_waiting for the model…_"
+def _meta_badge(ev: TraceEvent) -> str:
+    parts = []
+    if ev.duration_ms is not None:
+        parts.append(f"{ev.duration_ms}ms")
+    if ev.tokens is not None:
+        parts.append(f"{ev.tokens} tok")
+    return f" <span class='trace-meta'>({', '.join(parts)})</span>" if parts else ""
+def _truncate(text: str, limit: int) -> str:
+    text = text.strip()
+    if len(text) <= limit:
+        return text
+    return text[:limit] + f"\n… ({len(text)} chars total)"
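
`format_trace_md` uses a one-event lookahead: a `tool_result` is folded into the preceding `tool_call` block only if it immediately follows it. During a live run, a call whose result has not arrived yet therefore renders on its own. The sketch below isolates that pairing logic. `Event` and `pair_calls` are hypothetical stand-ins for `TraceEvent` and the loop in the renderer, and non-tool events are simply skipped here.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Event:
    kind: str  # "tool_call", "tool_result", "final", ...
    name: str
    detail: str


def pair_calls(events: list[Event]) -> Iterator[tuple[Event, Optional[Event]]]:
    """Yield (call, result) pairs; result is None unless it directly follows the call."""
    i = 0
    while i < len(events):
        ev = events[i]
        if ev.kind == "tool_call":
            nxt = events[i + 1] if i + 1 < len(events) else None
            if nxt is not None and nxt.kind == "tool_result":
                yield ev, nxt
                i += 2  # consume both the call and its result
                continue
            yield ev, None
        i += 1


events = [
    Event("tool_call", "write_file", '{"path": "main.py"}'),
    Event("tool_result", "write_file", '{"ok": true}'),
    Event("tool_call", "run_python", '{"path": "main.py"}'),  # result still pending
    Event("final", "response", "done"),
]
for n, (call, result) in enumerate(pair_calls(events)):
    print(n, call.name, "->", result.detail if result else "(no result yet)")
```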

engine/web_tui.py ADDED

	@@ -0,0 +1,471 @@

+"""CLI-shaped web UI: transcript buffer, HTML rendering, layout helpers."""
+from __future__ import annotations
+import html
+from dataclasses import dataclass, field
+from .gradio_shell import UiSettings
+from .rust_session import list_commands
+from .themes import theme_at, theme_names
+_BUILTIN_SLASH = [
+    "/help", "/mode", "/think", "/mcp", "/rules", "/skills", "/skill", "/bg",
+    "/init", "/new", "/sessions", "/rename", "/fork", "/delete", "/timeline",
+    "/stats", "/export", "/search", "/config", "/commit", "/agents", "/models",
+    "/themes", "/files", "/clear", "/quit",
+]
+_KIND_STYLE = {
+    "user": ("›", "#e2e8f0", "#1e293b"),
+    "assistant": ("◆", "#c4b5fd", "#1e1b4b"),
+    "tool": ("⚙", "#a78bfa", "#0f172a"),
+    "result": ("·", "#94a3b8", "#0f172a"),
+    "info": ("·", "#94a3b8", "#0f172a"),
+    "error": ("✕", "#f87171", "#450a0a"),
+    "final": ("✓", "#34d399", "#052e16"),
+}
+@dataclass
+class TranscriptLine:
+    kind: str
+    text: str
+@dataclass
+class Transcript:
+    lines: list[TranscriptLine] = field(default_factory=list)
+    partial: str = ""
+    def clear(self) -> None:
+        self.lines.clear()
+        self.partial = ""
+    def append(self, kind: str, text: str) -> None:
+        text = (text or "").strip()
+        if not text:
+            return
+        self.lines.append(TranscriptLine(kind=kind, text=text))
+    def append_user(self, text: str) -> None:
+        self.append("user", text)
+    def append_assistant(self, text: str) -> None:
+        self.append("assistant", text)
+    def append_info(self, text: str) -> None:
+        self.append("info", text)
+    def append_error(self, text: str) -> None:
+        self.append("error", text)
+    def append_tool_call(self, name: str, args: str) -> None:
+        self.append("tool", f"{name} {args[:200]}")
+    def append_tool_result(self, name: str, text: str) -> None:
+        clipped = text[:400] + ("…" if len(text) > 400 else "")
+        self.append("result", f"{name}: {clipped}")
+    def set_partial(self, text: str) -> None:
+        self.partial = text
+    def from_stored_chat(self, stored: list[dict[str, str]]) -> None:
+        self.clear()
+        for m in stored:
+            role = m.get("role", "assistant")
+            kind = "user" if role == "user" else "assistant"
+            self.append(kind, m.get("text", ""))
+    def append_final(self, text: str) -> None:
+        self.append("final", text)
+    def plain_texts(self) -> list[str]:
+        return [ln.text for ln in self.lines]
+    def search(self, query: str, limit: int = 20) -> list[str]:
+        if not query.strip():
+            return []
+        q = query.lower()
+        hits: list[str] = []
+        for ln in self.lines:
+            if q in ln.text.lower():
+                hits.append(f"[{ln.kind}] {ln.text[:120]}")
+            if len(hits) >= limit:
+                break
+        return hits
+    def render_html(self, *, running: bool = False) -> str:
+        if not self.lines and not self.partial and not running:
+            return (
+                '<div class="sc-transcript-wrap">'
+                '<div class="sc-transcript-empty">'
+                "smolcode — describe a coding task, or type <code>/help</code>"
+                "</div></div>"
+            )
+        parts: list[str] = ['<div class="sc-transcript-inner">']
+        for ln in self.lines:
+            parts.append(_line_html(ln.kind, ln.text))
+        if self.partial:
+            parts.append(_line_html("assistant", self.partial + "▏"))
+        if running and not self.partial:
+            parts.append('<div class="sc-tline sc-tline-info">· thinking…</div>')
+        parts.append("</div>")
+        return f'<div class="sc-transcript-wrap">\n' + "\n".join(parts) + "\n</div>"
+def _line_html(kind: str, text: str) -> str:
+    glyph, color, _bg = _KIND_STYLE.get(kind, _KIND_STYLE["info"])
+    body = html.escape(text).replace("\n", "<br>")
+    return (
+        f'<div class="sc-tline sc-tline-{kind}">'
+        f'<span class="sc-tglyph" style="color:{color}">{glyph}</span> '
+        f'<span class="sc-ttext">{body}</span></div>'
+    )
+def slash_commands(workspace: str) -> list[str]:
+    custom = [f"/{n}" for n in list_commands(workspace)]
+    return _BUILTIN_SLASH + custom
+def filter_slash_commands(prefix: str, workspace: str) -> list[str]:
+    p = prefix if prefix.startswith("/") else f"/{prefix}"
+    return [c for c in slash_commands(workspace) if c.startswith(p)]
+def header_bar_html(
+    *,
+    git_branch: str = "",
+    git_dirty: bool = False,
+    model: str = "",
+    host: str = "",
+    theme: str = "default",
+) -> str:
+    git_part = ""
+    if git_branch:
+        dirty = " ●" if git_dirty else ""
+        git_part = f'<span class="sc-hgit">⎇ {html.escape(git_branch)}{dirty}</span>'
+    model_part = html.escape(model) if model else "—"
+    # Omit the host chip entirely when there is no host, instead of a bare "@ ".
+    host_part = (
+        f'<span class="sc-hhost">@ {html.escape(host)}</span>' if host else ""
+    )
+    return (
+        '<div class="sc-header-bar">'
+        f'<span class="sc-hbrand">◆ smol<span class="hf-accent">code</span></span>'
+        f"{git_part}"
+        f'<span class="sc-hmodel">{model_part}</span>'
+        f"{host_part}"
+        f'<span class="sc-htheme">{html.escape(theme)}</span>'
+        "</div>"
+    )
+def status_bar_html(
+    settings: UiSettings,
+    *,
+    session_title: str = "new session",
+    model: str = "",
+    running: bool = False,
+) -> str:
+    # "normal" (and any unrecognized mode) is displayed as EDIT.
+    mode = {"auto": "AUTO", "plan": "PLAN"}.get(settings.mode, "EDIT")
+    think = ""
+    if settings.think and settings.think != "off":
+        think = f'<span class="sc-chip sc-chip-think">think:{settings.think}</span>'
+    run = '<span class="sc-chip sc-chip-run">running</span>' if running else ""
+    ws = html.escape(settings.workspace[:48])
+    sess = html.escape(session_title[:32])
+    ag = html.escape(settings.agent)
+    mdl = html.escape(model or settings.model or "—")
+    return (
+        '<div class="sc-status-bar">'
+        f'<span class="sc-chip sc-chip-brand">smolcode</span>'
+        f'<span class="sc-chip">{sess}</span>'
+        f'<span class="sc-chip sc-chip-dim">{ws}</span>'
+        f'<button type="button" class="sc-chip sc-chip-clickable" data-picker="agents">{ag}</button>'
+        f'<button type="button" class="sc-chip sc-chip-clickable sc-chip-mode" data-action="cycle-mode">{mode}</button>'
+        f"{think}{run}"
+        f'<button type="button" class="sc-chip sc-chip-clickable sc-chip-model" data-picker="models">{mdl}</button>'
+        f'<button type="button" class="sc-chip sc-chip-clickable sc-chip-dim" data-picker="themes">theme</button>'
+        "</div>"
+    )
+def parse_git_header(git_text: str) -> tuple[str, bool]:
+    branch = ""
+    dirty = False
+    for line in git_text.splitlines():
+        if line.startswith("##"):
+            branch = line[2:].strip().split("...")[0]
+        if line.strip() and not line.startswith("#"):
+            dirty = True
+    return branch, dirty
+def host_from_url(base_url: str) -> str:
+    u = base_url.strip()
+    for prefix in ("https://", "http://"):
+        if u.startswith(prefix):
+            u = u[len(prefix):]
+    return u.split("/")[0] if u else ""
+def cycle_mode(current: str) -> str:
+    order = ["normal", "auto", "plan"]
+    try:
+        i = order.index(current)
+    except ValueError:
+        return "normal"
+    return order[(i + 1) % len(order)]
+def cycle_think(current: str) -> str:
+    order = ["off", "low", "high", "xtra"]
+    try:
+        i = order.index(current)
+    except ValueError:
+        return "off"
+    return order[(i + 1) % len(order)]
+def cycle_agent(current: str) -> str:
+    order = ["build", "plan"]
+    try:
+        i = order.index(current)
+    except ValueError:
+        return "build"
+    return order[(i + 1) % len(order)]
+def cycle_model(models: list[str], current: str) -> str:
+    if not models:
+        return current
+    try:
+        i = models.index(current)
+    except ValueError:
+        return models[0]
+    return models[(i + 1) % len(models)]
+def ingest_agent_event(transcript: Transcript, ev: dict) -> None:
+    kind = ev.get("kind")
+    if kind == "token":
+        transcript.set_partial(transcript.partial + ev.get("text", ""))
+    elif kind == "assistant":
+        transcript.set_partial(ev.get("text", ""))
+    elif kind == "tool_call":
+        transcript.set_partial("")
+        transcript.append_tool_call(ev.get("name", ""), ev.get("args", ""))
+    elif kind == "tool_result":
+        transcript.append_tool_result(ev.get("name", ""), ev.get("text", ""))
+    elif kind == "final":
+        transcript.set_partial("")
+        transcript.append_final(ev.get("text", ""))
+    elif kind == "error":
+        transcript.set_partial("")
+        transcript.append_error(ev.get("text", ""))
+def help_overlay_html() -> str:
+    lines = [
+        "Enter — run task",
+        "Shift+Enter — newline",
+        "/ — slash commands (Tab complete)",
+        "@ — attach file",
+        "! cmd — shell (no LLM)",
+        "Ctrl+L — clear transcript",
+        "Ctrl+X — leader key menu",
+        "Tab — cycle agent",
+        "Shift+Tab — cycle mode",
+        "F2 — cycle model",
+        "Esc — interrupt / close overlay",
+    ]
+    body = "<br>".join(html.escape(ln) for ln in lines)
+    return f'<div class="sc-overlay-body"><b>smolcode keys</b><br><br>{body}</div>'
+def whichkey_overlay_html() -> str:
+    lines = [
+        "m models", "a agents", "t themes", "l sessions",
+        "n new session", "b sidebar", "s stats/files", "f focus files",
+        "h help", "o mode", "e think", "q quit",
+    ]
+    body = "<br>".join(html.escape(ln) for ln in lines)
+    return f'<div class="sc-overlay-body"><b>ctrl+x leader</b><br><br>{body}</div>'
+def render_picker_html(
+    kind: str,
+    items: list[str],
+    selected: int,
+    *,
+    title: str | None = None,
+) -> str:
+    """TUI-style bordered picker list with scroll window."""
+    label = title or kind
+    if not items:
+        return (
+            f'<div class="sc-picker" data-kind="{html.escape(kind)}">'
+            f'<div class="sc-picker-title">{html.escape(label)}</div>'
+            '<div class="sc-picker-empty">(empty)</div></div>'
+        )
+    win = 12
+    sel = min(max(0, selected), len(items) - 1)
+    start = max(0, sel - win // 2)
+    end = min(len(items), start + win)
+    start = max(0, end - win)
+    rows: list[str] = []
+    for i in range(start, end):
+        item = items[i]
+        marker = "❯" if i == sel else " "
+        cls = "sc-picker-item sc-picker-sel" if i == sel else "sc-picker-item"
+        rows.append(
+            f'<button type="button" class="{cls}" data-idx="{i}" '
+            f'onclick="window.__smolcodePick && window.__smolcodePick({i})">'
+            f'<span class="sc-picker-mark">{marker}</span>'
+            f"<span>{html.escape(item)}</span></button>"
+        )
+    body = "\n".join(rows)
+    return (
+        f'<div class="sc-picker" data-kind="{html.escape(kind)}">'
+        f'<div class="sc-picker-title">{html.escape(label)}</div>'
+        f'<div class="sc-picker-list">{body}</div>'
+        f'<div class="sc-picker-hint">↑↓ navigate · Enter select · Esc close</div>'
+        f"</div>"
+    )
+def shell_theme_html(theme_idx: int) -> str:
+    """Inject data-theme on the TUI shell wrapper."""
+    name = theme_at(theme_idx).name
+    safe = html.escape(name, quote=True)
+    return (
+        f'<script>(function(){{var el=document.querySelector(".sc-tui-shell");'
+        f'if(el)el.setAttribute("data-theme","{safe}");}})();</script>'
+    )
+def agent_choices() -> list[str]:
+    return ["build", "plan"]
+def theme_picker_items() -> list[str]:
+    return theme_names()
+def _sorted_file_paths(files: dict[str, str] | list[str]) -> list[str]:
+    if isinstance(files, dict):
+        return sorted(files.keys())
+    return sorted(files)
+def _paths_for_ui(files: dict[str, str] | list[str] | None) -> list[str]:
+    return _sorted_file_paths(files or [])
+def _files_sidebar_body(paths: list[str], *, selected: int = 0, max_rows: int = 48) -> str:
+    """Flat file list grouped by directory, matching the CLI TUI sidebar."""
+    if not paths:
+        return '<div class="sc-sb-empty">no files</div>'
+    rows: list[str] = []
+    sel_row: int | None = None
+    last_dir = ""
+    sel = min(selected, max(0, len(paths) - 1))
+    for i, path in enumerate(paths):
+        if "/" in path:
+            j = path.rfind("/")
+            dir_part, file_part = path[:j], path[j + 1 :]
+        else:
+            dir_part, file_part = "", path
+        if dir_part != last_dir:
+            last_dir = dir_part
+            label = "." if not dir_part else f"{dir_part}/"
+            rows.append(f'<div class="sc-sb-dir">▾ {html.escape(label)}</div>')
+        is_sel = i == sel
+        if is_sel:
+            sel_row = len(rows)
+        prefix = "❯" if is_sel else ""
+        cls = "sc-sb-file sc-sb-sel" if is_sel else "sc-sb-file"
+        rows.append(
+            f'<div class="{cls}">'
+            f'<span class="sc-sb-mark">{prefix}</span>'
+            f'<span class="sc-sb-glyph"> </span>'
+            f'<span class="sc-sb-name">{html.escape(file_part)}</span>'
+            f"</div>"
+        )
+    total = len(rows)
+    start = 0
+    if total > max_rows:
+        anchor = sel_row if sel_row is not None else 0
+        start = min(max(0, anchor - max_rows + 1), total - max_rows)
+    visible = rows[start : start + max_rows]
+    if total > max_rows and start + max_rows < total:
+        more = total - (start + max_rows)  # rows hidden below the window
+        visible.append(f'<div class="sc-sb-more">… +{more} more</div>')
+    return "\n".join(visible)
+def _stats_sidebar_body(
+    *,
+    session_id: str,
+    file_count: int,
+    agent: str,
+    extra_lines: list[str] | None = None,
+) -> str:
+    parts = [
+        f'<div class="sc-sb-stat sc-sb-dim">{html.escape(session_id[:26])}</div>',
+        '<div class="sc-sb-stat"></div>',
+    ]
+    for line in extra_lines or []:
+        parts.append(f'<div class="sc-sb-stat">{html.escape(line)}</div>')
+    parts.append(f'<div class="sc-sb-stat">files: {file_count}</div>')
+    parts.append(f'<div class="sc-sb-stat">agent: {html.escape(agent)}</div>')
+    return "\n".join(parts)
+def render_sidebar_html(
+    *,
+    view: str = "files",
+    files: dict[str, str] | list[str] | None = None,
+    selected: int = 0,
+    focused: bool = False,
+    session_id: str = "(none)",
+    agent: str = "build",
+    stats_lines: list[str] | None = None,
+    file_total: int | None = None,
+) -> str:
+    """CLI TUI-shaped sidebar panel (flat file list or stats)."""
+    paths = _paths_for_ui(files)
+    total = file_total if file_total is not None else len(paths)
+    title = "stats" if view == "stats" else ("files ▸" if focused else "files")
+    panel_cls = "sc-sidebar-panel"
+    if focused:
+        panel_cls += " sc-sidebar-focused"
+    if view == "stats":
+        body = _stats_sidebar_body(
+            session_id=session_id,
+            file_count=total,
+            agent=agent,
+            extra_lines=stats_lines,
+        )
+    else:
+        body = _files_sidebar_body(paths, selected=selected)
+        if total > len(paths):
+            body += f'\n<div class="sc-sb-more">… {total - len(paths)} more files</div>'
+    return (
+        f'<div class="{panel_cls}">'
+        f'<div class="sc-sidebar-title">{html.escape(title)}</div>'
+        f'<div class="sc-sidebar-body">{body}</div>'
+        f"</div>"
+    )
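`render_picker_html` keeps the selected row roughly centred in a 12-row window and pins that window to either end of the list. The sketch below pulls the same arithmetic into a pure function so its edge cases can be checked in isolation. `picker_window` is a hypothetical helper name and is not part of this commit.

```python
def picker_window(n_items: int, selected: int, win: int = 12) -> tuple[int, int, int]:
    """Return (sel, start, end) so the visible rows are items[start:end].

    Hypothetical helper mirroring the windowing in render_picker_html:
    clamp the selection, try to centre it, then slide the window back
    so it never runs past the end of the list.
    """
    if n_items <= 0:
        return 0, 0, 0  # render_picker_html renders "(empty)" in this case
    sel = min(max(0, selected), n_items - 1)  # clamp out-of-range selections
    start = max(0, sel - win // 2)            # try to put sel in the middle
    end = min(n_items, start + win)           # stop at the end of the list
    start = max(0, end - win)                 # slide back to keep `win` rows
    return sel, start, end
```

For a 30-item list, `picker_window(30, 29)` gives `(29, 18, 30)`: the window slides back so the last 12 rows stay visible instead of shrinking.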

engine/webcheck.js ADDED

	@@ -0,0 +1,108 @@

+// Headless smoke-check for a model-built web app, used by smolbuilder so the
+// agent can actually *test* what it builds (the web equivalent of run_python).
+//
+// Loads the HTML file named on the command line in jsdom, runs its scripts,
+// clicks every clickable element (<button>, [onclick], button and submit
+// inputs), and reports any JavaScript errors. The goal is high precision: a
+// correct app reports zero errors; a broken one (null element refs, undefined
+// functions, syntax errors, exceptions on click) reports them so the agent can fix it.
+//
+// We stub the browser APIs jsdom doesn't implement (canvas 2d/webgl context,
+// alert/confirm/prompt, matchMedia, media play) so apps that *use* them aren't
+// falsely flagged — we're checking the app's own logic, not jsdom's coverage.
+//
+// Output: a single JSON line {ok, errors, buttons, clicked}. Exit 0 always
+// (the verdict is in the JSON); exit 3 only if jsdom itself is missing.
+'use strict';
+let JSDOM, VirtualConsole;
+try {
+  ({ JSDOM, VirtualConsole } = require('jsdom'));
+} catch (e) {
+  process.stdout.write(JSON.stringify({ ok: null, infra: 'jsdom not installed' }) + '\n');
+  process.exit(3);
+}
+const fs = require('fs');
+function makeCtx() {
+  // A permissive 2d/webgl context stub: method calls no-op, the few methods
+  // whose *return value* is used hand back something safe to deref.
+  return new Proxy({}, {
+    get(_t, p) {
+      if (p === 'measureText') return () => ({ width: 0 });
+      if (p === 'getImageData') return () => ({ data: new Uint8ClampedArray(4), width: 1, height: 1 });
+      if (p === 'createLinearGradient' || p === 'createRadialGradient' || p === 'createPattern')
+        return () => ({ addColorStop() {} });
+      if (p === 'canvas') return { width: 300, height: 150 };
+      return () => undefined;
+    },
+    set() { return true; },
+  });
+}
+function stubBrowser(window) {
+  try { window.HTMLCanvasElement.prototype.getContext = () => makeCtx(); } catch (e) {}
+  const noop = () => {};
+  window.alert = noop;
+  window.confirm = () => true;
+  window.prompt = () => '';
+  window.scrollTo = noop;
+  window.scroll = noop;
+  if (!window.matchMedia)
+    window.matchMedia = () => ({ matches: false, media: '', addListener: noop, removeListener: noop, addEventListener: noop, removeEventListener: noop });
+  try { window.HTMLMediaElement.prototype.play = () => Promise.resolve(); } catch (e) {}
+  try { window.HTMLMediaElement.prototype.pause = noop; } catch (e) {}
+}
+const file = process.argv[2];
+const html = fs.readFileSync(file, 'utf8');
+const errors = [];
+const push = (m) => {
+  if (!m) return;
+  const s = String(m).slice(0, 400); // truncate before de-duplicating so repeats match
+  if (errors.indexOf(s) === -1) errors.push(s);
+};
+const vc = new VirtualConsole();
+vc.on('jsdomError', (e) => push('script error: ' + (e && e.detail ? (e.detail.message || e.detail) : (e && e.message))));
+let dom;
+try {
+  dom = new JSDOM(html, {
+    runScripts: 'dangerously',
+    pretendToBeVisual: true,
+    virtualConsole: vc,
+    beforeParse(window) {
+      stubBrowser(window);
+      window.addEventListener('error', (ev) => push('uncaught: ' + (ev.error ? (ev.error.message || ev.error) : ev.message)));
+      window.addEventListener('unhandledrejection', (ev) => push('promise rejection: ' + (ev.reason && ev.reason.message ? ev.reason.message : ev.reason)));
+    },
+  });
+} catch (e) {
+  push('load failed: ' + e.message);
+  process.stdout.write(JSON.stringify({ ok: false, errors, buttons: 0, clicked: 0 }) + '\n');
+  process.exit(0);
+}
+const { window } = dom;
+const doc = window.document;
+function clickAll() {
+  const buttons = Array.from(doc.querySelectorAll('button, [onclick], input[type=button], input[type=submit]'));
+  let clicked = 0;
+  for (const el of buttons) {
+    try {
+      if (el.disabled) el.disabled = false; // exercise the handler regardless of initial state
+      el.click();
+      clicked++;
+    } catch (e) {
+      push('click "' + (el.textContent || el.id || el.tagName).trim().slice(0, 30) + '": ' + e.message);
+    }
+  }
+  return { n: buttons.length, clicked };
+}
+// Let inline scripts settle, click, then let one timer tick surface late errors.
+setTimeout(() => {
+  const { n, clicked } = clickAll();
+  setTimeout(() => {
+    process.stdout.write(JSON.stringify({ ok: errors.length === 0, errors, buttons: n, clicked }) + '\n');
+    process.exit(0);
+  }, 250);
+}, 50);
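The checker's only interface is the last JSON line it prints plus its exit code. `webcheck.py` reads just `ok`, `errors` and `infra`; the `buttons` and `clicked` counts are also in the report, for example to notice that a page loaded cleanly but exposed nothing clickable, so the click phase had nothing to exercise. A sketch of parsing the full report, assuming a hypothetical `CheckReport` type and `parse_report` function that are not part of this commit:

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class CheckReport:
    """Hypothetical typed view of one webcheck.js output line."""
    ok: bool | None                      # None means "unverifiable" (e.g. jsdom missing)
    errors: list[str] = field(default_factory=list)
    buttons: int = 0                     # clickable elements found on the page
    clicked: int = 0                     # clicks that did not throw synchronously
    infra: str | None = None             # reason given when ok is None

    @property
    def exercised_nothing(self) -> bool:
        """True when the page loaded without errors but had nothing to click."""
        return self.ok is True and self.buttons == 0


def parse_report(stdout: str) -> CheckReport:
    """Parse the last non-empty stdout line of the checker."""
    lines = [ln for ln in stdout.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("checker produced no output")
    data = json.loads(lines[-1])
    return CheckReport(
        ok=data.get("ok"),
        errors=[str(e) for e in data.get("errors", [])],
        buttons=int(data.get("buttons", 0)),
        clicked=int(data.get("clicked", 0)),
        infra=data.get("infra"),
    )
```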

engine/webcheck.py ADDED

	@@ -0,0 +1,65 @@

+"""Headless verification of model-built web apps (the web `run_python`).
+smolbuilder's agent writes HTML/CSS/JS but, unlike the Python path, had no way
+to *run* it — so it shipped broken apps and couldn't tell. This bridges to a
+small Node + jsdom checker (engine/webcheck.js) that loads the page, runs its
+scripts, clicks every button, and reports JavaScript errors.
+Graceful degradation is deliberate: if Node or jsdom isn't available (e.g. a
+minimal Space image), we return `(None, [reason])` ("unverifiable") rather than
+failing the build, and the agent/router fall back to the structural check.
+"""
+from __future__ import annotations
+import json
+import shutil
+import subprocess
+import tempfile
+from pathlib import Path
+_CHECKER = Path(__file__).with_name("webcheck.js")
+def available() -> bool:
+    """True if we can actually run the headless check (Node present)."""
+    return shutil.which("node") is not None and _CHECKER.exists()
+def check_html(html: str, timeout: int = 20) -> tuple[bool | None, list[str]]:
+    """Run the headless check on an HTML document.
+    Returns (ok, errors):
+      - (True, [])        the app loaded and all buttons clicked without error
+      - (False, [...])    real JavaScript errors were found
+      - (None, [...])     unverifiable (Node/jsdom missing, or the checker broke)
+    """
+    node = shutil.which("node")
+    if not node or not _CHECKER.exists():
+        return None, ["node/jsdom unavailable (skipped runtime check)"]
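+    # webcheck.js reads the file as UTF-8, so write it as UTF-8 regardless of locale.
+    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f: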
+        f.write(html)
+        path = f.name
+    try:
+        proc = subprocess.run(
+            [node, str(_CHECKER), path],
+            capture_output=True, text=True, timeout=timeout,
+        )
+    except subprocess.TimeoutExpired:
+        return None, [f"runtime check timed out after {timeout}s"]
+    finally:
+        Path(path).unlink(missing_ok=True)
+    if proc.returncode == 3:        # jsdom not installed
+        return None, ["jsdom not installed (skipped runtime check)"]
+    lines = (proc.stdout or "").strip().splitlines()
+    if not lines:
+        return None, [f"runtime check produced no output: {proc.stderr.strip()[:200]}"]
+    last = lines[-1]  # the checker prints its verdict as the final JSON line
+    try:
+        data = json.loads(last)
+    except json.JSONDecodeError:
+        return None, [f"runtime check output unparseable: {last[:200]}"]
+    if data.get("ok") is None:
+        return None, [data.get("infra", "unverifiable")]
+    return bool(data.get("ok")), list(data.get("errors", []))
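`check_html` returns a three-state verdict, and a caller must not treat `None` as a failure. A minimal sketch of how an agent loop might turn that verdict into feedback text; `verdict_message` and its wording are hypothetical and not taken from the router or agent code.

```python
from __future__ import annotations


def verdict_message(ok: bool | None, errors: list[str], limit: int = 5) -> str:
    """Turn an (ok, errors) pair from check_html into text for the agent.

    Hypothetical helper: the real agent/router may phrase this differently.
    """
    if ok is True:
        return "runtime check passed: no JavaScript errors"
    if ok is None:
        # Unverifiable is not a failure: the caller falls back to the structural check.
        reason = errors[0] if errors else "unverifiable"
        return f"runtime check skipped ({reason})"
    shown = errors[:limit]  # cap how many errors go back into the prompt
    lines = ["runtime check failed:"] + [f"- {e}" for e in shown]
    if len(errors) > limit:
        lines.append(f"- ... and {len(errors) - limit} more")
    return "\n".join(lines)
```

Capping the error list keeps a page that throws on every click from flooding the agent's context with near-identical messages.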

requirements.txt ADDED

	@@ -0,0 +1,2 @@


+gradio>=5.49,<6
+liteforge==0.2.5

smolcode_core-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d179e40e7e38999081cfdd9461c0879b1843f81ea39d4dac3262a3eab5d7931
+size 13694530

static/web_tui.js ADDED

	@@ -0,0 +1,380 @@

+(function () {
+  "use strict";
+  if (window.__smolcodeTuiInit) return;
+  window.__smolcodeTuiInit = true;
+  let leaderPending = false;
+  let leaderTimer = null;
+  function click(id) {
+    const root = document.getElementById(id);
+    if (!root) return;
+    const btn = root.tagName === "BUTTON" ? root : root.querySelector("button");
+    (btn || root).click();
+  }
+  function setHiddenValue(id, value) {
+    const root = document.getElementById(id);
+    if (!root) return;
+    const el = root.tagName === "TEXTAREA" || root.tagName === "INPUT"
+      ? root
+      : root.querySelector("textarea, input");
+    if (!el) return;
+    el.value = value;
+    el.dispatchEvent(new Event("input", { bubbles: true }));
+  }
+  window.__smolcodePick = function (idx) {
+    setHiddenValue("sc-picker-pick", String(idx));
+    click("sc-picker-confirm");
+  };
+  function editor() {
+    const root = document.getElementById("sc-editor");
+    if (root) {
+      if (root.tagName === "TEXTAREA" || root.tagName === "INPUT") return root;
+      const inner = root.querySelector("textarea, input[type='text']");
+      if (inner) return inner;
+    }
+    const boxes = document.querySelectorAll("[data-testid='textbox']");
+    return boxes.length ? boxes[boxes.length - 1] : null;
+  }
+  function PopupController(popupEl, kind) {
+    this.popup = popupEl;
+    this.kind = kind || "slash";
+    this.matches = [];
+    this.sel = 0;
+  }
+  PopupController.prototype.hide = function () {
+    if (this.popup) this.popup.style.display = "none";
+    this.matches = [];
+    this.sel = 0;
+  };
+  PopupController.prototype.render = function (matches, ta, replaceFrom) {
+    this.matches = matches.slice(0, 12); // only the rendered rows can be selected
+    this.sel = 0;
+    this.replaceFrom = replaceFrom;
+    this.ta = ta;
+    if (!this.popup) return;
+    this.popup.innerHTML = "";
+    const self = this;
+    matches.slice(0, 12).forEach(function (item, i) {
+      const row = document.createElement("div");
+      row.className = "sc-popup-item" + (i === 0 ? " sc-popup-sel" : "");
+      row.textContent = item;
+      row.onclick = function () {
+        self.sel = i;
+        self.accept();
+      };
+      self.popup.appendChild(row);
+    });
+    const rect = ta.getBoundingClientRect();
+    this.popup.style.display = matches.length ? "block" : "none";
+    this.popup.style.left = rect.left + "px";
+    this.popup.style.top = Math.max(0, rect.top - 160) + "px";
+    this.popup.style.width = Math.max(220, rect.width) + "px";
+    this._highlight();
+  };
+  PopupController.prototype._highlight = function () {
+    if (!this.popup) return;
+    const items = this.popup.querySelectorAll(".sc-popup-item");
+    items.forEach(function (el, i) {
+      el.classList.toggle("sc-popup-sel", i === this.sel);
+    }, this);
+  };
+  PopupController.prototype.move = function (delta) {
+    if (!this.matches.length) return;
+    this.sel = (this.sel + delta + this.matches.length) % this.matches.length;
+    this._highlight();
+  };
+  PopupController.prototype.accept = function () {
+    if (!this.matches.length || !this.ta) return;
+    const val = this.ta.value;
+    if (this.kind === "file") {
+      const atMatch = val.match(/(?:^|\s)@(\S*)$/);
+      if (!atMatch) return;
+      const atPos = val.length - atMatch[1].length - 1; // index of "@", after any whitespace (not just " ")
+      const item = this.matches[this.sel];
+      this.ta.value = val.slice(0, atPos) + "@" + item + " ";
+    } else {
+      const item = this.matches[this.sel];
+      const rest = val.slice(this.replaceFrom);
+      this.ta.value = item + rest;
+    }
+    this.ta.dispatchEvent(new Event("input", { bubbles: true }));
+    this.hide();
+    this.ta.focus();
+  };
+  PopupController.prototype.tabComplete = function () {
+    if (!this.matches.length) return false;
+    this.accept();
+    return true;
+  };
+  PopupController.prototype.visible = function () {
+    return this.popup && this.popup.style.display === "block" && this.matches.length > 0;
+  };
+  function ensurePopup(cls) {
+    let el = document.querySelector("." + cls);
+    if (!el) {
+      el = document.createElement("div");
+      el.className = cls + " sc-popup";
+      document.body.appendChild(el);
+    }
+    return el;
+  }
+  const slashPopup = new PopupController(ensurePopup("sc-slash-popup"), "slash");
+  const filePopup = new PopupController(ensurePopup("sc-file-popup"), "file");
+  function hidePopups() {
+    slashPopup.hide();
+    filePopup.hide();
+  }
+  function onEditorInput(ta) {
+    const val = ta.value;
+    const cmds = window.__smolcode_commands || [];
+    if (val.startsWith("/") && !val.includes(" ")) {
+      const m = cmds.filter(function (c) { return c.startsWith(val); });
+      slashPopup.render(m, ta, val.length);
+      filePopup.hide();
+      return;
+    }
+    slashPopup.hide();
+    const atMatch = val.match(/(?:^|\s)@(\S*)$/);
+    if (atMatch) {
+      const prefix = atMatch[1];
+      const files = window.__smolcode_files || [];
+      const m = files.filter(function (f) { return f.startsWith(prefix); });
+      const atPos = val.length - atMatch[1].length - 1; // index of "@", after any whitespace (not just " ")
+      filePopup.render(m, ta, atPos);
+      return;
+    }
+    filePopup.hide();
+  }
+  function activePopup() {
+    if (slashPopup.visible()) return slashPopup;
+    if (filePopup.visible()) return filePopup;
+    return null;
+  }
+  function onEditorKeyDown(e) {
+    const ta = e.target;
+    const popup = activePopup();
+    if (popup && (e.key === "ArrowDown" || e.key === "ArrowUp")) {
+      e.preventDefault();
+      popup.move(e.key === "ArrowDown" ? 1 : -1);
+      return;
+    }
+    if (popup && e.key === "Enter" && !e.shiftKey) {
+      e.preventDefault();
+      popup.accept();
+      return;
+    }
+    if (e.key === "Tab" && popup && !e.shiftKey) {
+      e.preventDefault();
+      popup.tabComplete();
+      return;
+    }
+    if (e.key === "Enter" && !e.shiftKey && !e.altKey) {
+      e.preventDefault();
+      hidePopups();
+      click("sc-submit");
+      return;
+    }
+    if (e.key === "Escape") {
+      hidePopups();
+      if (document.querySelector(".sc-overlay")) {
+        click("sc-close-overlay");
+      } else {
+        click("sc-interrupt");
+      }
+      return;
+    }
+    if (e.ctrlKey && (e.key === "l" || e.key === "L")) {
+      e.preventDefault();
+      click("sc-clear");
+      return;
+    }
+    if (e.ctrlKey && (e.key === "x" || e.key === "X")) {
+      e.preventDefault();
+      leaderPending = true;
+      if (leaderTimer) clearTimeout(leaderTimer);
+      leaderTimer = setTimeout(function () { leaderPending = false; }, 2000);
+      click("sc-whichkey");
+      return;
+    }
+    if (leaderPending && !e.ctrlKey && !e.metaKey && e.key.length === 1) {
+      leaderPending = false;
+      if (leaderTimer) clearTimeout(leaderTimer);
+      const map = {
+        m: "sc-open-picker-models",
+        a: "sc-open-picker-agents",
+        t: "sc-open-picker-themes",
+        l: "sc-open-picker-sessions",
+        n: "sc-new-session",
+        b: "sc-toggle-sidebar",
+        s: "sc-toggle-sidebar-view",
+        h: "sc-help",
+        o: "sc-cycle-mode",
+        e: "sc-cycle-think",
+      };
+      const btn = map[e.key.toLowerCase()];
+      if (btn) {
+        e.preventDefault();
+        click(btn);
+      }
+      return;
+    }
+    if (e.key === "Tab" && !e.shiftKey) {
+      if (trySlashTabComplete(ta, e)) return;
+    }
+    if (e.key === "Tab" && !e.shiftKey && !activePopup()) {
+      e.preventDefault();
+      click("sc-cycle-agent");
+      return;
+    }
+    if (e.key === "Tab" && e.shiftKey) {
+      e.preventDefault();
+      click("sc-cycle-mode");
+      return;
+    }
+    if (e.key === "F2") {
+      e.preventDefault();
+      click("sc-cycle-model");
+      return;
+    }
+    if (document.querySelector(".sc-picker") && !ta) {
+      if (e.key === "ArrowDown") {
+        e.preventDefault();
+        click("sc-picker-down");
+      } else if (e.key === "ArrowUp") {
+        e.preventDefault();
+        click("sc-picker-up");
+      } else if (e.key === "Enter") {
+        e.preventDefault();
+        click("sc-picker-confirm");
+      }
+    }
+  }
+  function onGlobalKeyDown(e) {
+    if (document.querySelector(".sc-picker") && document.activeElement !== editor()) {
+      if (e.key === "ArrowDown") {
+        e.preventDefault();
+        click("sc-picker-down");
+      } else if (e.key === "ArrowUp") {
+        e.preventDefault();
+        click("sc-picker-up");
+      } else if (e.key === "Enter") {
+        e.preventDefault();
+        click("sc-picker-confirm");
+      }
+    }
+  }
+  function bindEditor() {
+    const ta = editor();
+    if (!ta || ta.dataset.scBound) return;
+    ta.dataset.scBound = "1";
+    ta.addEventListener("input", function () { onEditorInput(ta); });
+    ta.addEventListener("keydown", onEditorKeyDown);
+  }
+  function bindChips() {
+    /* chips re-render with status HTML; use delegation in init() */
+  }
+  function onDocumentClick(e) {
+    const chip = e.target.closest("[data-picker]");
+    if (chip) {
+      const kind = chip.getAttribute("data-picker");
+      const map = {
+        models: "sc-open-picker-models",
+        agents: "sc-open-picker-agents",
+        themes: "sc-open-picker-themes",
+        sessions: "sc-open-picker-sessions",
+      };
+      if (map[kind]) {
+        e.preventDefault();
+        click(map[kind]);
+      }
+      return;
+    }
+    const modeBtn = e.target.closest("[data-action='cycle-mode']");
+    if (modeBtn) {
+      e.preventDefault();
+      click("sc-cycle-mode");
+    }
+  }
+  function slashMatches(val) {
+    if (!val.startsWith("/") || val.includes(" ")) return [];
+    const cmds = window.__smolcode_commands || [];
+    return cmds.filter(function (c) { return c.startsWith(val); });
+  }
+  function trySlashTabComplete(ta, e) {
+    const val = ta.value;
+    const matches = slashMatches(val);
+    if (!matches.length) return false;
+    e.preventDefault();
+    const popup = activePopup();
+    if (popup && popup.kind === "slash" && popup.matches.length) {
+      popup.tabComplete();
+      return true;
+    }
+    ta.value = matches[0];
+    ta.dispatchEvent(new Event("input", { bubbles: true }));
+    hidePopups();
+    return true;
+  }
+  function init() {
+    document.addEventListener("click", onDocumentClick);
+    document.addEventListener("click", function (e) {
+      const overlay = document.querySelector(".sc-overlay");
+      if (overlay && e.target === overlay) click("sc-close-overlay");
+    });
+    document.addEventListener("keydown", onGlobalKeyDown);
+    const obs = new MutationObserver(function () {
+      bindEditor();
+    });
+    obs.observe(document.body, { childList: true, subtree: true });
+    bindEditor();
+    setTimeout(bindEditor, 300);
+    setTimeout(bindEditor, 1500);
+  }
+  if (document.readyState === "loading") {
+    document.addEventListener("DOMContentLoaded", init);
+  } else {
+    init();
+  }
+})();
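The `@` file completion in `web_tui.js` keys off a regex that requires the `@` to start the input or follow whitespace, and replaces everything from the `@` onward with the chosen path plus a trailing space. The Python sketch below mirrors that logic so it can be unit-tested outside the browser; `mention_matches` and `accept_mention` are hypothetical names. It locates the `@` from the length of the captured prefix, which works whether the `@` follows a space, a tab or a newline, and it anchors with `\Z` because Python's `$` also matches just before a trailing newline, unlike JavaScript's `$` without the `m` flag.

```python
from __future__ import annotations

import re

# "@prefix" at the very end of the input, preceded by start-of-text or whitespace.
# \Z (not $) so a trailing newline never counts as "end of input".
_AT_MENTION = re.compile(r"(?:^|\s)@(\S*)\Z")


def mention_matches(text: str, files: list[str]) -> tuple[int, list[str]] | None:
    """Return (index of '@', files starting with the typed prefix), or None."""
    m = _AT_MENTION.search(text)
    if m is None:
        return None
    prefix = m.group(1)
    at_pos = len(text) - len(prefix) - 1  # the '@' sits right before the prefix
    return at_pos, [f for f in files if f.startswith(prefix)]


def accept_mention(text: str, files: list[str], sel: int = 0) -> str:
    """Replace '@prefix' with '@<chosen path> ', leaving text unchanged if nothing matches."""
    found = mention_matches(text, files)
    if found is None or not found[1]:
        return text
    at_pos, matches = found
    choice = matches[min(max(0, sel), len(matches) - 1)]  # clamp the selection
    return text[:at_pos] + "@" + choice + " "
```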