# RUNBOOK — set up, use, deploy, record **Microfactory Node: 3D Printer** — the 3D-printing node of the Microfactory. A small, **fully local Gemma** that learns 3D-printing expertise job-by-job and tells you where a print will fail *before* it runs. Two personas keep it honest: **Chief Engineer O'Brien** proposes settings; **La Forge** (a separate QA Inspector) grades — O'Brien never grades his own work. Model: `gemma4:e4b` local · `google/gemma-4-E4B-it` on the Space (ZeroGPU). This is the one operational doc: how to run it, use it, deploy it, and record it. History, findings, and the phase log live in `docs/reference/RUNBOOK-FINDINGS.md`; day-by-day plan in `docs/plan/`; the recording plan + published demo link live in `docs/writeup/02-VIDEO.md`. --- ## 1 · Run it locally ```bash cd chief-engineer make setup # uv sync (locked env) + generate sample meshes ollama serve & # its own terminal; leave running (for live Gemma) ollama pull gemma4:e4b # = gemma4:latest, ~9.6GB make test # offline core tests (~1s, no Ollama) — expect ALL PASSED make run # → http://localhost:7860 (status bar shows the live model) ``` Falls back to a deterministic advisor if Ollama is unreachable, so it never crashes; the UI always shows which model ran. `CHIEF_ENGINEER_MODEL=gemma4:e2b` for ~2× faster CPU latency. **Want the fine-tuned Chief Engineer instead of stock Gemma?** Four LoRA-fine-tuned variants are pre-merged + quantized and live on the public Ollama registry at [`ollama.com/kylebrodeur`](https://ollama.com/kylebrodeur): ```bash ollama pull kylebrodeur/microfactory-node-v3-qat # recommended (q4_k_m, 5.3GB) CHIEF_ENGINEER_MODEL=kylebrodeur/microfactory-node-v3-qat make run ``` The other tags (`-v2`, `:q4_0`, `microfactory-node` v1) plus the canonical HF Hub copies and the full publishing runbook are documented in [`learn/finetune/OLLAMA_PUBLISHING.md`](../learn/finetune/OLLAMA_PUBLISHING.md) and the [GGUF model card](https://huggingface.co/kylebrodeur/microfactory-node-gguf). --- ## 2 · Use the tool (the guided tour — also the judge's tour) Live at **[node.microfactory.space](https://node.microfactory.space)** (custom domain; fallback `build-small-hackathon-microfactory-lab.hf.space`). Four tabs, left to right. This is exactly what a judge should see. The top header carries the **model switcher** (LoRA v3 QAT / LoRA v2 / Base / Modal API), a small **info icon** tooltip, the **Warm up Model** button, and the live model status + clock. Every tab has its small primary action and a persistent **Reset** in the same top-right spot. The UI uses custom icons (no emojis) and one consolidated loader that reveals content once the work finishes. 1. **Build — define the job.** Quick-load **3DBenchy** (or generate a primitive / drop a mesh; drop one of your own printed STLs to demo on real parts). You don't pick the part type — the engineer **infers** it from the mesh ("reads this as overhang-dominant"). The simulated **environment** (temp / humidity), **build-plate position** (center / edge / corner), and **material** (PLA / PETG / ABS / TPU) share the top control row, with **RANDOMIZE / OVERRIDE / RESET / SLICE** on the right. The **Job Log** up top shows what is stored and where. Hit **SLICE** (top right). 2. **Slice — the pre-flight read (the load-bearing moment).** Slicer image + motion preview render immediately; the horizontal **LAYER** scrubber below the image steps through real cross-sections of *this* part. The **THE READ** segmented toggle flips between **Engineer's Read** and **Second Opinion**, one panel at a time. **O'Brien** recalls the closest prior jobs, says what transfers, flags the failure regions **before anything prints**, and the **Spine** vetoes unsafe values (validation + g-code fold into his read). Flip to **Second Opinion** → **La Forge** critiques the plan; a *dispute* **holds → PRINT** until you acknowledge. Clicking **Second Opinion** a second time does not re-run it for the same build. 3. **Print — run it and watch it compound.** **THE PLAN** card up top frames the run: *what we're testing* (the job + conditions + the question), the Engineer's Spine-validated proposed settings, and *what La Forge expects*. Press **PRINT** (top right); the compact **ITERATIONS** slider sets how many runs (**1 = a single print**). Results stream in live as each iteration finishes: the **OUTCOME · WHAT HAPPENED** block (simulated result + La Forge run verdict) appears first, followed by the quality curve, the **iteration log**, the learned policy cell, and a compact **LOG A REAL PRINT** strip to feed a real-machine outcome back into the ledger. **OVERRIDE PLAN** (same popup component as OVERRIDE ENVIRONMENT on LOAD) lets you print against your own settings instead of the Engineer's. *(Pick a genuinely hard job — the sim only fails prints that should: PETG overhang @ ~30 °C/65 % climbs 0.55→0.70; Benchy+PETG @30/68 climbs 0.68→0.81.)* 4. **Review — the whole job in one place.** The **SESSION RECORD** assembles the full story: the inputs, O'Brien's read, La Forge's pre-print second opinion, the simulated run (curve + iteration log), the outcome + run verdict, and next steps. Below it the lesson ledger grows (seed → earned → sim) and the **capability mesh** is a collapsible outlook view. La Forge's run verdict also sits up top. **RESET TO BASELINE** (top right, present on every tab) starts a fresh demo (clears this session's runs + learned policy; keeps seed + ingested). **The honesty spine (say this out loud):** the Engineer proposes, a deterministic Spine disposes, a deterministic world produces the outcome, and a *separate* Inspector grades it — the model never marks its own homework. **Real:** compounding retrieval + learned policy, proactive risk flags, local Gemma, the QA Inspector. **Simulated (the one boundary):** print *outcomes* (`sim/outcome.py`). **Frontier (named, not faked):** weight-level fine-tuning, multi-node execution, physical sensors/camera. We calibrated the sim against 178 real failure prints, measured **32.6%**, found the gap was structural, and **documented it instead of faking a tuned number** (`sim/calibration/CALIBRATION-REPORT.md`). **If the live model is slow/falls back:** the Space runs Gemma on ZeroGPU (first BUILD loads it, ~30s). If GPU quota is momentarily out it **falls back** to the deterministic advisor (clearly labeled) rather than erroring. --- ## 3 · Deploy to the Space > **Final submission pass?** Work the `docs/plan/SUBMISSION-PUNCHDOWN.md` list, then > follow `docs/plan/FINAL-SEED-AND-DEPLOY.md` — the ordered seed/deploy commands. (Both live in > the private working repo; they're operator checklists, not part of the public artifact.) Space: **`build-small-hackathon/microfactory-lab`** · SDK gradio, app at **root**, hardware **ZeroGPU**, **HF Pro active** (ample quota → live Gemma on screen). The Space installs from `requirements.txt`; the `chief-engineer/` contents go to the Space **root**. The local agent has `hf` CLI access and can run the auth/repo checks and the push directly. ```bash make deploy-check # offline GO/NO-GO gates (D1–D10). Run any time; nothing is pushed. make deploy # gates → if green + authenticated, upload_folder to the Space + factory reboot ``` `make deploy` (= `scripts/deploy_preflight.py --push`) uploads everything **except** `docs/`, `spike/`, caches, secrets, and runtime files — so `learn/`, `assets/`, and `data/*.jsonl` go too (the app needs them). The gates: build imports + core tests, all Space files present, README frontmatter valid (`short_description` ≤60), lean reference block, clean ledger baseline, data well-formed, credentials (D8), live Space state (D9), and the **field-log dataset set + logging (D10)**. **Credentials:** an **HF write token** (member of `build-small-hackathon`) — `hf auth login` or `export HF_TOKEN=…`. Check with `hf auth whoami`. The Claude-on-the-web session carries no token by default (deploy-check warns D8); deploy from an authenticated machine or set `HF_TOKEN`. **Space variables** (set once; a change triggers rebuild): ```bash hf spaces variables add build-small-hackathon/microfactory-lab \ -e GRADIO_SSR_MODE=False -e CHIEF_ENGINEER_BACKEND=zerogpu \ -e CHIEF_ENGINEER_HF_MODEL=google/gemma-4-E4B-it ``` After deploy, **smoke-test the live UI:** LOAD a part, then **SLICE** shows O'Brien reasoning (NOT "Error"); La Forge second opinion + the dispute-gate work; LAYER scrubber slides; Print loop runs; Review shows the ledger + verdict + **↺ RESET**; wide layout, no empty right gutter. ### Field log — "all runs → a shared dataset" (Sharing is Caring) Every interaction (build / second-opinion / simulate / print / record) logs one flat row to a HF Dataset via `core/field_log.py` (`CommitScheduler`, flushes ~5 min) — **automatic once the token is set, silently no-ops without it** (local/offline unaffected; config + outcomes only, never PII or files; rows are candidates, never auto-promoted to the curated ledger). 1. **Dataset repo — check first** (it likely exists): `hf datasets info build-small-hackathon/chief-engineer-field-log`. Create only if missing (the first `hf upload` to a non-existent dataset creates it; add `--private` for a private repo); never recreate. Must match `FIELD_LOG_REPO` in `core/field_log.py`. 2. **`HF_TOKEN` as a Space *secret*** (write, org member): Space → Settings → Variables and secrets → New secret. Reboot. 3. **Verify:** do one SLICE on the Space, wait ≤5 min, confirm a new row in `interactions.jsonl` (or `make deploy-check` D10). The schema is one flat 26-column table → renders cleanly in the HF dataset viewer. ### Deliberation traces — "how the agent reasons" (Sharing is Caring) A second open dataset that captures the **turn-by-turn argument between the personas** (O'Brien proposes → Spine vetoes → La Forge second opinion/dispute → operator override → World simulates → La Forge grades → run verdict). One row per turn; shares one schema across two sources: - **Static, reproducible export:** `make deliberation` → `dist/deliberation/` (JSONL + card). Offline-safe; run with `ollama serve` up first to capture O'Brien's *real* reasoning rather than the `[fallback]` text. - **Live, every run:** `core/deliberation_log.py` (`CommitScheduler`, same `HF_TOKEN` gate + best-effort/never-break contract as the field log) appends turns on each Space run. Schema: `session_id, track, turn, agent, role, act, stance, content, material, geometry, bed_position, env_temp, env_humidity, ts` — renders cleanly in the dataset viewer. 1. **Dataset repo — check first:** `hf datasets info kylebrodeur/chief-engineer-deliberation`. Create only if missing (the first `hf upload` creates it; add `--private` for private); must match `DELIB_LOG_REPO` in `core/deliberation_log.py` and `HF_REPO` in `scripts/export_deliberation.py`. 2. **Seed it (optional but nicer):** `ollama serve` + `make deliberation`, then `hf upload kylebrodeur/chief-engineer-deliberation dist/deliberation . --repo-type dataset` (the card ships as the repo README). 3. **Live capture:** the **same** `HF_TOKEN` Space secret that powers the field log also powers this — no extra setup. Do one LOAD → SLICE → PRINT on the Space, wait ≤5 min, confirm new rows in `deliberations.jsonl`. --- ## 4 · Record the demo Full beat sheet, shot list, and **what to say** for each beat: **`docs/writeup/02-VIDEO.md`**. ```bash # clean curve first (keeps seed+ingested), then record git checkout -- data/lessons.jsonl && rm -f data/policy.json # or click ↺ RESET in Review make record-check # recording preflight (cap-cli + Space + playwright gates) make record-beat BEAT=load # record the LOAD beat (one .cap project per beat) make record-beat BEAT=slice # record the SLICE beat ./scripts/export-beat.sh /path/to/beat.cap recordings/beats/.mp4 ``` Record from the **live Space** (HF Pro = live Gemma reasoning on screen). Use a **climbing job** for the compounding beat (above). Per-beat recording is handled by `scripts/record-beat.sh`. See `recordings/EDITING.md` for the full beat list and which beats need Cap Studio polish. One-time deps: `uv pip install playwright && uv run playwright install chromium`. **Cleaner capture (Kyle's setup):** - **Install the Chrome app Gradio offers** (the "Install" / PWA prompt in the address bar on the Space). It opens the app in its own chromeless window — no tabs, no URL bar — so the screen recording is a clean interface with just the console. - **Cap Studio with a "Desktop Background"** for the final cut: export a **1080p** version so the screen capture sits cleanly alongside the regular (camera) footage when the two tracks are cut together. Match the 1080p export to the camera end-cap resolution. --- ## 5 · Quick reference | Command | Purpose | |---|---| | `make setup` | uv sync (locked env) + generate meshes | | `make test` | offline core tests (no Ollama, ~1s) | | `make run` | launch the app (status bar shows the live model) | | `make preflight` | live-stack GO/NO-GO (Ollama, latency, JSON, reasoning, Spine, assets) | | `make deploy-check` | deploy/record readiness gates (D1–D10, offline) | | `make deploy` | gates → push the Space (`upload_folder`) + factory reboot (needs `HF_TOKEN`) | | `make record-check` | recording preflight (cap-cli + Space + playwright gates) | | `make record-beat BEAT=load` | record one beat to its own `.cap` project | | `./scripts/export-beat.sh ` | export a `.cap` project to MP4 | | `uv run python scripts/assemble-video.py recordings/manifest.json` | assemble camera + beats + VO | | `make trace` | export the ledger as a Hub-ready dataset → [kylebrodeur/chief-engineer-ledger](https://huggingface.co/datasets/kylebrodeur/chief-engineer-ledger) | | `make deliberation` | export the multi-persona deliberation as a Hub-ready dataset → [kylebrodeur/chief-engineer-deliberation](https://huggingface.co/datasets/kylebrodeur/chief-engineer-deliberation) | | `core/field_log.py` | live Space interactions → [build-small-hackathon/chief-engineer-field-log](https://huggingface.co/datasets/build-small-hackathon/chief-engineer-field-log) | | `docs/reference/ACTIVITY.jsonl` | build activity trace → [kylebrodeur/chief-engineer-build-activity](https://huggingface.co/datasets/kylebrodeur/chief-engineer-build-activity) | | `learn/finetune/activity.jsonl` | fine-tune pipeline activity trace → [kylebrodeur/chief-engineer-finetune-activity](https://huggingface.co/datasets/kylebrodeur/chief-engineer-finetune-activity) | | `git checkout -- data/lessons.jsonl && rm -f data/policy.json` | reset demo curve (keeps seed + ingested) | | `CHIEF_ENGINEER_MODEL=gemma4:e2b` | faster model (env wins over `.env`) | --- ## 6 · Troubleshoot (durable gotchas) - **Dev Mode must be OFF** on the Space — an `openvscode` build log means it runs the dev shell, not `app.py` (you'll see a default greet template). Disable + reboot. - **`short_description` ≤ 60 chars** in README frontmatter or `hf upload` rejects it. - **`requirements.txt` only** is mounted at build — ZeroGPU deps (`spaces`/`torch`/ `transformers`/`accelerate`) are **inlined** there. Local dev stays lean (uv base, 4 deps); `app.py` shims `spaces` to a no-op when absent. - **`@spaces.GPU` is on the inference function only** (`core/llm_zerogpu._generate`), NOT `build_job` — so a quota-out falls back to the advisor instead of erroring the whole handler. `app.py` imports `core.llm_zerogpu` at startup so ZeroGPU still detects the GPU function. - **`data/lessons.jsonl` is the durable ledger** (tracked) — don't `rm` it; use the reset command (keeps seed + ingested). - **Use `make` targets / `uv run python -m scripts.`** — bare `python scripts/x.py` fails. History, findings, and the phase log: **`docs/reference/RUNBOOK-FINDINGS.md`**. Deploy deep-dive: `docs/reference/DEPLOYMENT.md`. Open items: `docs/plan/ISSUES.md`.