# RUNBOOK: set up, use, deploy, record

**Microfactory Node: 3D Printer** is the 3D-printing node of the Microfactory: a small,
**fully local Gemma** that learns 3D-printing expertise job-by-job and tells you where a
print will fail *before* it runs. Two personas keep it honest: **Chief Engineer O'Brien**
proposes settings; **La Forge** (a separate QA Inspector) grades. O'Brien never grades his
own work. Model: `gemma4:e4b` local · `google/gemma-4-E4B-it` on the Space (ZeroGPU).

This is the one operational doc: how to run it, use it, deploy it, and record it. History,
findings, and the phase log live in `docs/reference/RUNBOOK-FINDINGS.md`; day-by-day plan in
`docs/plan/`; the recording plan + published demo link live in `docs/writeup/02-VIDEO.md`.

---
## 1 · Run it locally

```bash
cd chief-engineer
make setup               # uv sync (locked env) + generate sample meshes
ollama serve &           # its own terminal; leave running (for live Gemma)
ollama pull gemma4:e4b   # = gemma4:latest, ~9.6GB
make test                # offline core tests (~1s, no Ollama); expect ALL PASSED
make run                 # → http://localhost:7860 (status bar shows the live model)
```

The app falls back to a deterministic advisor if Ollama is unreachable, so it never crashes; the UI
always shows which model ran. Set `CHIEF_ENGINEER_MODEL=gemma4:e2b` for roughly 2× faster responses on CPU.
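
The fallback behavior described above could look roughly like the following. This is a minimal sketch, not the project's code: `advise`, `deterministic_advice`, and the injected `call_model` callable are hypothetical names. Only the `CHIEF_ENGINEER_MODEL` variable, the `gemma4:e4b` default, and the `[fallback]` label come from this runbook.

```python
import os
from typing import Callable, Tuple

DEFAULT_MODEL = "gemma4:e4b"  # default named in this runbook


def deterministic_advice(prompt: str) -> str:
    """Hypothetical rule-based stand-in used when no model is reachable."""
    return f"[fallback] conservative defaults suggested for: {prompt}"


def advise(prompt: str, call_model: Callable[[str, str], str]) -> Tuple[str, str]:
    """Return (advice, source); source is the model name or 'deterministic-advisor'."""
    model = os.environ.get("CHIEF_ENGINEER_MODEL", DEFAULT_MODEL)
    try:
        return call_model(model, prompt), model
    except OSError:
        # Covers ConnectionError and TimeoutError: degrade instead of crashing.
        return deterministic_advice(prompt), "deterministic-advisor"


def unreachable(model: str, prompt: str) -> str:
    """Simulates Ollama being down (assumption: failures surface as OSError)."""
    raise ConnectionError("ollama not reachable")


text, source = advise("PETG overhang, 30 C / 65 % RH", unreachable)
print(source)  # deterministic-advisor
```

Returning the source alongside the text is what lets a UI always show which model actually ran.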

**Want the fine-tuned Chief Engineer instead of stock Gemma?** Four LoRA-fine-tuned variants are
pre-merged + quantized and live on the public Ollama registry at
[`ollama.com/kylebrodeur`](https://ollama.com/kylebrodeur):

```bash
ollama pull kylebrodeur/microfactory-node-v3-qat   # recommended (q4_k_m, 5.3GB)
CHIEF_ENGINEER_MODEL=kylebrodeur/microfactory-node-v3-qat make run
```

The other tags (`-v2`, `:q4_0`, `microfactory-node` v1) plus the canonical HF Hub copies
and the full publishing runbook are documented in
[`learn/finetune/OLLAMA_PUBLISHING.md`](../learn/finetune/OLLAMA_PUBLISHING.md) and the
[GGUF model card](https://huggingface.co/kylebrodeur/microfactory-node-gguf).

---
## 2 · Use the tool (the guided tour, also the judge's tour)

Live at **[node.microfactory.space](https://node.microfactory.space)** (custom domain; fallback
`build-small-hackathon-microfactory-lab.hf.space`). Four tabs, left to right. This is exactly what
a judge should see. The top header carries the **model switcher** (LoRA v3 QAT / LoRA v2 / Base /
Modal API), a small **info icon** tooltip, the **Warm up Model** button, and the live model status
+ clock. Every tab has its small primary action and a persistent **Reset** in the same top-right
spot. The UI uses custom icons (no emojis) and one consolidated loader that reveals content once
the work finishes.

1. **Build: define the job.** Quick-load **3DBenchy** (or generate a primitive / drop a
   mesh; drop one of your own printed STLs to demo on real parts). You don't pick the part
   type; the engineer **infers** it from the mesh ("reads this as overhang-dominant").
   The simulated **environment** (temp / humidity), **build-plate position** (center / edge /
   corner), and **material** (PLA / PETG / ABS / TPU) share the top control row, with
   **RANDOMIZE / OVERRIDE / RESET / SLICE** on the right. The **Job Log** up top shows what
   is stored and where. Hit **SLICE** (top right).
2. **Slice: the pre-flight read (the load-bearing moment).** Slicer image + motion preview
   render immediately; the horizontal **LAYER** scrubber below the image steps through real
   cross-sections of *this* part. The segmented **THE READ** toggle flips between
   **Engineer's Read** and **Second Opinion**, one panel at a time. **O'Brien** recalls the
   closest prior jobs, says what transfers, and flags the failure regions **before anything
   prints**; the **Spine** vetoes unsafe values (validation + g-code fold into his read).
   Flip to **Second Opinion**: **La Forge** critiques the plan; a *dispute* **holds PRINT**
   until you acknowledge. Clicking **Second Opinion** a second time does not re-run it for
   the same build.
3. **Print: run it and watch it compound.** **THE PLAN** card up top frames the run: *what
   we're testing* (the job + conditions + the question), the Engineer's Spine-validated
   proposed settings, and *what La Forge expects*. Press **PRINT** (top right); the compact
   **ITERATIONS** slider sets how many runs (**1 = a single print**). Results stream in live as
   each iteration finishes: the **OUTCOME · WHAT HAPPENED** block (simulated result + La Forge
   run verdict) appears first, followed by the quality curve, the **iteration log**, the learned
   policy cell, and a compact **LOG A REAL PRINT** strip to feed a real-machine outcome back into
   the ledger. **OVERRIDE PLAN** (same popup component as OVERRIDE ENVIRONMENT on LOAD) lets you
   print against your own settings instead of the Engineer's. *(Pick a genuinely hard job; the
   sim only fails prints that should fail: PETG overhang @ ~30 °C / 65 % climbs 0.55→0.70; Benchy+PETG
   @ 30/68 climbs 0.68→0.81.)*
4. **Review: the whole job in one place.** The **SESSION RECORD** assembles the full story:
   the inputs, O'Brien's read, La Forge's pre-print second opinion, the simulated run (curve +
   iteration log), the outcome + run verdict, and next steps. Below it the lesson ledger grows
   (seed → earned → sim) and the **capability mesh** is a collapsible outlook view. La Forge's
   run verdict also sits up top. **RESET TO BASELINE** (top right, present on every tab) starts a
   fresh demo (clears this session's runs + learned policy; keeps seed + ingested).
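
The dispute gate and the once-per-build behavior of **Second Opinion** in step 2 can be pictured as a small piece of state. The sketch below is an illustration only: `PrintGate`, its fields, the stance strings, and the `critique` callable are hypothetical and not taken from the app's source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class PrintGate:
    """Hypothetical gate: caches one second opinion per build, blocks PRINT on dispute."""
    critique: Callable[[str], Tuple[str, str]]  # build_id -> (stance, text)
    _cache: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    _acknowledged: Set[str] = field(default_factory=set)

    def second_opinion(self, build_id: str) -> Tuple[str, str]:
        # A repeat click for the same build reuses the cached result.
        if build_id not in self._cache:
            self._cache[build_id] = self.critique(build_id)
        return self._cache[build_id]

    def acknowledge(self, build_id: str) -> None:
        self._acknowledged.add(build_id)

    def can_print(self, build_id: str) -> bool:
        stance = self._cache.get(build_id, ("none", ""))[0]
        return stance != "dispute" or build_id in self._acknowledged


calls = []


def fake_critique(build_id: str) -> Tuple[str, str]:
    calls.append(build_id)
    return ("dispute", "bridging speed looks too high for PETG")


gate = PrintGate(critique=fake_critique)
gate.second_opinion("job-1")
gate.second_opinion("job-1")        # cached; the critique is not re-run
blocked = gate.can_print("job-1")   # held until the operator acknowledges
gate.acknowledge("job-1")
allowed = gate.can_print("job-1")
print(len(calls), blocked, allowed)  # 1 False True
```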

**The honesty spine (say this out loud):** the Engineer proposes, a deterministic Spine
disposes, a deterministic world produces the outcome, and a *separate* Inspector grades it:
the model never marks its own homework. **Real:** compounding retrieval + learned policy,
proactive risk flags, local Gemma, the QA Inspector. **Simulated (the one boundary):** print
*outcomes* (`sim/outcome.py`). **Frontier (named, not faked):** weight-level fine-tuning,
multi-node execution, physical sensors/camera. We calibrated the sim against 178 real failure
prints, measured **32.6%**, found the gap was structural, and **documented it instead of
faking a tuned number** (`sim/calibration/CALIBRATION-REPORT.md`).
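
One way to picture the "Engineer proposes, Spine disposes" split is a pure function that clamps whatever the model proposes into fixed limits and reports each veto. The limits and setting names below are illustrative placeholders, not the project's real Spine values, and `spine_validate` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Illustrative limits only (assumption): (min, max) per setting, per material.
LIMITS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "PLA": {"nozzle_temp_c": (190.0, 220.0), "bed_temp_c": (50.0, 65.0)},
    "PETG": {"nozzle_temp_c": (225.0, 250.0), "bed_temp_c": (70.0, 90.0)},
}


def spine_validate(material: str, proposed: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Clamp proposed settings into the material's limits; report every veto."""
    limits = LIMITS[material]
    safe: Dict[str, float] = {}
    vetoes: List[str] = []
    for name, value in proposed.items():
        if name not in limits:
            vetoes.append(f"{name}: unknown setting dropped")
            continue
        low, high = limits[name]
        clamped = min(max(value, low), high)
        if clamped != value:
            vetoes.append(f"{name}: {value} -> {clamped}")
        safe[name] = clamped
    return safe, vetoes


safe, vetoes = spine_validate("PETG", {"nozzle_temp_c": 270.0, "bed_temp_c": 80.0})
print(safe)    # {'nozzle_temp_c': 250.0, 'bed_temp_c': 80.0}
print(vetoes)  # ['nozzle_temp_c: 270.0 -> 250.0']
```

Because the function is deterministic and has no model in the loop, the same proposal always yields the same safe settings, which is the property the paragraph above relies on.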

**If the live model is slow/falls back:** the Space runs Gemma on ZeroGPU (first BUILD loads
it, ~30s). If GPU quota is momentarily out, it **falls back** to the deterministic
advisor (clearly labeled) rather than erroring.

---
## 3 · Deploy to the Space

> **Final submission pass?** Work the `docs/plan/SUBMISSION-PUNCHDOWN.md` list, then
> follow `docs/plan/FINAL-SEED-AND-DEPLOY.md` for the ordered seed/deploy commands. (Both live in
> the private working repo; they're operator checklists, not part of the public artifact.)

Space: **`build-small-hackathon/microfactory-lab`** · SDK gradio, app at **root**, hardware
**ZeroGPU**, **HF Pro active** (ample quota, so live Gemma appears on screen). The Space installs from
`requirements.txt`; the `chief-engineer/` contents go to the Space **root**. The local agent
has `hf` CLI access and can run the auth/repo checks and the push directly.

```bash
make deploy-check   # offline GO/NO-GO gates (D1–D10). Run any time; nothing is pushed.
make deploy         # gates → if green + authenticated, upload_folder to the Space + factory reboot
```

`make deploy` (= `scripts/deploy_preflight.py --push`) uploads everything **except**
`docs/`, `spike/`, caches, secrets, and runtime files, so `learn/`, `assets/`, and
`data/*.jsonl` go too (the app needs them). The gates: build imports + core tests, all Space
files present, README frontmatter valid (`short_description` ≤60), lean reference block, clean
ledger baseline, data well-formed, credentials (D8), live Space state (D9), and the **field-log
dataset set + logging (D10)**.
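
The `short_description` gate is easy to reproduce by hand. The following is a rough sketch under the assumption that the README starts with a `---`-delimited frontmatter block of simple `key: value` lines. `short_description_ok` is a hypothetical helper, not the code in `scripts/deploy_preflight.py`, and the sample frontmatter is made up.

```python
def short_description_ok(readme_text: str, limit: int = 60) -> bool:
    """True if the frontmatter has a short_description of at most `limit` chars."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "short_description":
            text = value.strip().strip("'\"")
            return 0 < len(text) <= limit
    return False


# Made-up README head, for illustration only.
sample = (
    "---\n"
    "title: Microfactory Lab\n"
    "sdk: gradio\n"
    "short_description: Local Gemma that flags print failures early\n"
    "---\n"
    "# App\n"
)
print(short_description_ok(sample))  # True
```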

**Credentials:** an **HF write token** (member of `build-small-hackathon`), via `hf auth login`
or `export HF_TOKEN=…`. Check with `hf auth whoami`. The Claude-on-the-web session carries no
token by default (deploy-check warns D8); deploy from an authenticated machine or set `HF_TOKEN`.

**Space variables** (set once; a change triggers rebuild):

```bash
hf spaces variables add build-small-hackathon/microfactory-lab \
  -e GRADIO_SSR_MODE=False -e CHIEF_ENGINEER_BACKEND=zerogpu \
  -e CHIEF_ENGINEER_HF_MODEL=google/gemma-4-E4B-it
```

After deploy, **smoke-test the live UI:** LOAD a part, then **SLICE** shows O'Brien
reasoning (NOT "Error"); La Forge second opinion + the dispute-gate work; LAYER
scrubber slides; Print loop runs; Review shows the ledger + verdict + **↺ RESET**;
wide layout, no empty right gutter.

### Field log: "all runs → a shared dataset" (Sharing is Caring)

Every interaction (build / second-opinion / simulate / print / record) logs one flat row to an
HF Dataset via `core/field_log.py` (`CommitScheduler`, flushes ~5 min). Logging is **automatic once the
token is set and silently no-ops without it** (local/offline unaffected; config + outcomes only,
never PII or files; rows are candidates, never auto-promoted to the curated ledger).

1. **Dataset repo: check first** (it likely exists): `hf datasets info build-small-hackathon/chief-engineer-field-log`. Create only if missing (the first `hf upload` to a non-existent dataset creates it; add `--private` for a private repo); never recreate. Must match `FIELD_LOG_REPO` in `core/field_log.py`.
2. **`HF_TOKEN` as a Space *secret*** (write, org member): Space → Settings → Variables and secrets → New secret. Reboot.
3. **Verify:** do one SLICE on the Space, wait ≤5 min, confirm a new row in `interactions.jsonl` (or `make deploy-check` D10). The schema is one flat 26-column table that renders cleanly in the HF dataset viewer.
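
A token-gated, best-effort logger of this kind might be structured as below. This is a sketch under assumptions, not the contents of `core/field_log.py`: `make_field_logger`, the local `field_log` folder, and the sample row are hypothetical, while `CommitScheduler` (with `repo_id`, `repo_type`, `folder_path`, `every`, `token`, and its `lock`) is the `huggingface_hub` class named above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional

FIELD_LOG_REPO = "build-small-hackathon/chief-engineer-field-log"


def make_field_logger(token: Optional[str], folder: str = "field_log") -> Callable[[Dict], bool]:
    """Return log(row) -> bool. Without a token the logger is a silent no-op."""
    if not token:
        return lambda row: False  # local/offline: nothing is written or uploaded

    # Imported lazily so offline runs do not need huggingface_hub installed.
    from huggingface_hub import CommitScheduler

    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    scheduler = CommitScheduler(
        repo_id=FIELD_LOG_REPO,
        repo_type="dataset",
        folder_path=out_dir,
        every=5,  # minutes between background commits
        token=token,
    )
    log_file = out_dir / "interactions.jsonl"

    def log(row: Dict) -> bool:
        try:
            with scheduler.lock:  # avoid writing while a commit is in progress
                with log_file.open("a", encoding="utf-8") as fh:
                    fh.write(json.dumps(row) + "\n")
            return True
        except Exception:
            return False  # best-effort: logging must never break the app

    return log


log = make_field_logger(None)                             # no token: no-op path
print(log({"event": "slice", "material": "PETG"}))        # False
```

In the app, the token would presumably come from `os.environ.get("HF_TOKEN")`, which is how the Space secret from step 2 reaches the process.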

### Deliberation traces: "how the agent reasons" (Sharing is Caring)

A second open dataset that captures the **turn-by-turn argument between the personas**
(O'Brien proposes → Spine vetoes → La Forge second opinion/dispute → operator override →
World simulates → La Forge grades → run verdict). One row per turn; shares one schema across
two sources:

- **Static, reproducible export:** `make deliberation` → `dist/deliberation/` (JSONL + card).
  Offline-safe; run with `ollama serve` up first to capture O'Brien's *real* reasoning rather
  than the `[fallback]` text.
- **Live, every run:** `core/deliberation_log.py` (`CommitScheduler`, same `HF_TOKEN` gate +
  best-effort/never-break contract as the field log) appends turns on each Space run.

Schema: `session_id, track, turn, agent, role, act, stance, content, material, geometry,
bed_position, env_temp, env_humidity, ts`. It renders cleanly in the dataset viewer.

1. **Dataset repo: check first:** `hf datasets info kylebrodeur/chief-engineer-deliberation`. Create only if missing (the first `hf upload` creates it; add `--private` for private); must match `DELIB_LOG_REPO` in `core/deliberation_log.py` and `HF_REPO` in `scripts/export_deliberation.py`.
2. **Seed it (optional but nicer):** `ollama serve` + `make deliberation`, then `hf upload kylebrodeur/chief-engineer-deliberation dist/deliberation . --repo-type dataset` (the card ships as the repo README).
3. **Live capture:** the **same** `HF_TOKEN` Space secret that powers the field log also powers this; no extra setup. Do one LOAD → SLICE → PRINT on the Space, wait ≤5 min, confirm new rows in `deliberations.jsonl`.
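
For readers who want to consume or produce rows in this shape, the schema above maps naturally onto a dataclass. The field names and their order come from the schema line; the Python types, the `DeliberationTurn` name, and every value in the example row are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DeliberationTurn:
    """One row per turn; field order follows the schema listed above."""
    session_id: str
    track: str
    turn: int
    agent: str
    role: str
    act: str
    stance: str
    content: str
    material: str
    geometry: str
    bed_position: str
    env_temp: float      # type assumed
    env_humidity: float  # type assumed
    ts: str              # timestamp format assumed (ISO 8601 here)

    def to_jsonl(self) -> str:
        """Serialize to a single JSONL line with keys in schema order."""
        return json.dumps(asdict(self), ensure_ascii=False)


# Invented example values, not real dataset content.
row = DeliberationTurn(
    session_id="demo-001", track="demo", turn=1, agent="obrien", role="engineer",
    act="propose", stance="neutral", content="Slow the outer wall on the overhang.",
    material="PETG", geometry="overhang", bed_position="center",
    env_temp=30.0, env_humidity=65.0, ts="2025-01-01T00:00:00Z",
)
print(row.to_jsonl())
```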

---

## 4 · Record the demo

Full beat sheet, shot list, and **what to say** for each beat: **`docs/writeup/02-VIDEO.md`**.

```bash
# clean curve first (keeps seed+ingested), then record
git checkout -- data/lessons.jsonl && rm -f data/policy.json   # or click ↺ RESET in Review
make record-check             # recording preflight (cap-cli + Space + playwright gates)
make record-beat BEAT=load    # record the LOAD beat (one .cap project per beat)
make record-beat BEAT=slice   # record the SLICE beat
./scripts/export-beat.sh /path/to/beat.cap recordings/beats/<beat>.mp4
```

Record from the **live Space** (HF Pro = live Gemma reasoning on screen). Use a **climbing
job** for the compounding beat (above). Per-beat recording is handled by `scripts/record-beat.sh`.
See `recordings/EDITING.md` for the full beat list and which beats need Cap Studio polish.
One-time deps: `uv pip install playwright && uv run playwright install chromium`.

**Cleaner capture (Kyle's setup):**

- **Install the Chrome app Gradio offers** (the "Install" / PWA prompt in the address bar on the
  Space). It opens the app in its own chromeless window (no tabs, no URL bar), so the screen
  recording is a clean interface with just the console.
- **Cap Studio with a "Desktop Background"** for the final cut: export a **1080p** version so the
  screen capture sits cleanly alongside the regular (camera) footage when the two tracks are cut
  together. Match the 1080p export to the camera end-cap resolution.

---
## 5 · Quick reference

| Command | Purpose |
|---|---|
| `make setup` | uv sync (locked env) + generate meshes |
| `make test` | offline core tests (no Ollama, ~1s) |
| `make run` | launch the app (status bar shows the live model) |
| `make preflight` | live-stack GO/NO-GO (Ollama, latency, JSON, reasoning, Spine, assets) |
| `make deploy-check` | deploy/record readiness gates (D1–D10, offline) |
| `make deploy` | gates → push the Space (`upload_folder`) + factory reboot (needs `HF_TOKEN`) |
| `make record-check` | recording preflight (cap-cli + Space + playwright gates) |
| `make record-beat BEAT=load` | record one beat to its own `.cap` project |
| `./scripts/export-beat.sh <cap> <mp4>` | export a `.cap` project to MP4 |
| `uv run python scripts/assemble-video.py recordings/manifest.json` | assemble camera + beats + VO |
| `make trace` | export the ledger as a Hub-ready dataset → [kylebrodeur/chief-engineer-ledger](https://huggingface.co/datasets/kylebrodeur/chief-engineer-ledger) |
| `make deliberation` | export the multi-persona deliberation as a Hub-ready dataset → [kylebrodeur/chief-engineer-deliberation](https://huggingface.co/datasets/kylebrodeur/chief-engineer-deliberation) |
| `core/field_log.py` | live Space interactions → [build-small-hackathon/chief-engineer-field-log](https://huggingface.co/datasets/build-small-hackathon/chief-engineer-field-log) |
| `docs/reference/ACTIVITY.jsonl` | build activity trace → [kylebrodeur/chief-engineer-build-activity](https://huggingface.co/datasets/kylebrodeur/chief-engineer-build-activity) |
| `learn/finetune/activity.jsonl` | fine-tune pipeline activity trace → [kylebrodeur/chief-engineer-finetune-activity](https://huggingface.co/datasets/kylebrodeur/chief-engineer-finetune-activity) |
| `git checkout -- data/lessons.jsonl && rm -f data/policy.json` | reset demo curve (keeps seed + ingested) |
| `CHIEF_ENGINEER_MODEL=gemma4:e2b` | faster model (env wins over `.env`) |

---
## 6 · Troubleshoot (durable gotchas)

- **Dev Mode must be OFF** on the Space: an `openvscode` build log means it runs the dev
  shell, not `app.py` (you'll see a default greet template). Disable + reboot.
- **`short_description` ≤ 60 chars** in README frontmatter or `hf upload` rejects it.
- **`requirements.txt` only** is mounted at build, so ZeroGPU deps
  (`spaces`/`torch`/`transformers`/`accelerate`) are **inlined** there. Local dev stays lean
  (uv base, 4 deps); `app.py` shims `spaces` to a no-op when absent.
- **`@spaces.GPU` is on the inference function only** (`core/llm_zerogpu._generate`), NOT
  `build_job`, so a quota-out falls back to the advisor instead of erroring the whole handler.
  `app.py` imports `core.llm_zerogpu` at startup so ZeroGPU still detects the GPU function.
- **`data/lessons.jsonl` is the durable ledger** (tracked): don't `rm` it; use the reset
  command (keeps seed + ingested).
- **Use `make` targets / `uv run python -m scripts.<name>`**; bare `python scripts/x.py` fails.

History, findings, and the phase log: **`docs/reference/RUNBOOK-FINDINGS.md`**. Deploy
deep-dive: `docs/reference/DEPLOYMENT.md`. Open items: `docs/plan/ISSUES.md`.