# RUNBOOK: set up, use, deploy, record
Microfactory Node: 3D Printer is the 3D-printing node of the Microfactory: a small,
fully local Gemma that learns 3D-printing expertise job by job and tells you where a
print will fail before it runs. Two personas keep it honest. Chief Engineer O'Brien
proposes settings, and La Forge (a separate QA Inspector) grades them, so O'Brien never grades his
own work. Model: gemma4:e4b locally · google/gemma-4-E4B-it on the Space (ZeroGPU).
This is the one operational doc: how to run it, use it, deploy it, and record it. History,
findings, and the phase log live in docs/reference/RUNBOOK-FINDINGS.md; day-by-day plan in
docs/plan/; the recording plan + published demo link live in docs/writeup/02-VIDEO.md.
## 1 · Run it locally
```bash
cd chief-engineer
make setup               # uv sync (locked env) + generate sample meshes
ollama serve &           # or run it in its own terminal; leave it running (needed for live Gemma)
ollama pull gemma4:e4b   # = gemma4:latest, ~9.6GB
make test                # offline core tests (~1s, no Ollama); expect ALL PASSED
make run                 # → http://localhost:7860 (status bar shows the live model)
```
The app falls back to a deterministic advisor if Ollama is unreachable, so it never crashes; the UI
always shows which model ran. Set CHIEF_ENGINEER_MODEL=gemma4:e2b for ~2× faster responses on CPU.
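A minimal sketch of that fallback, assuming the app calls Ollama's standard `POST /api/generate` endpoint on its default port. `deterministic_advice` and the `"deterministic-advisor"` label are hypothetical names, not the project's actual code. What matters is the shape: any transport or parse failure returns labeled fallback advice instead of raising.

```python
import json
import os
import urllib.request

DEFAULT_MODEL = "gemma4:e4b"           # the runbook's default local tag
OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def deterministic_advice(job: dict) -> str:
    # Hypothetical stand-in for the deterministic advisor (no model needed).
    return f"[fallback] conservative {job.get('material', 'PLA')} profile"


def advise(job: dict, base_url: str = OLLAMA_URL, timeout: float = 30.0) -> tuple:
    """Return (advice, model_label); never raises when the backend is down."""
    model = os.environ.get("CHIEF_ENGINEER_MODEL", DEFAULT_MODEL)
    payload = json.dumps({
        "model": model,
        "prompt": f"Propose print settings for this job: {json.dumps(job)}",
        "stream": False,  # ask for one JSON object instead of a token stream
    }).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read())["response"], model
    except (OSError, ValueError, KeyError):
        # OSError covers URLError, refused connections and socket timeouts.
        return deterministic_advice(job), "deterministic-advisor"
```

Returning the model label alongside the text is what lets the status bar report honestly which model actually answered.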
Want the fine-tuned Chief Engineer instead of stock Gemma? Four LoRA-fine-tuned variants are
pre-merged + quantized and live on the public Ollama registry at
ollama.com/kylebrodeur:
```bash
ollama pull kylebrodeur/microfactory-node-v3-qat   # recommended (q4_k_m, 5.3GB)
CHIEF_ENGINEER_MODEL=kylebrodeur/microfactory-node-v3-qat make run
```
The other tags (-v2, :q4_0, microfactory-node v1) plus the canonical HF Hub copies
and the full publishing runbook are documented in
learn/finetune/OLLAMA_PUBLISHING.md and the
GGUF model card.
## 2 · Use the tool (the guided tour, also the judge's tour)
Live at node.microfactory.space (custom domain; fallback
build-small-hackathon-microfactory-lab.hf.space). Four tabs, left to right; this is exactly what
a judge should see. The top header carries the model switcher (LoRA v3 QAT / LoRA v2 / Base /
Modal API), a small info-icon tooltip, the Warm up Model button, and the live model status
and clock. Every tab has its small primary action and a persistent Reset in the same top-right spot. The UI uses custom icons (no emojis) and one consolidated loader that reveals content once the work finishes.
- Build: define the job. Quick-load 3DBenchy (or generate a primitive / drop a mesh; drop one of your own printed STLs to demo on real parts). You don't pick the part type; the engineer infers it from the mesh ("reads this as overhang-dominant"). The simulated environment (temp / humidity), build-plate position (center / edge / corner), and material (PLA / PETG / ABS / TPU) share the top control row, with RANDOMIZE / OVERRIDE / RESET / SLICE on the right. The Job Log up top shows what is stored and where. Hit SLICE (top right).
- Slice: the pre-flight read (the load-bearing moment). The slicer image and motion preview render immediately; the horizontal LAYER scrubber below the image steps through real cross-sections of this part. The segmented THE READ toggle flips between Engineer's Read and Second Opinion, one panel at a time. O'Brien recalls the closest prior jobs, says what transfers, and flags the failure regions before anything prints, while the Spine vetoes unsafe values (validation and g-code fold into his read). Flip to Second Opinion and La Forge critiques the plan; a dispute holds PRINT until you acknowledge it. Clicking Second Opinion again does not re-run it for the same build.
- Print: run it and watch it compound. THE PLAN card up top frames the run: what we're testing (the job, the conditions, and the question), the Engineer's Spine-validated proposed settings, and what La Forge expects. Press PRINT (top right); the compact ITERATIONS slider sets how many runs (1 = a single print). Results stream in live as each iteration finishes. The OUTCOME · WHAT HAPPENED block (simulated result + La Forge run verdict) appears first, followed by the quality curve, the iteration log, the learned-policy cell, and a compact LOG A REAL PRINT strip that feeds a real-machine outcome back into the ledger. OVERRIDE PLAN (the same popup component as OVERRIDE ENVIRONMENT on LOAD) lets you print against your own settings instead of the Engineer's. (Pick a genuinely hard job, because the sim only fails prints that should fail: PETG overhang at ~30 °C / 65 % climbs 0.55 → 0.70; Benchy + PETG at 30 °C / 68 % climbs 0.68 → 0.81.)
- Review: the whole job in one place. The SESSION RECORD assembles the full story: the inputs, O'Brien's read, La Forge's pre-print second opinion, the simulated run (curve + iteration log), the outcome + run verdict, and next steps. Below it, the lesson ledger grows (seed → earned → sim) and the capability mesh is a collapsible outlook view. La Forge's run verdict also sits up top. RESET TO BASELINE (top right, present on every tab) starts a fresh demo. It clears this session's runs and learned policy and keeps the seed + ingested lessons.
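The Slice-tab rules above reduce to two small behaviours: a second opinion cached per build, and a dispute that blocks PRINT until it is acknowledged. The sketch below models them. `SecondOpinionGate`, `build_key`, and the opinion dict shape are hypothetical illustrations, not the app's real components.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SecondOpinionGate:
    """Hypothetical model of the Slice tab's dispute gate (names are illustrative)."""
    opinions: dict = field(default_factory=dict)     # build_key -> La Forge's opinion
    acknowledged: set = field(default_factory=set)   # build_keys the operator cleared
    critic_runs: int = 0                             # how often the critic actually ran

    def second_opinion(self, build_key: str, critic: Callable[[str], dict]) -> dict:
        # A repeat click for the same build returns the cached opinion.
        if build_key not in self.opinions:
            self.critic_runs += 1
            self.opinions[build_key] = critic(build_key)
        return self.opinions[build_key]

    def acknowledge(self, build_key: str) -> None:
        self.acknowledged.add(build_key)

    def can_print(self, build_key: str) -> bool:
        opinion = self.opinions.get(build_key, {})
        # A dispute holds PRINT until the operator acknowledges it.
        return not opinion.get("dispute", False) or build_key in self.acknowledged


gate = SecondOpinionGate()
critic = lambda key: {"dispute": True, "note": "example dispute"}
gate.second_opinion("benchy-petg-edge", critic)
gate.second_opinion("benchy-petg-edge", critic)  # cached: the critic is not re-run
print(gate.critic_runs, gate.can_print("benchy-petg-edge"))  # 1 False
gate.acknowledge("benchy-petg-edge")
print(gate.can_print("benchy-petg-edge"))  # True
```

Keying the cache on the build, not the click, makes repeated clicks harmless. Acknowledgement is stored per build, so a new build starts gated again.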
The honesty spine (say this out loud): the Engineer proposes, a deterministic Spine
disposes, a deterministic world produces the outcome, and a separate Inspector grades it:
the model never marks its own homework. Real: compounding retrieval + learned policy,
proactive risk flags, local Gemma, the QA Inspector. Simulated (the one boundary): print
outcomes (sim/outcome.py). Frontier (named, not faked): weight-level fine-tuning,
multi-node execution, physical sensors/camera. We calibrated the sim against 178 real failure
prints, measured 32.6%, found the gap was structural, and documented it instead of
faking a tuned number (sim/calibration/CALIBRATION-REPORT.md).
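The proposes / disposes / produces / grades split can be sketched as separate functions with no path from the grader back to the proposer. This assumes a single nozzle-temperature setting and a toy quality formula. `NOZZLE_LIMITS`, `Plan`, `world_outcome`, and the 0.6 pass threshold are illustrative only; they are not the real Spine tables or `sim/outcome.py`.

```python
from dataclasses import dataclass

# Illustrative nozzle windows in °C; NOT the project's actual Spine tables.
NOZZLE_LIMITS = {"PLA": (190, 220), "PETG": (220, 250), "ABS": (230, 260), "TPU": (210, 230)}


@dataclass(frozen=True)
class Plan:
    material: str
    nozzle_c: int


def spine_dispose(plan: Plan) -> tuple:
    """Deterministic Spine: clamp unsafe values and record each veto."""
    lo, hi = NOZZLE_LIMITS[plan.material]
    safe = min(max(plan.nozzle_c, lo), hi)
    vetoes = [] if safe == plan.nozzle_c else [f"nozzle {plan.nozzle_c} -> {safe}"]
    return Plan(plan.material, safe), vetoes


def world_outcome(plan: Plan, humidity_pct: float) -> float:
    """Toy deterministic world: identical inputs always give identical quality."""
    lo, hi = NOZZLE_LIMITS[plan.material]
    centered = 1 - abs(plan.nozzle_c - (lo + hi) / 2) / (hi - lo)
    return centered * (1 - humidity_pct / 200)


def inspector_grade(quality: float) -> str:
    # The Inspector sees only the outcome, never the Engineer's reasoning.
    return "PASS" if quality >= 0.6 else "FAIL"


def run_job(proposal: Plan, humidity_pct: float) -> dict:
    plan, vetoes = spine_dispose(proposal)       # the Engineer proposes, the Spine disposes
    quality = world_outcome(plan, humidity_pct)  # the world, not the model, produces the outcome
    return {"plan": plan, "vetoes": vetoes, "quality": quality,
            "verdict": inspector_grade(quality)}  # a separate grader marks it


print(run_job(Plan("PETG", 265), humidity_pct=65)["vetoes"])  # ['nozzle 265 -> 250']
```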
If the live model is slow or falls back: the Space runs Gemma on ZeroGPU, and the first BUILD loads it (~30 s). If the GPU quota is momentarily exhausted, it falls back to the deterministic advisor (clearly labeled) rather than erroring.
## 3 · Deploy to the Space
Final submission pass? Work through the
docs/plan/SUBMISSION-PUNCHDOWN.md list, then follow docs/plan/FINAL-SEED-AND-DEPLOY.md (the ordered seed/deploy commands). Both live in the private working repo; they're operator checklists, not part of the public artifact.
Space: build-small-hackathon/microfactory-lab · SDK gradio, app at root, hardware
ZeroGPU, HF Pro active (ample quota, so live Gemma is on screen). The Space installs from
requirements.txt; the chief-engineer/ contents go to the Space root. The local agent
has hf CLI access and can run the auth/repo checks and the push directly.
```bash
make deploy-check   # offline GO/NO-GO gates (D1–D10); run any time, nothing is pushed
make deploy         # gates → if green + authenticated, upload_folder to the Space + factory reboot
```
make deploy (= scripts/deploy_preflight.py --push) uploads everything except
docs/, spike/, caches, secrets, and runtime files, so learn/, assets/, and
data/*.jsonl go too (the app needs them). The gates: build imports + core tests, all Space
files present, README frontmatter valid (short_description ≤ 60 chars), lean reference block, clean
ledger baseline, data well-formed, credentials (D8), live Space state (D9), and the field-log
dataset configured and logging (D10).
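Every gate follows the same GO/NO-GO pattern: each check returns pass/fail plus a reason, and the run is GO only when all pass. The sketch below shows that pattern with the README-frontmatter gate as the worked example. `check_frontmatter`, `go_no_go`, and the gate names are hypothetical; the real checks live in `scripts/deploy_preflight.py`.

```python
import re
from typing import Callable, Dict, Tuple

GateResult = Tuple[bool, str]
MAX_SHORT_DESCRIPTION = 60  # longer values are rejected by hf upload (per this runbook)


def check_frontmatter(readme_text: str) -> GateResult:
    """Sketch of the frontmatter gate: a leading --- block with short_description <= 60 chars."""
    match = re.match(r"---\n(.*?)\n---(?:\n|$)", readme_text, re.DOTALL)
    if not match:
        return False, "no frontmatter block"
    for line in match.group(1).splitlines():
        if line.startswith("short_description:"):
            value = line.split(":", 1)[1].strip().strip("\"'")
            if len(value) > MAX_SHORT_DESCRIPTION:
                return False, f"short_description is {len(value)} chars"
            return True, "ok"
    return False, "short_description missing"


def go_no_go(gates: Dict[str, Callable[[], GateResult]]) -> bool:
    """Run every gate (no short-circuit, so all failures show); GO only if all pass."""
    results = {gate_id: check() for gate_id, check in gates.items()}
    for gate_id, (ok, detail) in results.items():
        print(f"{gate_id}: {'PASS' if ok else 'FAIL'} ({detail})")
    return all(ok for ok, _ in results.values())
```

Running every gate before deciding shows the full list of failures in one pass. Short-circuiting would reveal them one per run.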
Credentials: an HF write token for an account that is a member of build-small-hackathon, via `hf auth login`
or `export HF_TOKEN=…`. Check with `hf auth whoami`. The Claude-on-the-web session carries no
token by default (deploy-check warns on D8); deploy from an authenticated machine or set HF_TOKEN.
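The push step can be sketched with `huggingface_hub`, assuming it uses `HfApi.upload_folder(..., ignore_patterns=...)` and then `HfApi.restart_space(..., factory_reboot=True)`. The runbook names `upload_folder` and a factory reboot, but the exact calls in `scripts/deploy_preflight.py` may differ. `IGNORE_PATTERNS` here is an approximation of "everything except docs/, spike/, caches, secrets, and runtime files", not the script's real list.

```python
import fnmatch

SPACE_ID = "build-small-hackathon/microfactory-lab"

# Approximates "everything except docs/, spike/, caches, secrets, and runtime files";
# the authoritative list lives in scripts/deploy_preflight.py and may differ.
IGNORE_PATTERNS = ["docs/*", "spike/*", "*__pycache__*", ".venv/*", ".env", "*.pyc"]


def would_upload(path: str) -> bool:
    # fnmatch's "*" also matches "/", so "docs/*" covers the whole docs tree.
    return not any(fnmatch.fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)


def push(folder: str = ".") -> None:
    from huggingface_hub import HfApi  # lazy import; uses HF_TOKEN or the cached login
    api = HfApi()
    api.upload_folder(folder_path=folder, repo_id=SPACE_ID, repo_type="space",
                      ignore_patterns=IGNORE_PATTERNS)
    api.restart_space(SPACE_ID, factory_reboot=True)
```

`would_upload` lets you check a path offline before pushing. For example, `would_upload("data/lessons.jsonl")` is `True` and `would_upload("docs/plan/ISSUES.md")` is `False`.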
Space variables (set once; a change triggers a rebuild):

```bash
hf spaces variables add build-small-hackathon/microfactory-lab \
  -e GRADIO_SSR_MODE=False -e CHIEF_ENGINEER_BACKEND=zerogpu \
  -e CHIEF_ENGINEER_HF_MODEL=google/gemma-4-E4B-it
```
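If you prefer Python to the CLI, `huggingface_hub.HfApi.add_space_variable(repo_id, key, value)` sets one variable per call. The helper below is a hypothetical wrapper that takes the API object as a parameter, so the logic can be exercised with a test double.

```python
from typing import Dict

SPACE_VARIABLES: Dict[str, str] = {
    "GRADIO_SSR_MODE": "False",
    "CHIEF_ENGINEER_BACKEND": "zerogpu",
    "CHIEF_ENGINEER_HF_MODEL": "google/gemma-4-E4B-it",
}


def apply_space_variables(api, repo_id: str, variables: Dict[str, str]) -> None:
    """`api` is a huggingface_hub.HfApi (or a test double); changes trigger a rebuild."""
    for key, value in variables.items():
        api.add_space_variable(repo_id, key, value)


# Usage (needs a write token):
# from huggingface_hub import HfApi
# apply_space_variables(HfApi(), "build-small-hackathon/microfactory-lab", SPACE_VARIABLES)
```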
After deploying, smoke-test the live UI: LOAD a part, then SLICE shows O'Brien's reasoning (NOT "Error"); the La Forge second opinion and the dispute gate work; the LAYER scrubber slides; the Print loop runs; Review shows the ledger, the verdict, and ↺ RESET; the layout is wide, with no empty right gutter.
### Field log: "all runs → a shared dataset" (Sharing is Caring)
Every interaction (build / second-opinion / simulate / print / record) logs one flat row to an
HF Dataset via core/field_log.py (CommitScheduler, flushing every ~5 min). Logging is automatic once the
token is set and silently no-ops without it, so local/offline use is unaffected. Rows carry config + outcomes only,
never PII or files, and they are only candidates: they are never auto-promoted to the curated ledger.
- Dataset repo (check first; it likely exists): `hf datasets info build-small-hackathon/chief-engineer-field-log`. Create it only if missing (the first `hf upload` to a non-existent dataset creates it; add `--private` for a private repo); never recreate it. It must match `FIELD_LOG_REPO` in `core/field_log.py`.
- `HF_TOKEN` as a Space secret (write access, org member): Space → Settings → Variables and secrets → New secret. Reboot.
- Verify: do one SLICE on the Space, wait ≤ 5 min, and confirm a new row in `interactions.jsonl` (or run `make deploy-check`, gate D10). The schema is one flat 26-column table, so it renders cleanly in the HF dataset viewer.
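A minimal sketch of that logging contract, built on `huggingface_hub.CommitScheduler`. It follows the library's documented pattern: append JSONL under `scheduler.lock`, with `every` given in minutes. `FieldLog`, `flat_row`, and `ALLOWED_FIELDS` are hypothetical; the real module and its 26 columns are in `core/field_log.py`. The per-process file name follows the library's advice to avoid overwrites across restarts and is this sketch's choice, not necessarily the project's.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Optional

# Hypothetical whitelist: config + outcome fields only, never PII or file contents.
ALLOWED_FIELDS = {"material", "geometry", "bed_position", "env_temp", "env_humidity", "quality"}


def flat_row(event: str, **fields) -> dict:
    """Keep only whitelisted scalar fields so every row stays one flat record."""
    row = {k: v for k, v in fields.items()
           if k in ALLOWED_FIELDS and isinstance(v, (str, int, float, bool))}
    row.update(event=event, ts=time.time())
    return row


class FieldLog:
    def __init__(self, repo_id: str, folder: str = "field_log", token: Optional[str] = None):
        self.scheduler = None
        token = token or os.environ.get("HF_TOKEN")
        if not token:
            return  # no token: every log() call silently no-ops
        from huggingface_hub import CommitScheduler
        # One file per process so a restart never overwrites rows already pushed.
        self.path = Path(folder) / f"interactions-{uuid.uuid4().hex}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.scheduler = CommitScheduler(repo_id=repo_id, repo_type="dataset",
                                         folder_path=folder, every=5, token=token)

    def log(self, event: str, **fields) -> bool:
        if self.scheduler is None:
            return False
        try:
            with self.scheduler.lock:  # never write while a background commit reads the folder
                with self.path.open("a") as f:
                    f.write(json.dumps(flat_row(event, **fields)) + "\n")
            return True
        except Exception:
            return False  # best-effort: logging must never break a run
```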
### Deliberation traces: "how the agent reasons" (Sharing is Caring)
A second open dataset captures the turn-by-turn argument between the personas (O'Brien proposes → Spine vetoes → La Forge second opinion/dispute → operator override → World simulates → La Forge grades → run verdict). It has one row per turn, and both sources share one schema:
- Static, reproducible export: `make deliberation` → `dist/deliberation/` (JSONL + card). Offline-safe; run it with `ollama serve` up first to capture O'Brien's real reasoning rather than the `[fallback]` text.
- Live, every run: `core/deliberation_log.py` (CommitScheduler, with the same `HF_TOKEN` gate and best-effort, never-break contract as the field log) appends turns on each Space run.
Schema: `session_id`, `track`, `turn`, `agent`, `role`, `act`, `stance`, `content`, `material`, `geometry`, `bed_position`, `env_temp`, `env_humidity`, `ts`. It renders cleanly in the dataset viewer.
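The 14-column schema maps naturally onto a dataclass serialized one turn per JSONL line. In this sketch the column names come from the schema above. The Python types, the `to_jsonl` helper, and every example value (including the `track`, `role`, `act`, and `stance` strings) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class DeliberationTurn:
    # Column names follow the schema above; the Python types are assumptions.
    session_id: str
    track: str
    turn: int
    agent: str
    role: str
    act: str
    stance: str
    content: str
    material: str
    geometry: str
    bed_position: str
    env_temp: float
    env_humidity: float
    ts: float


def to_jsonl(turns: List[DeliberationTurn]) -> str:
    """One JSON object per line, one line per turn."""
    return "".join(json.dumps(asdict(t)) + "\n" for t in turns)


# Job context repeated on every row keeps the table flat (all values are made up).
ctx = dict(session_id="demo-1", track="sim", material="PETG", geometry="overhang",
           bed_position="edge", env_temp=30.0, env_humidity=65.0, ts=0.0)
turns = [
    DeliberationTurn(turn=1, agent="O'Brien", role="engineer", act="propose",
                     stance="for", content="example proposal", **ctx),
    DeliberationTurn(turn=2, agent="Spine", role="validator", act="veto",
                     stance="against", content="example veto", **ctx),
]
print(len(fields(DeliberationTurn)))  # 14
```

Repeating the job context on every row trades a little redundancy for a table that filters and renders without joins.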
- Dataset repo (check first): `hf datasets info kylebrodeur/chief-engineer-deliberation`. Create it only if missing (the first `hf upload` creates it; add `--private` for a private repo). It must match `DELIB_LOG_REPO` in `core/deliberation_log.py` and `HF_REPO` in `scripts/export_deliberation.py`.
- Seed it (optional but nicer): `ollama serve` + `make deliberation`, then `hf upload kylebrodeur/chief-engineer-deliberation dist/deliberation . --repo-type dataset` (the card ships as the repo README).
- Live capture: the same `HF_TOKEN` Space secret that powers the field log also powers this, so no extra setup is needed. Do one LOAD → SLICE → PRINT on the Space, wait ≤ 5 min, and confirm new rows in `deliberations.jsonl`.
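The "check first, create only if missing, never recreate" rule for both datasets can be sketched with `HfApi.repo_exists` and `HfApi.create_repo`. `ensure_dataset` is a hypothetical helper that takes the API object as a parameter so it can be tested offline. Passing `exist_ok=True` is a second safety net: `create_repo` then never fails or replaces an existing repo.

```python
def ensure_dataset(api, repo_id: str, private: bool = False) -> bool:
    """Create a dataset repo only if it is missing; never recreate it.

    `api` is a huggingface_hub.HfApi (or a test double). Returns True if it was created.
    """
    if api.repo_exists(repo_id, repo_type="dataset"):
        return False  # already there: leave existing rows untouched
    api.create_repo(repo_id, repo_type="dataset", private=private, exist_ok=True)
    return True


# Usage (needs a write token); the IDs must match FIELD_LOG_REPO / DELIB_LOG_REPO:
# from huggingface_hub import HfApi
# for repo in ("build-small-hackathon/chief-engineer-field-log",
#              "kylebrodeur/chief-engineer-deliberation"):
#     ensure_dataset(HfApi(), repo)
```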
## 4 · Record the demo
Full beat sheet, shot list, and what to say for each beat: docs/writeup/02-VIDEO.md.
```bash
# clean curve first (keeps seed + ingested), then record
git checkout -- data/lessons.jsonl && rm -f data/policy.json   # or click ↺ RESET in Review
make record-check             # recording preflight (cap-cli + Space + playwright gates)
make record-beat BEAT=load    # record the LOAD beat (one .cap project per beat)
make record-beat BEAT=slice   # record the SLICE beat
./scripts/export-beat.sh /path/to/beat.cap recordings/beats/<beat>.mp4
```
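The ↺ RESET button's documented effect is to clear this session's runs and the learned policy while keeping seed + ingested lessons. It can be sketched as a filter over the JSONL ledger, assuming each row carries a `source` tag. Both the `source` field and `reset_to_baseline` are hypothetical names. The git command above instead restores the tracked file wholesale.

```python
import json
from pathlib import Path

KEEP_SOURCES = {"seed", "ingested"}  # hypothetical `source` tags; "earned"/"sim" rows go


def reset_to_baseline(ledger: Path, policy: Path) -> int:
    """Drop this session's lessons and the learned policy; return how many rows were kept."""
    kept = [row for row in ledger.read_text().splitlines()
            if row.strip() and json.loads(row).get("source") in KEEP_SOURCES]
    ledger.write_text("".join(row + "\n" for row in kept))
    policy.unlink(missing_ok=True)  # like `rm -f data/policy.json`
    return len(kept)
```

The function is idempotent: a second call keeps the same rows and tolerates the policy file already being gone.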
Record from the live Space (HF Pro means live Gemma reasoning on screen). Use a climbing
job (see the hard jobs listed under Print above) for the compounding beat. Per-beat recording is handled by scripts/record-beat.sh; see recordings/EDITING.md for the
full beat list and which beats need Cap Studio polish. One-time deps: `uv pip install playwright && uv run playwright install chromium`.
Cleaner capture (Kyle's setup):
- Install the Chrome app Gradio offers (the "Install" / PWA prompt in the address bar on the Space). It opens the app in its own chromeless window (no tabs, no URL bar), so the screen recording is a clean interface with just the console.
- Cap Studio with a "Desktop Background" for the final cut: export a 1080p version so the screen capture sits cleanly alongside the regular (camera) footage when the two tracks are cut together. Match the 1080p export to the camera end-cap resolution.
## 5 · Quick reference
| Command or file | Purpose |
|---|---|
| `make setup` | `uv sync` (locked env) + generate meshes |
| `make test` | offline core tests (no Ollama, ~1s) |
| `make run` | launch the app (status bar shows the live model) |
| `make preflight` | live-stack GO/NO-GO (Ollama, latency, JSON, reasoning, Spine, assets) |
| `make deploy-check` | deploy/record readiness gates (D1–D10, offline) |
| `make deploy` | gates → push the Space (`upload_folder`) + factory reboot (needs `HF_TOKEN`) |
| `make record-check` | recording preflight (cap-cli + Space + playwright gates) |
| `make record-beat BEAT=load` | record one beat to its own `.cap` project |
| `./scripts/export-beat.sh <cap> <mp4>` | export a `.cap` project to MP4 |
| `uv run python scripts/assemble-video.py recordings/manifest.json` | assemble camera + beats + VO |
| `make trace` | export the ledger as a Hub-ready dataset → `kylebrodeur/chief-engineer-ledger` |
| `make deliberation` | export the multi-persona deliberation as a Hub-ready dataset → `kylebrodeur/chief-engineer-deliberation` |
| `core/field_log.py` | live Space interactions → `build-small-hackathon/chief-engineer-field-log` |
| `docs/reference/ACTIVITY.jsonl` | build activity trace → `kylebrodeur/chief-engineer-build-activity` |
| `learn/finetune/activity.jsonl` | fine-tune pipeline activity trace → `kylebrodeur/chief-engineer-finetune-activity` |
| `git checkout -- data/lessons.jsonl && rm -f data/policy.json` | reset demo curve (keeps seed + ingested) |
| `CHIEF_ENGINEER_MODEL=gemma4:e2b` | faster model (env wins over `.env`) |
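"Env wins over .env" means a variable already set in the process environment takes precedence over the same key in the `.env` file, which in turn beats the built-in default. A dependency-free sketch of that resolution order follows; `read_dotenv` and `resolve_model` are hypothetical helpers with a deliberately minimal parser (no quoting rules). `python-dotenv`'s `load_dotenv()` gives the same precedence by default, since it does not override variables that are already set.

```python
import os
from pathlib import Path
from typing import Dict


def read_dotenv(path: Path) -> Dict[str, str]:
    # Minimal KEY=VALUE parser: skips blanks and comments, no quoting rules.
    values: Dict[str, str] = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                values[key.strip()] = value.strip()
    return values


def resolve_model(dotenv: Path = Path(".env"), default: str = "gemma4:e4b") -> str:
    # Precedence: process environment > .env file > built-in default.
    return (os.environ.get("CHIEF_ENGINEER_MODEL")
            or read_dotenv(dotenv).get("CHIEF_ENGINEER_MODEL", default))
```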
## 6 · Troubleshoot (durable gotchas)
- Dev Mode must be OFF on the Space. An `openvscode` build log means it runs the dev shell, not `app.py` (you'll see a default greet template). Disable it and reboot.
- `short_description` must be ≤ 60 chars in the README frontmatter, or `hf upload` rejects it.
- Only `requirements.txt` is mounted at build, so the ZeroGPU deps (`spaces`/`torch`/`transformers`/`accelerate`) are inlined there. Local dev stays lean (uv base, 4 deps); `app.py` shims `spaces` to a no-op when it is absent.
- `@spaces.GPU` is on the inference function only (`core/llm_zerogpu._generate`), NOT on `build_job`, so a quota-out falls back to the advisor instead of erroring the whole handler. `app.py` imports `core.llm_zerogpu` at startup so ZeroGPU still detects the GPU function.
- `data/lessons.jsonl` is the durable ledger (tracked); don't `rm` it. Use the reset command (keeps seed + ingested).
- Use `make` targets / `uv run python -m scripts.<name>`; a bare `python scripts/x.py` fails.
History, findings, and the phase log: docs/reference/RUNBOOK-FINDINGS.md. Deploy
deep-dive: docs/reference/DEPLOYMENT.md. Open items: docs/plan/ISSUES.md.