OffGridSchedula

OffGridSchedula / README.md

ParetoOptimal

Initial Commit

0366d65 19 days ago

	---
	title: OffGridSchedula
	emoji: 🗓️
	colorFrom: indigo
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	license: apache-2.0
	short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
	tags:
	- track:backyard
	- sponsor:openbmb
	- sponsor:modal
	- achievement:offgrid
	- achievement:welltuned
	- achievement:offbrand
	- achievement:llama
	- achievement:sharing
	- achievement:fieldnotes
	models:
	- build-small-hackathon/gemma-4-cal-gguf
	- openbmb/MiniCPM5-1B-GGUF
	demo_video:
	- https://youtu.be/m-o0u9X3tI4
	social_posts:
	- https://x.com/nate_mauer/status/2065973341651882386
	- https://x.com/nate_mauer/status/2064920352845709419
	- https://x.com/nate_mauer/status/2065661878441750916
	- https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
	blog_post:
	- https://huggingface.co/blog/build-small-hackathon/offgridschedula
	made_by:
	- ParetoOptimal - a.k.a., Nate Mauer
	---

	# 🗓️ Message Scheduling Agent

	**OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts, and drafts the reply — right from your phone, no app, no account,
	no setup. iOS allows neither background iMessage access nor a persistent on-device LLM server, so there's no autonomous on-device agent to install; instead,
	a foreground Shortcut ([docs/automations.md](./docs/automations.md)) hands a thread or screenshot to the agent in two taps (optionally using a remote model via `INFERENCE_BASE_URL`).**

	The model runs on your own server, or even on the phone itself, not on a cloud AI service. Your chats aren't shipped off to a third-party AI to be read; the agent reads your snippet in memory and
	discards it after replying. Your snippet goes only to the agent you control, which turns it into ready-to-add calendar events, and the run trace you can optionally share is redacted.

	Hardware-aware. On under-powered hardware, the app shows an upgrade banner rather than hanging; the real model needs a small GPU.

	## Build Small submission — the idea & the tech

	The idea. A busy parent's calendar lives in other people's messages — picture day in the
	class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar
	events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the
	conflicts against your own `.ics`, and a drafted reply — then add to Apple/Google Calendar in a tap.

	The tech. Two small local models do the work. Extraction is [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf)
	(~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated
	ActionPlan (events · conflicts · reply · clarifying question), served with vision through
	the official llama.cpp server inside this Docker Gradio Space — no cloud AI APIs. The
	fine-tune + its 60-example task eval ran entirely on Modal serverless GPUs, behind an
	eval gate that rejected eight regressed models before this one shipped. Conflict math is
	deterministic Python, the UI is fully custom, the agent doubles as an MCP tool server, and
	redacted run traces are public on the [Hub](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces).
	Click **Run the agents** and a local OpenBMB MiniCPM planner (a second local llama-server)
	drives this same Space's MCP tools as a multi-step agent — extract → check conflicts → render
	`.ics` — with every step visible. Still zero cloud AI; every model under 32B.

	What's new. Extraction now reads the logistics, not just the date (see below): arrival-aware
	start times, duration→end conversion, type-based reminders, and calendar-ready titles — each
	guaranteed by deterministic post-processing even when the model wobbles, and each shipped through
	a measured A/B eval ([full result tables](./training/data/ab_results.md): regex vs text-LLM vs
	vision-LLM reading rendered screenshots only). Calendar output is now one click too: a unified
	**Connect your calendar** block (Google OAuth: the token lives in your browser, never on the
	server; Outlook/Apple need no sign-in) and per-event Google · Outlook · iCal links, with the
	Google push verified end-to-end (push → readback → delete, 11/11).

	The UX. One decision, Offline or Online, re-themes the whole workflow card and sets the
	path: off-grid `.ics` only, or a one-click "Connect your calendar" whose Google OAuth token
	lives only in the browser (server-verified each visit; the client secret never leaves the
	server). Results land in a single card: events, conflicts, the drafted reply, and per-event
	Google · Outlook · iCal · .ics quick-add links. Activity → This week tallies events
	captured, conflicts caught, and time saved; a per-device Memory (localStorage, one-click
	samples) feeds names and preferences back into extraction.

	Submission links: [requirement-by-requirement mapping](./docs/build-small-submission.md) ·
	[demo video](https://youtu.be/m-o0u9X3tI4) ·
	social posts [1](https://x.com/nate_mauer/status/2064920352845709419) ·
	[2](https://x.com/nate_mauer/status/2065661878441750916)

	## Who this is for

	A busy parent whose kid's school and activity events are buried in a noisy class group chat —
	picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once,
	mean to add it later, and miss it. With this, they paste the chat (or a screenshot of a flyer
	or invite) from their phone's browser and get back: the events, a conflict check against their
	calendar, and a ready-to-send reply — all surfaced for review before anything is saved. Output is
	a local `.ics` they can add to any calendar, with optional Google Calendar push.

	No app to install and no account. It reads nothing automatically — the parent pastes only what they
	choose. Inference runs in the Space via `llama.cpp` (no cloud AI APIs), and the app also works out
	of the box with no GPU through the built-in stub agent (see Accuracy upgrade below).

	## The model: `gemma-cal` E4B — one calendar-native LLM, built for exactly this

	What makes this platform different isn't a prompt wrapped around a generic chatbot — it's
	**[`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf), our own fine-tune of
	Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready
	structure.** The model doesn't chat. It reads a thread (or a flyer photo) and emits a single
	validated ActionPlan — events with exact ISO datetimes, conflicts, proposed alternatives, a
	drafted reply, and a clarifying question when the plan is too vague to schedule. **It is the one
	and only model the platform runs**, everywhere from the production Space to a laptop.
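
To make the shape of that output concrete, here is a minimal sketch of an ActionPlan-like structure. It uses stdlib dataclasses instead of the pydantic models the repo keeps in `server/schema.py`, and every field name below is an assumption chosen for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    title: str
    start: datetime                      # exact datetime parsed from ISO text
    end: Optional[datetime] = None
    location: Optional[str] = None


@dataclass
class Conflict:
    event_title: str                     # extracted event that collides
    with_title: str                      # existing calendar entry it hits
    proposed_alternatives: list[datetime] = field(default_factory=list)


@dataclass
class ActionPlan:
    events: list[Event] = field(default_factory=list)
    conflicts: list[Conflict] = field(default_factory=list)
    reply: str = ""
    clarifying_question: Optional[str] = None

    def validate(self) -> None:
        """Reject plans that could not be placed on a calendar."""
        for ev in self.events:
            if ev.end is not None and ev.end <= ev.start:
                raise ValueError(f"{ev.title!r}: end must be after start")


# A one-event plan; an empty `events` list is also valid (e.g. for "thanks!").
plan = ActionPlan(
    events=[Event("Picture day", datetime.fromisoformat("2025-06-12T09:00:00"))],
    reply="Got it, thanks!",
)
plan.validate()
```

Keeping the clarifying question as an optional field lets a single object cover both the "here are your events" and the "I need a date first" outcomes.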

	- Edge-sized by design. ~5 GB at Q4 — serves on a ~$0.40/hr 16 GB T4 (vs $4+/hr A100-class
	for big models), a gaming GPU, or an Apple-silicon laptop, with full vision
	(screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
	- Schema-bulletproof. The fine-tune holds 100% schema validity even with no system prompt,
	with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of
	asking when a date is TBD — the failure modes that actually burn users of generic models.
	- Convention-trained. It learns this product's date semantics ("next Tuesday" means next
	week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed
	from the internet.
	- Eval-gated, never vibes-shipped. Every retrain runs a 60-example task eval (start-exact
	datetime matching, F1, validity, clarification) and **cannot reach production unless it clears
	the gate** — the pipeline has rejected eight regressed models to date. The full, honest scorecard
	lives in [`docs/eval-roadmap.md`](./docs/eval-roadmap.md) and the
	[post-mortem write-up](./docs/blog-eval-gated-finetuning.md).
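
The gating idea in the last bullet can be sketched as a simple comparison. The metric names follow the list above, but the rule (no regression on any metric) and all numbers are hypothetical placeholders, not the project's real gate or measured results.

```python
# Metric names mirror the README; the rule and the numbers are illustrative only.
GATED_METRICS = ("start_exact", "event_f1", "schema_validity", "clarification_rate")


def passes_gate(candidate: dict[str, float], production: dict[str, float],
                tolerance: float = 0.0) -> bool:
    """Promote only if the candidate does not regress on any gated metric."""
    return all(candidate[m] >= production[m] - tolerance for m in GATED_METRICS)


# Placeholder scores, not results from the project.
production = {"start_exact": 0.80, "event_f1": 0.90,
              "schema_validity": 1.00, "clarification_rate": 0.70}
regressed = dict(production, event_f1=0.85)    # worse on F1, better nowhere
improved = dict(production, start_exact=0.85)  # better on one metric, equal elsewhere

print(passes_gate(regressed, production))
print(passes_gate(improved, production))
```

A per-metric check like this blocks a retrain that trades one behavior for another, which an averaged score would hide.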

	Hackathon size constraint (≤ 32B): met easily, since E4B is ~4B effective parameters. See the in-app
	🏆 Submission tab for the full compliance scorecard.

	### Reads the logistics, not just the date

	A confirmation like *"Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes
	early to complete intake forms) · 📍 112A West 72nd Street…"* becomes one correct event:

	- Arrival-aware start — the event starts at 10:15 (when you must show up), the official
	10:30 is preserved in the notes, and the end is anchored to the stated time + duration
	(11:00), so the calendar block covers the forms and the visit.
	- Type-based notifications — an explicitly stated lead time always wins ("remind me 2 hours
	before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school
	events 45.
	- Real-world addresses — multi-line and 📍-emoji locations join into one string;
	"(Upper West Side — 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us
	at 212-223-0349") don't confuse it.
	- Calendar-ready titles — an action+subject summary ("Pick up Priya — Terminal 4"), not a
	quote of the message.

	The model is taught these conventions (prompt + fine-tune data), but the load-bearing ones are
	also guaranteed by deterministic post-processing (`apply_text_rules` in
	[`server/agent.py`](./server/agent.py)) — same philosophy as the conflict engine: must-hold
	logistics are never left to model temperament. Every behavior above shipped through a measured
	A/B eval — regex baseline vs text-LLM vs vision-LLM reading rendered chat screenshots only —
	with the full tables in [`training/data/ab_results.md`](./training/data/ab_results.md)
	(headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the
	screenshot-only vision arm lands within a point of text).
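
The arrival and reminder rules above are simple enough to sketch. This is not the repo's `apply_text_rules`; the function names are hypothetical, and the sketch assumes the lower bound of a duration range is used, which is what the 11:00 end time in the example implies.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default reminder lead times in minutes, as listed above.
REMINDER_DEFAULTS = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def apply_logistics(official_start: datetime, duration_min: int,
                    arrive_early_min: int = 0):
    """Return (calendar_start, calendar_end, note) for one event."""
    # The block starts when you must show up ...
    start = official_start - timedelta(minutes=arrive_early_min)
    # ... but the end stays anchored to the stated time plus duration.
    end = official_start + timedelta(minutes=duration_min)
    note = f"Official time: {official_start:%H:%M}" if arrive_early_min else ""
    return start, end, note


def reminder_minutes(event_type: str, explicit: Optional[int] = None) -> Optional[int]:
    """An explicitly stated lead time always wins over the type default."""
    if explicit is not None:
        return explicit
    return REMINDER_DEFAULTS.get(event_type)


# 10:30 appointment, 30-45 min (lower bound assumed), arrive 15 minutes early.
start, end, note = apply_logistics(datetime(2025, 6, 12, 10, 30), 30, 15)
print(start.time(), end.time(), note)
print(reminder_minutes("medical"), reminder_minutes("party", explicit=120))
```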

	## Try it in 30 seconds

	Open the Space in your phone's browser → Schedule tab → tap Try a sample (or paste your own
	group chat, and optionally a screenshot or your `.ics`) → review the detected events → **Download
	.ics**. The **Activity → This week** panel then shows what you've captured and the time it saved.

	## How it works

	```
	Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
	      (phone browser)               │                                  │
	                            custom Gradio UI ◀── review ──┐       ┌────┘
	                                                          ▼       ▼
	                                              .ics download / optional Google Calendar
	```

	The primary path needs nothing but a browser: paste text and/or attach a screenshot in the
	Schedule tab. (Power users can also auto-feed messages from a Mac — see Optional: Mac collector.)

	For the full solution-architecture view — every workflow and which LLM (if any) it calls,
	plus the eval-gated fine-tuning loop — see [docs/architecture.md](./docs/architecture.md).
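
The conflict step of this pipeline involves no model at all. A minimal sketch of such a deterministic check follows; it is not the code in `calendar_out/freebusy.py`, and the tuple layout and the half-open interval convention are assumptions made for the example.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Half-open intervals [start, end): back-to-back events do not conflict."""
    return a_start < b_end and b_start < a_end


def find_conflicts(new_events, busy):
    """Items are (title, start, end); returns (new_title, busy_title) pairs."""
    return [(n[0], b[0]) for n in new_events for b in busy
            if overlaps(n[1], n[2], b[1], b[2])]


day = lambda h, m=0: datetime(2025, 6, 12, h, m)
new_events = [("Picture day", day(9), day(10))]
busy = [("Dentist", day(9, 30), day(10, 30)),   # overlaps 9:30-10:00
        ("Standup", day(10), day(10, 30))]      # starts exactly when the event ends

print(find_conflicts(new_events, busy))
```

Because the comparison is plain datetime arithmetic, the same input always yields the same conflicts, which is the property the README asks of this step.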

	## Can it process multiple invites at once?

	Yes — multiple invites in one paste is the designed path (on the live Space, where the real
	model runs). `ActionPlan.events` is a list, and the extraction prompt explicitly tells the model
	that one thread often holds several events — a drop-off AND a pickup, or two appointments, are
	separate events (`server/agent.py`). Everything downstream is built for N events: the results card
	shows "N events found" with one card per invite, the editable table gets one row each, the `.ics`
	contains one `VEVENT` per event, each event carries its own Google/Outlook/Apple quick-add links,
	and the conflict check runs across all of them. Screenshot input is multi-file too — attach several
	flyers and they're all read in one run.
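
As an illustration of "one `VEVENT` per event", here is a small stdlib-only sketch that renders a list of events into a single calendar file. It is not the repo's `calendar_out/ics.py`; the `PRODID`, the `UID` scheme, and the tuple layout are placeholders, and it assumes timezone-aware datetimes.

```python
from datetime import datetime, timezone

FMT = "%Y%m%dT%H%M%SZ"   # UTC "basic" format used by iCalendar


def escape_text(text: str) -> str:
    """Escape the characters iCalendar TEXT values treat specially."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def to_ics(events, now=None) -> str:
    """events: list of (title, start, end) with timezone-aware datetimes."""
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc).strftime(FMT)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//sketch//EN"]
    for i, (title, start, end) in enumerate(events):
        s = start.astimezone(timezone.utc).strftime(FMT)
        e = end.astimezone(timezone.utc).strftime(FMT)
        lines += [
            "BEGIN:VEVENT",
            f"UID:{s}-{i}@example.invalid",   # placeholder UID scheme
            f"DTSTAMP:{stamp}",
            f"DTSTART:{s}",
            f"DTEND:{e}",
            f"SUMMARY:{escape_text(title)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"    # iCalendar lines end with CRLF


utc = lambda h, m=0: datetime(2025, 6, 12, h, m, tzinfo=timezone.utc)
ics = to_ics([("Drop-off, north gate", utc(12), utc(12, 15)),
              ("Pickup", utc(19), utc(19, 15))])
print(ics.count("BEGIN:VEVENT"))
```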

	Two caveats:

	- Stub mode extracts only the first invite. The local-dev heuristic (`_stub_plan` in
	`server/agent.py`, enabled by `USE_STUB_EXTRACTOR=1`) works with no model and no GPU — and it's
	now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations,
	durations, arrival-early shifts, type-based reminders) — but it still returns at most one
	event. If you paste a multi-invite thread locally and get one event back, that's the stub, not
	the product; the deployed Space uses the multi-event model path.
	- **Simultaneous runs are serialized, not parallel.** If two users (or two tabs) hit *Run the
	agents* at once, both complete, but inference executes one request at a time — `server/model.py`
	holds the llama.cpp instance behind a `threading.Lock`, and Gradio queues the events. On a
	single-GPU Space that's intentional (one model copy in memory); the second run simply waits its
	turn, then streams its own pipeline progress.
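
The serialization described in the second caveat can be sketched with a plain `threading.Lock`. This is an illustration rather than the code in `server/model.py`: `run_inference` and the sleep standing in for the model call are hypothetical.

```python
import threading
import time

_model_lock = threading.Lock()   # guards the single model instance
_active = 0                      # requests currently inside the model call
max_active = 0                   # highest concurrency ever observed


def run_inference(prompt: str) -> str:
    """Serve one request; concurrent callers wait here for their turn."""
    global _active, max_active
    with _model_lock:
        _active += 1
        max_active = max(max_active, _active)
        time.sleep(0.01)          # stand-in for the real model call
        _active -= 1
        return f"plan for: {prompt}"


results = []
threads = [threading.Thread(target=lambda p=p: results.append(run_inference(p)))
           for p in ("tab 1", "tab 2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), max_active)
```

Both requests finish, but the counter shows that only one was ever inside the guarded section at a time.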

	## Repo layout

	```
	app.py                     # Gradio + FastAPI entrypoint (the Space)
	server/
	  agent.py                 # thread (+images) -> validated ActionPlan
	  orchestrator.py          # Run the agents: MiniCPM planner driving our own MCP tools
	  schema.py                # Event / Conflict / ActionPlan pydantic models
	  model.py                 # llama.cpp load: GGUF + vision mmproj, constrained JSON
	  imageutil.py             # image -> base64 data URI
	ui/blocks.py               # custom Gradio Blocks (reasoning, events, conflicts, reply)
	static/app.css             # custom CSS (Off-Brand)
	calendar_out/
	  ics.py                   # .ics generation (off-grid default)
	  freebusy.py              # parse existing .ics + deterministic conflict detection
	  gcal.py                  # optional Google Calendar push
	collector/collector.py     # Mac-side iMessage collector (text + image attachments)
	training/                  # dataset build + QLoRA fine-tune + GGUF/mmproj export
	Dockerfile                 # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
	requirements-docker.txt    # runtime deps for the Docker image (llama.cpp built separately)
	PLAN.md                    # full design + build plan
	```

	## Quick start (local dev) — no GPU needed

	```bash
	pip install -r requirements.txt

	# Runs the whole app with the built-in heuristic agent — no model, no GPU:
	export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
	python app.py # http://localhost:7860
	```

	Open it, go to the Schedule tab, and tap Try a sample — or paste a thread, attach chat
	screenshots, and optionally upload your current calendar `.ics` for conflict checks.
	(Heads-up: the stub agent extracts only the first invite in a thread — multi-invite extraction
	needs the real model; see Can it process multiple invites at once? above.) Tip for
	self-hosted installs: set `CAL_ICS_PATH=/path/to/calendar.ics` and conflict checks use that file
	automatically whenever no `.ics` is uploaded — step 4 completes itself, fully offline. Review
	the detected events, conflicts, proposed times, and the suggested reply, then add any event with
	its Add to: Google · Outlook · iCal · .ics links (iCal and .ics both download the event's
	`.ics` file; with 2+ events an iCal — all N events link grabs everything at once).
	The Activity → This week panel shows what you've captured.
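
For readers curious how a per-event quick-add link can work without any sign-in, here is a sketch that builds a Google Calendar link. It uses the widely used event-template URL pattern, not code taken from this repo; the function name is hypothetical and datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode, urlparse, parse_qs

FMT = "%Y%m%dT%H%M%SZ"


def google_quick_add(title: str, start: datetime, end: datetime,
                     location: str = "") -> str:
    """Build a prefilled 'create event' link; nothing is sent to a server here."""
    params = {
        "action": "TEMPLATE",
        "text": title,
        "dates": "/".join(d.astimezone(timezone.utc).strftime(FMT)
                          for d in (start, end)),
    }
    if location:
        params["location"] = location
    return "https://calendar.google.com/calendar/render?" + urlencode(params)


link = google_quick_add(
    "Pick up Priya",
    datetime(2025, 6, 12, 14, 15, tzinfo=timezone.utc),
    datetime(2025, 6, 12, 15, 0, tzinfo=timezone.utc),
    location="Terminal 4",
)
query = parse_qs(urlparse(link).query)   # round-trip to check the encoding
print(query["dates"][0])
```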

	## This week (impact)

	The Activity tab has a This week panel that persists across restarts: events captured,
	conflicts caught, and estimated time saved. A "capture" is counted when a run surfaces
	events for review (adding to a calendar happens through the per-event links, which the server
	can't observe).

	`minutes_saved` is a deliberately conservative, configurable estimate — not a measurement:
	`IMPACT_MIN_PER_EVENT` (default 8 min per captured event) + `IMPACT_MIN_PER_CONFLICT` (default
	15 min per conflict caught). Override either via env. State persists to `IMPACT_PATH`
	(default `/tmp/impact_weeks.json`; point it at a persistent disk on a Space to survive rebuilds).
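
The estimate is plain arithmetic, sketched below. The environment variable names and defaults come from the paragraph above; the function name and the assumption that values are whole minutes are illustrative.

```python
import os


def minutes_saved(events: int, conflicts: int, env=os.environ) -> int:
    """Conservative estimate: per-event minutes plus per-conflict minutes."""
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events * per_event + conflicts * per_conflict


# Three captured events and one caught conflict with the defaults:
print(minutes_saved(3, 1, env={}))                               # 3*8 + 1*15 = 39
# The same week with a lower per-event override:
print(minutes_saved(3, 1, env={"IMPACT_MIN_PER_EVENT": "5"}))    # 3*5 + 1*15 = 30
```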

	## Accuracy upgrade (optional) — serve the real `gemma-cal` LLM

	The stub agent above makes the demo work with no GPU. The production Space serves our
	fine-tuned `gemma-cal` E4B through `llama-server` — no cloud AI APIs either way. The same
	config works anywhere llama.cpp runs:

	```bash
	export USE_STUB_EXTRACTOR=0
	export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
	export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf" # ~5 GB edge fine-tune (what the Space serves)
	export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF" # the E4B's own vision projector
	export MMPROJ_FILE="mmproj-F16.gguf" # enables screenshot/vision input
	bash scripts/start_space.sh
	```

	This is the platform's only model — the same ~5 GB GGUF serves the production Space (16 GB
	T4), a gaming GPU, or a laptop. (`MODEL_FILE` is explicit on purpose: the model repo also stores
	legacy training artifacts, so the `-hf repo:Q4_K_M` shorthand is ambiguous.)

	## Optional: Mac collector (power users)

	The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in
	automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message
	content, so a Mac is the only auto-feed source):

	```bash
	cd collector && cp .env.example .env # edit SPACE_URL + INGEST_TOKEN
	python collector.py
	```

	> ⚠️ The collector needs Full Disk Access (System Settings → Privacy & Security) to read `chat.db`.

	## Autonomous & on a phone

	There's a single backend endpoint — `POST /agent` (bearer `INGEST_TOKEN`) — that takes a thread
	(or messages, + optional screenshot/`.ics`) and returns the extracted events, conflicts, and reply as
	JSON (optionally an `.ics` or a Google Calendar push). Every front-end calls it:

	- Fully autonomous (Mac) — set-and-forget: `INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf
	scripts/setup_mac.sh` installs three launchd jobs (Hermes `llama-server` + autonomous backend +
	collector). New iMessages you send or accept become calendar events automatically, deduped per
	chat. Triggers on outgoing messages by default (`TRIGGER_ON=outgoing`; `any` to widen).
	- Hermes "grows-with-you" brain: point `INFERENCE_BASE_URL` at a Hermes `llama-server`; its
	personal memory (people→roles, "you decline Mondays") improves extraction over time and is shown
	in the dashboard Memory tab. See [docs/hermes.md](./docs/hermes.md).
	- iPhone, one tap: an iOS Shortcut shares a thread/screenshot to `/agent` and adds the events
	to Apple Calendar natively — no `.ics` import.
	- Android, hands-off: a Tasker/MacroDroid rule on a notification/SMS calls `/agent` and inserts
	events. See [docs/android-tasker.md](./docs/android-tasker.md).
	- On-device model: set `INFERENCE_BASE_URL` to a local `llama-server` (e.g. Gemma E4B or a
	small Hermes in Termux) so inference runs on the phone — same agent, env-selected.

	> iOS can't read iMessage in the background (no message API), so fully-autonomous iMessage needs
	> the Mac collector; the iPhone path is one-gesture. See [docs/automations.md](./docs/automations.md)
	> and [docs/on-device.md](./docs/on-device.md).
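
Any of these front-ends boils down to one authenticated POST. The sketch below builds such a request with the standard library. The `/agent` path and bearer token come from the description above, but the JSON field name `thread` is an assumption, so check the server code for the real payload shape.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, thread: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /agent call."""
    body = json.dumps({"thread": thread}).encode("utf-8")   # "thread" key is assumed
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


req = build_agent_request("http://localhost:7860", "dev-secret",
                          "Practice moved to Tuesday 5pm")
print(req.get_method(), req.full_url)

# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     plan = json.load(resp)
```

The example values match the local quick start (`INGEST_TOKEN="dev-secret"`, port 7860).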

	## Build Small — prizes & quests

	Track: 🏡 Backyard AI (`track:backyard`) — a practical app for a specific real person: a busy
	parent whose family calendar is buried in a noisy class group chat.

	### Sponsor awards we compete for

	| Award | Why this submission qualifies |
	|---|---|
	| 🟢 Modal Awards (best Modal-powered apps) | Modal powered the development of the platform's model end-to-end (required note, gladly given): [`training/modal_train.py`](./training/modal_train.py) (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), [`training/modal_eval.py`](./training/modal_eval.py) + [`modal_quant_eval.py`](./training/modal_quant_eval.py) (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and [`training/gated_retrain.py`](./training/gated_retrain.py) (train → staging → eval → promote only past the gate; eight regressed models rejected, every run a Modal job). |
	| 🌱 OpenBMB Awards (standout MiniCPM builds, per track) | The agent is planned by OpenBMB MiniCPM (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible multi-step agent ([`server/orchestrator.py`](./server/orchestrator.py)). MiniCPM is the agent's brain, not a garnish. |

	*(Not claimed: the OpenAI Track — no Codex-attributed commits — and the NVIDIA Nemotron Quest —
	different model family. We'd rather be honest than eligible.)*

	### Special awards — our case

	| Award | Our case |
	|---|---|
	| 🎖️ Bonus Quest Champion | All six collectable quests claimed with evidence: the full sash (table below). |
	| 🎨 Off-Brand Award | Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard ([`ui/blocks.py`](./ui/blocks.py) + [`static/app.css`](./static/app.css)), far past the stock Gradio look. |
	| 🐜 Tiny Titan | The platform's one and only model is **Gemma E4B, ~4B effective parameters** (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B"; judges' call whether that's tiny enough. |
	| 🎬 Best Demo | App + demo video + social post as one package: storyboard with every quest named on-camera in [`docs/demo-script.md`](./docs/demo-script.md). |
	| 🤖 Best Agent | The MiniCPM-planned, MCP-tool-driven agent above: real multi-step tool use, every model under the 32B cap. |
	| 🃏 Judges' Wildcard | No entry needed, but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you. |

	### Collectable quests — all six claimed

	| Quest | Evidence |
	|---|---|
	| 🔌 Off the Grid (local-first, no cloud APIs) | All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push. |
	| 🎯 Well-Tuned (published fine-tune) | [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf): our QLoRA fine-tune is the model production serves, shipped through the eval gate with the [honest scorecard public](./docs/eval-roadmap.md). |
	| 🎨 Off-Brand (custom UI) | See the Off-Brand Award case above. |
	| 🦙 Llama Champion (llama.cpp runtime) | The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj ([`Dockerfile`](./Dockerfile), [`scripts/start_space.sh`](./scripts/start_space.sh)). |
	| 📡 Sharing is Caring (open trace on the Hub) | Redacted agent traces published to [`ParetoOptimal/offgridschedula-traces`](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces), one click from the Activity tab. |
	| 📓 Field Notes (write-up) | [`FIELD_NOTES.md`](./FIELD_NOTES.md) + the [eval-gated fine-tuning post-mortem](./docs/blog-eval-gated-finetuning.md) + [project blog](https://huggingface.co/blog/build-small-hackathon/offgridschedula). |

	## Fine-tune on Modal (GPU)

	`training/modal_train.py` runs the whole fine-tune on a serverless GPU and publishes the GGUF to
	HF — no local GPU needed. It's a thin wrapper that ships this repo to Modal and runs the existing
	pipeline (`make_dataset.py` → `train_qlora.py` → `export_gguf.sh`) on an A100/H100, then uploads the
	quantized GGUF + `mmproj` to your HF repo. This is all offline prep, so Off the Grid is
	untouched (the rule applies to the running app's inference, not dataset/training prep).

	```bash
	pip install modal
	modal token new
	modal secret create huggingface HF_TOKEN=hf_xxxxxxxx # your HF write token

	# Validate the full pipeline cheaply first (cheap edge model, ~a couple $):
	modal run training/modal_train.py --base-model google/gemma-4-E4B-it

	# Then the real run (default A100-80GB; --gpu H100 for speed):
	modal run training/modal_train.py
	modal run training/modal_train.py --gpu H100 --num-epochs 3
	```

	On finish it prints the `MODEL_REPO` / `MODEL_FILE` / `MMPROJ_FILE` to set on the Space. Two
	persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on
	`training/data/dataset.jsonl` only re-pays for the training itself.

	> Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is
	> ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the
	> first real 31B run — the seeds in `make_dataset.py` are a smoke test, not a training set.

	### Publish your fine-tune & point the Space at it

	The training run is the one step that spends your GPU/Modal credits — it's not done for you.
	Once you've run it, the path is turnkey:

	1. Recommended: `python training/gated_retrain.py` — train → staging upload → 60-example eval →
	promote only if it beats the gate. A regressed model cannot reach production. (Raw
	`modal run training/modal_train.py` is the ungated equivalent for experiments.)
	2. Point the Space at your model via Space variables (`scripts/start_space.sh` reads them at
	launch; set in Settings → Variables or with `HfApi().add_space_variable`):
	```
	MODEL_HF_REPO = <you>/gemma-cal-gguf
	MODEL_FILE = gemma-cal-e4b-Q4_K_M.gguf # explicit file — repo may hold several quants/tiers
	MMPROJ_REPO = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
	MMPROJ_FILE = mmproj-F16.gguf # enables screenshot/vision input
	```
	The deploy workflow stays a plain git mirror — the model is pulled at runtime, never committed.
	3. Push to `main` → CI deploys → the Space now serves your fine-tune (Well-Tuned).
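
Step 2 can also be scripted. The sketch below sets the four variables with `HfApi().add_space_variable` from `huggingface_hub`; the helper names and the `your-user/...` identifiers are placeholders, and the push itself is left commented out because it needs a logged-in token and a real Space.

```python
def space_variables(model_repo: str) -> dict[str, str]:
    """Variables that scripts/start_space.sh reads at launch (values from step 2)."""
    return {
        "MODEL_HF_REPO": model_repo,
        "MODEL_FILE": "gemma-cal-e4b-Q4_K_M.gguf",
        "MMPROJ_REPO": "unsloth/gemma-4-E4B-it-GGUF",
        "MMPROJ_FILE": "mmproj-F16.gguf",
    }


def push_variables(space_id: str, variables: dict[str, str]) -> None:
    """Write each variable to the Space settings."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from huggingface_hub import HfApi
    api = HfApi()   # picks up the token from `huggingface-cli login` or HF_TOKEN
    for key, value in variables.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)


variables = space_variables("your-user/gemma-cal-gguf")   # placeholder repo id
for key, value in variables.items():
    print(f"{key} = {value}")

# push_variables("your-user/your-space", variables)   # uncomment to apply
```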

	## Share a trace (Sharing is Caring)

	Want others to learn from a run? In the Activity tab, click ⬇ Download trace (JSON) — the
	trace stays on your device, and the hosted Space holds no Hub token. Personal data is redacted by
	default (the activity log only carries counts + status; the one chat-name field is stripped). Then
	publish it from your own machine, with your own login:

	```bash
	huggingface-cli login # or export HF_TOKEN=...
	python training/share_trace.py trace.json --public # -> a HF dataset repo of traces
	```
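
If you want to double-check a trace before publishing, a redaction pass like the following is one way to do it. This is a sketch, not the logic in `training/share_trace.py`; the `chat_name` key and the trace layout are hypothetical stand-ins for "the one chat-name field".

```python
import copy

SENSITIVE_KEYS = {"chat_name"}   # hypothetical field name


def redact(trace):
    """Return a copy of the trace with sensitive keys dropped at any depth."""
    if isinstance(trace, dict):
        return {k: redact(v) for k, v in trace.items() if k not in SENSITIVE_KEYS}
    if isinstance(trace, list):
        return [redact(v) for v in trace]
    return copy.deepcopy(trace)


trace = {
    "status": "ok",
    "events_found": 2,
    "chat_name": "Class 3B parents",
    "steps": [{"tool": "extract_events", "chat_name": "Class 3B parents"}],
}
clean = redact(trace)
print(clean)
```

The original `trace` object is left untouched, so the unredacted copy never needs to leave your device.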

	## Field notes

	[FIELD_NOTES.md](./FIELD_NOTES.md) is the build retrospective — the iOS→`chat.db` pivot, the
	`attributedBody` trap, why conflict math is deterministic, stub-first architecture, the
	reframe-around-one-person lesson, and the Off-the-Grid trade-offs.

	## Remote automation (runs without an interactive session)

	| Workflow | Trigger | What it does | Needs |
	|---|---|---|---|
	| `.github/workflows/ci.yml` → test | push / PR | compile + `pytest` (stub mode, no GPU) | nothing |
	| `.github/workflows/ci.yml` → deploy | push to `main`, after tests pass | `huggingface-cli upload` the repo to the HF Space (Gradio SDK; model excluded, pulled at runtime) | secret `HF_TOKEN`, var `SPACE_ID` |
	| `.github/workflows/maintenance.yml` | daily + manual | ping the Space `/health`, audit outdated deps → open/update a GitHub issue | var `SPACE_HEALTH_URL` |

	One-time setup for deploy + monitoring:

	```bash
	gh secret set HF_TOKEN # HF write token
	gh variable set SPACE_ID -b "<owner>/<space>"
	gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"
	```

	CI installs `requirements-ci.txt` (excludes `llama-cpp-python` and the Google libs — both are
	imported lazily and not needed for the stub-mode tests). A weekly Claude `/schedule` routine handles
	the judgment work (grow `training/data/dataset.jsonl` → PR, triage CI failures).