---
title: OffGridSchedula
emoji: 🗓️
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
models:
  - build-small-hackathon/gemma-4-cal-gguf
  - openbmb/MiniCPM5-1B-GGUF
demo_video:
  - https://youtu.be/m-o0u9X3tI4
social_posts:
  - https://x.com/nate_mauer/status/2065973341651882386
  - https://x.com/nate_mauer/status/2064920352845709419
  - https://x.com/nate_mauer/status/2065661878441750916
  - https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
blog_post:
  - https://huggingface.co/blog/build-small-hackathon/offgridschedula
made_by:
  - ParetoOptimal - a.k.a., Nate Mauer
---

# 🗓️ Message Scheduling Agent

**OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts,
and drafts the reply, right from your phone: no app, no account, no setup. iOS allows neither
background iMessage access nor a persistent on-device LLM server, so there is no autonomous on-device
agent to install. Instead, a foreground Shortcut ([docs/automations.md](./docs/automations.md)) hands a
thread or screenshot to the agent in two taps (optionally using a remote model via `INFERENCE_BASE_URL`).**

The model runs on **your own server or even on the phone itself**, not on a cloud AI service. Your chats
aren't shipped off to a third-party AI to be read; the agent reads your snippet in memory and discards it
after replying. The run trace you can optionally share is redacted, and your snippet goes only to the
agent you control, which turns it into ready-to-add calendar events.

**Hardware-aware.** On under-powered hardware, the app shows an upgrade banner rather than hanging; the real model needs a small GPU.

## Build Small submission — the idea & the tech

**The idea.** A busy parent's calendar lives in other people's messages — picture day in the
class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar
events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the
conflicts against your own `.ics`, and a drafted reply — then add to Apple/Google Calendar in a tap.

**The tech.** Two small local models do the work. Extraction is [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf)
(~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated
**ActionPlan** (events · conflicts · reply · clarifying question), served with **vision** through
the official **llama.cpp** server inside this Docker Gradio Space — no cloud AI APIs. The
fine-tune + its 60-example task eval ran entirely on **Modal** serverless GPUs, behind an
eval gate that rejected eight regressed models before this one shipped. Conflict math is
deterministic Python, the UI is fully custom, the agent doubles as an **MCP tool server**, and
redacted run traces are public on the [Hub](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces).
Click **Run the agents** and a local **OpenBMB MiniCPM** planner (a second local llama-server)
drives this same Space's MCP tools as a multi-step agent — extract → check conflicts → render
`.ics` — with every step visible. Still zero cloud AI; every model under 32B.
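
To make "conflict math is deterministic Python" concrete, here is a minimal sketch of an interval-overlap check. It is an illustration only, not the code in `calendar_out/freebusy.py`; the `Slot` class and both function names are hypothetical, and it assumes half-open intervals so that back-to-back events do not count as a conflict.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Slot:
    """A titled time span; hypothetical stand-in for an event or a busy block."""
    title: str
    start: datetime
    end: datetime


def overlaps(a: Slot, b: Slot) -> bool:
    # Half-open intervals [start, end): back-to-back slots do not conflict.
    return a.start < b.end and b.start < a.end


def find_conflicts(new_events: list[Slot], busy: list[Slot]) -> list[tuple[Slot, Slot]]:
    """Return every (new event, busy block) pair that overlaps, in input order."""
    return [(e, b) for e in new_events for b in busy if overlaps(e, b)]
```

Because the check is pure comparison on datetimes, the same inputs always give the same conflicts, whatever the model produced upstream.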

**What's new.** Extraction now reads the *logistics*, not just the date (see below): arrival-aware
start times, duration→end conversion, type-based reminders, and calendar-ready titles — each
guaranteed by deterministic post-processing even when the model wobbles, and each shipped through
a measured A/B eval ([full result tables](./training/data/ab_results.md): regex vs text-LLM vs
**vision-LLM reading rendered screenshots only**). Calendar out got one-click too: a unified
**Connect your calendar** block (Google OAuth — the token lives in *your* browser, never on the
server; Outlook/Apple need no sign-in) and per-event **Google · Outlook · iCal** links, with the
Google push verified end-to-end (push → readback → delete, 11/11).

**The UX.** One decision, **Offline or Online**, re-themes the whole workflow card and sets the
path: off-grid `.ics` only, or a **one-click "Connect your calendar"** whose Google OAuth token
lives *only in the browser* (server-verified each visit; the client secret never leaves the
server). Results land in a single card: events, conflicts, the drafted reply, and per-event
**Google · Outlook · iCal · .ics** quick-add links. **Activity → This week** tallies events
captured, conflicts caught, and time saved; a per-device **Memory** (localStorage, one-click
samples) feeds names and preferences back into extraction.

**Submission links:** [requirement-by-requirement mapping](./docs/build-small-submission.md) ·
[demo video](https://youtu.be/m-o0u9X3tI4) ·
social posts [1](https://x.com/nate_mauer/status/2064920352845709419) ·
[2](https://x.com/nate_mauer/status/2065661878441750916)

## Who this is for

A busy parent whose kid's school and activity events are buried in a noisy class group chat —
picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once,
mean to add it later, and miss it. With this, they **paste the chat** (or a **screenshot** of a flyer
or invite) from their phone's browser and get back: the events, a **conflict check** against their
calendar, and a **ready-to-send reply** — all surfaced for review before anything is saved. Output is
a local `.ics` they can add to any calendar, with optional Google Calendar push.

No app to install and no account. It reads nothing automatically — the parent pastes only what they
choose. Inference runs **in the Space** via `llama.cpp` (no cloud AI APIs), and works out of the box
with no GPU (see *Accuracy upgrade* below).

## The model: `gemma-cal` E4B — one calendar-native LLM, built for exactly this

What makes this platform different isn't a prompt wrapped around a generic chatbot — it's
**[`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf), our own fine-tune of
Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready
structure.** The model doesn't chat. It reads a thread (or a flyer photo) and emits a single
validated **ActionPlan** — events with exact ISO datetimes, conflicts, proposed alternatives, a
drafted reply, and a clarifying question when the plan is too vague to schedule. **It is the one
and only extraction model the platform runs**, everywhere from the production Space to a laptop
(the MiniCPM planner behind *Run the agents* only drives the tool calls).

- **Edge-sized by design.** ~5 GB at Q4 — serves on a **~$0.40/hr 16 GB T4** (vs $4+/hr A100-class
  for big models), a gaming GPU, or an Apple-silicon laptop, with full **vision**
  (screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
- **Schema-bulletproof.** The fine-tune holds **100% schema validity even with no system prompt**,
  with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of
  *asking* when a date is TBD — the failure modes that actually burn users of generic models.
- **Convention-trained.** It learns *this product's* date semantics ("next Tuesday" means next
  week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed
  from the internet.
- **Eval-gated, never vibes-shipped.** Every retrain runs a 60-example task eval (start-exact
  datetime matching, F1, validity, clarification) and **cannot reach production unless it clears
  the gate** — the pipeline has rejected eight regressed models to date. The full, honest scorecard
  lives in [`docs/eval-roadmap.md`](./docs/eval-roadmap.md) and the
  [post-mortem write-up](./docs/blog-eval-gated-finetuning.md).
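
As an illustration of what "clearing the gate" could look like, the sketch below compares a candidate's scores to the production baseline and blocks promotion on any regression. The real gate lives in `training/gated_retrain.py` and may use different metrics or thresholds; the `EvalScores` fields, the `tolerance` parameter, and the "no metric may regress" rule are all assumptions made for this example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical scorecard, one field per metric named in the bullet above."""
    start_exact: float      # share of events whose start datetime matches exactly
    event_f1: float
    schema_validity: float
    clarification: float    # share of vague threads where the model asked


def regressions(candidate: EvalScores, baseline: EvalScores, tolerance: float = 0.0) -> list[str]:
    """Names of metrics where the candidate falls below baseline minus tolerance."""
    cand, base = asdict(candidate), asdict(baseline)
    return [name for name in cand if cand[name] < base[name] - tolerance]


def passes_gate(candidate: EvalScores, baseline: EvalScores, tolerance: float = 0.0) -> bool:
    # Assumed rule: promote only when no metric regresses.
    return not regressions(candidate, baseline, tolerance)
```

Returning the list of regressed metric names, rather than a bare boolean, makes a rejected run easy to explain in a post-mortem.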

**Hackathon size constraint (≤ 32B):** easily met, since E4B is ~4B effective parameters. See the in-app
**🏆 Submission** tab for the full compliance scorecard.

### Reads the logistics, not just the date

A confirmation like *"Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes
early to complete intake forms) · 📍 112A West 72nd Street…"* becomes one correct event:

- **Arrival-aware start** — the event starts at **10:15** (when you must show up), the official
  10:30 is preserved in the notes, and the **end is anchored to the stated time + duration**
  (11:00), so the calendar block covers the forms *and* the visit.
- **Type-based notifications** — an explicitly stated lead time always wins ("remind me 2 hours
  before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school
  events 45.
- **Real-world addresses** — multi-line and 📍-emoji locations join into one string;
  "(Upper West Side — 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us
  at 212-223-0349") don't confuse it.
- **Calendar-ready titles** — an action+subject summary ("Pick up Priya — Terminal 4"), not a
  quote of the message.
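
The first two rules above are simple enough to sketch in a few lines. This is not `apply_text_rules`; the function names, the regex, and the event-type keys are hypothetical. The reminder defaults and the 10:15 to 11:00 result come from the list above. Two assumptions are made: the lower bound of "30–45 min" is used as the duration (it is the only reading that gives the 11:00 end), and unknown event types get no default reminder.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Reminder defaults quoted in the list above (minutes before the event).
TYPE_REMINDERS = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def reminder_minutes(text: str, event_type: str) -> Optional[int]:
    """An explicit lead time in the text wins; otherwise use the type default."""
    m = re.search(r"remind me (\d+)\s*(hour|hr|minute|min)", text, re.IGNORECASE)
    if m:
        value, unit = int(m.group(1)), m.group(2).lower()
        return value * 60 if unit.startswith("h") else value
    return TYPE_REMINDERS.get(event_type)  # None for unknown types (assumption)


def arrival_aware_block(stated_start: datetime, duration_min: int,
                        arrive_early_min: int = 0) -> tuple[datetime, datetime]:
    """Start when you must show up; end at the stated time plus the duration."""
    start = stated_start - timedelta(minutes=arrive_early_min)
    end = stated_start + timedelta(minutes=duration_min)
    return start, end
```

For the confirmation quoted earlier, `arrival_aware_block(datetime(2025, 1, 7, 10, 30), 30, 15)` yields a 10:15 start and an 11:00 end (the date here is arbitrary).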

The model is *taught* these conventions (prompt + fine-tune data), but the load-bearing ones are
also **guaranteed by deterministic post-processing** (`apply_text_rules` in
[`server/agent.py`](./server/agent.py)) — same philosophy as the conflict engine: must-hold
logistics are never left to model temperament. Every behavior above shipped through a measured
A/B eval — regex baseline vs text-LLM vs **vision-LLM reading rendered chat screenshots only** —
with the full tables in [`training/data/ab_results.md`](./training/data/ab_results.md)
(headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the
screenshot-only vision arm lands within a point of text).

## Try it in 30 seconds

Open the Space in your phone's browser → **Schedule** tab → tap **Try a sample** (or paste your own
group chat, and optionally a screenshot or your `.ics`) → review the detected events → **Download
.ics**. The **Activity → This week** panel then shows what you've captured and the time it saved.

## How it works

```
Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
   (phone browser)                  │                              │
                              custom Gradio UI ◀── review ──┐  ┌────┘
                                                            ▼  ▼
                                          .ics download / optional Google Calendar
```

The **primary path needs nothing but a browser**: paste text and/or attach a screenshot in the
Schedule tab. (Power users can also auto-feed messages from a Mac — see *Optional: Mac collector*.)

For the full solution-architecture view — every workflow and which LLM (if any) it calls,
plus the eval-gated fine-tuning loop — see **[docs/architecture.md](./docs/architecture.md)**.

## Can it process multiple invites at once?

**Yes — multiple invites in one paste is the designed path** (on the live Space, where the real
model runs). `ActionPlan.events` is a *list*, and the extraction prompt explicitly tells the model
that one thread often holds several events — a drop-off AND a pickup, or two appointments, are
separate events (`server/agent.py`). Everything downstream is built for N events: the results card
shows "*N events found*" with one card per invite, the editable table gets one row each, the `.ics`
contains one `VEVENT` per event, each event carries its own Google/Outlook/Apple quick-add links,
and the conflict check runs across all of them. Screenshot input is multi-file too — attach several
flyers and they're all read in one run.
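
A minimal sketch of "one `VEVENT` per event", using only the standard library. It is not `calendar_out/ics.py`: the dict keys (`title`, `start`, `end`), the `PRODID`, and the UID scheme are placeholders chosen for this example, and it skips the line folding that RFC 5545 asks for on long lines.

```python
from datetime import datetime, timezone


def _fmt(dt: datetime) -> str:
    # RFC 5545 UTC form, e.g. 20250107T150000Z. Pass timezone-aware datetimes.
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape(text: str) -> str:
    # Escape characters that are special in iCalendar TEXT values.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def render_ics(events: list[dict], now: datetime) -> str:
    """Render a calendar with exactly one VEVENT per entry in `events`."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//sketch//EN"]
    for i, ev in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            f"UID:{i}-{_fmt(ev['start'])}@example.invalid",  # placeholder UID scheme
            f"DTSTAMP:{_fmt(now)}",
            f"DTSTART:{_fmt(ev['start'])}",
            f"DTEND:{_fmt(ev['end'])}",
            f"SUMMARY:{_escape(ev['title'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF
```

A thread with a drop-off and a pickup therefore produces a single file with two `VEVENT` blocks, which most calendar apps import in one step.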

Two caveats:

- **Stub mode extracts only the first invite.** The local-dev heuristic (`_stub_plan` in
  `server/agent.py`, enabled by `USE_STUB_EXTRACTOR=1`) works with no model and no GPU — and it's
  now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations,
  durations, arrival-early shifts, type-based reminders) — but it still returns at most **one**
  event. If you paste a multi-invite thread locally and get one event back, that's the stub, not
  the product; the deployed Space uses the multi-event model path.
- **Simultaneous *runs* are serialized, not parallel.** If two users (or two tabs) hit *Run the
  agents* at once, both complete, but inference executes one request at a time — `server/model.py`
  holds the llama.cpp instance behind a `threading.Lock`, and Gradio queues the events. On a
  single-GPU Space that's intentional (one model copy in memory); the second run simply waits its
  turn, then streams its own pipeline progress.
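
The serialization in the second caveat follows a common pattern, sketched below. This is not the code in `server/model.py`; `SerializedModel` and the `infer` callable are hypothetical stand-ins for whatever wraps the llama.cpp instance.

```python
import threading


class SerializedModel:
    """Wraps a (hypothetical) inference callable so only one call runs at a time."""

    def __init__(self, infer):
        self._infer = infer
        self._lock = threading.Lock()

    def __call__(self, prompt: str) -> str:
        # A second caller blocks here until the first one has finished.
        with self._lock:
            return self._infer(prompt)
```

Every request still completes; concurrent callers just queue on the lock, which keeps a single model copy in memory.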

## Repo layout

```
app.py                 # Gradio + FastAPI entrypoint (the Space)
server/
  agent.py             # thread (+images) -> validated ActionPlan
  orchestrator.py      # Run the agents: MiniCPM planner driving our own MCP tools
  schema.py            # Event / Conflict / ActionPlan pydantic models
  model.py             # llama.cpp load: GGUF + vision mmproj, constrained JSON
  imageutil.py         # image -> base64 data URI
ui/blocks.py           # custom Gradio Blocks (reasoning, events, conflicts, reply)
static/app.css         # custom CSS (Off-Brand)
calendar_out/
  ics.py               # .ics generation (off-grid default)
  freebusy.py          # parse existing .ics + deterministic conflict detection
  gcal.py              # optional Google Calendar push
collector/collector.py # Mac-side iMessage collector (text + image attachments)
training/              # dataset build + QLoRA fine-tune + GGUF/mmproj export
Dockerfile             # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
requirements-docker.txt # runtime deps for the Docker image (llama.cpp built separately)
PLAN.md                # full design + build plan
```

## Quick start (local dev) — no GPU needed

```bash
pip install -r requirements.txt

# Runs the whole app with the built-in heuristic agent — no model, no GPU:
export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
python app.py            # http://localhost:7860
```

Open it, go to the **Schedule** tab, and tap **Try a sample** — or paste a thread, attach chat
**screenshots**, and optionally upload your current calendar **`.ics`** for conflict checks.
(Heads-up: the stub agent extracts only the **first** invite in a thread — multi-invite extraction
needs the real model; see *Can it process multiple invites at once?* above.) Tip for
self-hosted installs: set `CAL_ICS_PATH=/path/to/calendar.ics` and conflict checks use that file
automatically whenever no `.ics` is uploaded — step 4 completes itself, fully offline. Review
the detected events, conflicts, proposed times, and the suggested reply, then add any event with
its **Add to: Google · Outlook · iCal · .ics** links (iCal and .ics both download the event's
`.ics` file; with 2+ events an **iCal — all N events** link grabs everything at once).
The **Activity → This week** panel shows what you've captured.

## This week (impact)

The Activity tab has a **This week** panel that persists across restarts: **events captured**,
**conflicts caught**, and **estimated time saved**. A "capture" is counted when a run surfaces
events for review (adding to a calendar happens through the per-event links, which the server
can't observe).

`minutes_saved` is a deliberately conservative, **configurable estimate — not a measurement**:
`IMPACT_MIN_PER_EVENT` (default **8** min per captured event) + `IMPACT_MIN_PER_CONFLICT` (default
**15** min per conflict caught). Override either via env. State persists to `IMPACT_PATH`
(default `/tmp/impact_weeks.json`; point it at a persistent disk on a Space to survive rebuilds).
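
The estimate is plain arithmetic. A minimal sketch, assuming both variables hold whole minutes (how the app itself parses them is not shown here, and the function name is hypothetical):

```python
import os


def minutes_saved(events_captured: int, conflicts_caught: int, env=os.environ) -> int:
    """Estimated minutes saved, using the documented defaults of 8 and 15."""
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events_captured * per_event + conflicts_caught * per_conflict
```

With the defaults, a week with 3 captured events and 1 caught conflict comes to 3 × 8 + 1 × 15 = 39 minutes.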

## Accuracy upgrade (optional) — serve the real `gemma-cal` LLM

The stub agent above makes the demo work with **no GPU**. The production Space serves our
fine-tuned **`gemma-cal` E4B** through `llama-server` — no cloud AI APIs either way. The same
config works anywhere llama.cpp runs:

```bash
export USE_STUB_EXTRACTOR=0
export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf"     # ~5 GB edge fine-tune (what the Space serves)
export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF"  # the E4B's own vision projector
export MMPROJ_FILE="mmproj-F16.gguf"              # enables screenshot/vision input
bash scripts/start_space.sh
```

This is the platform's **only** extraction model: the same ~5 GB GGUF serves the production Space (16 GB
T4), a gaming GPU, or a laptop. (`MODEL_FILE` is explicit on purpose: the model repo also stores
legacy training artifacts, so the `-hf repo:Q4_K_M` shorthand is ambiguous.)

## Optional: Mac collector (power users)

The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in
automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message
content, so a Mac is the only auto-feed source):

```bash
cd collector && cp .env.example .env   # edit SPACE_URL + INGEST_TOKEN
python collector.py
```

> ⚠️ The collector needs **Full Disk Access** (System Settings → Privacy & Security) to read `chat.db`.

## Autonomous & on a phone

There's a single backend endpoint — **`POST /agent`** (bearer `INGEST_TOKEN`) — that takes a thread
(or messages, + optional screenshot/`.ics`) and returns the extracted events, conflicts, and reply as
JSON (optionally an `.ics` or a Google Calendar push). Every front-end calls it:

- **Fully autonomous (Mac) — set-and-forget:** `INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf
  scripts/setup_mac.sh` installs three launchd jobs (Hermes `llama-server` + autonomous backend +
  collector). New iMessages **you send or accept** become calendar events automatically, deduped per
  chat. Triggers on outgoing messages by default (`TRIGGER_ON=outgoing`; `any` to widen).
- **Hermes "grows-with-you" brain:** point `INFERENCE_BASE_URL` at a Hermes `llama-server`; its
  personal **memory** (people→roles, "you decline Mondays") improves extraction over time and is shown
  in the dashboard **Memory** tab. See **[docs/hermes.md](./docs/hermes.md)**.
- **iPhone, one tap:** an iOS **Shortcut** shares a thread/screenshot to `/agent` and adds the events
  to Apple Calendar natively — no `.ics` import.
- **Android, hands-off:** a Tasker/MacroDroid rule on a notification/SMS calls `/agent` and inserts
  events. See **[docs/android-tasker.md](./docs/android-tasker.md)**.
- **On-device model:** set `INFERENCE_BASE_URL` to a local `llama-server` (e.g. Gemma **E4B** or a
  small Hermes in Termux) so inference runs *on the phone* — same agent, env-selected.

> **iOS can't read iMessage in the background** (no message API), so fully-autonomous iMessage needs
> the Mac collector; the iPhone path is one-gesture. See **[docs/automations.md](./docs/automations.md)**
> and **[docs/on-device.md](./docs/on-device.md)**.
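
For anyone wiring up their own front-end, a call to `POST /agent` might be built like the sketch below. Only the path and the bearer `INGEST_TOKEN` come from this README; the JSON body shape (a single `"thread"` key) is an assumption, so check `app.py` for the real request schema before relying on it.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, thread: str) -> urllib.request.Request:
    """Build (but do not send) a bearer-authenticated POST to /agent."""
    body = json.dumps({"thread": thread}).encode("utf-8")  # "thread" key is an assumption
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is one more line, `urllib.request.urlopen(build_agent_request(...))`, against a running instance such as the local dev server on `http://localhost:7860`.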

## Build Small — prizes & quests

**Track: 🏡 Backyard AI** (`track:backyard`) — a practical app for a specific real person: a busy
parent whose family calendar is buried in a noisy class group chat.

### Sponsor awards we compete for

| Award | Why this submission qualifies |
|---|---|
| 🟢 **Modal Awards** (best Modal-powered apps) | **Modal powered the development of the platform's model end-to-end** — required note, gladly given: [`training/modal_train.py`](./training/modal_train.py) (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), [`training/modal_eval.py`](./training/modal_eval.py) + [`modal_quant_eval.py`](./training/modal_quant_eval.py) (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and [`training/gated_retrain.py`](./training/gated_retrain.py) (train → staging → eval → promote *only past the gate* — eight regressed models rejected, every run a Modal job). |
| 🌱 **OpenBMB Awards** (standout MiniCPM builds, per track) | The **agent is planned by OpenBMB MiniCPM** (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible multi-step agent ([`server/orchestrator.py`](./server/orchestrator.py)). MiniCPM is the agent's brain, not a garnish. |

*(Not claimed: the OpenAI Track — no Codex-attributed commits — and the NVIDIA Nemotron Quest —
different model family. We'd rather be honest than eligible.)*

### Special awards — our case

| Award | Our case |
|---|---|
| 🎖️ **Bonus Quest Champion** | All **six** collectable quests claimed with evidence — the full sash (table below). |
| 🎨 **Off-Brand Award** | Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard — [`ui/blocks.py`](./ui/blocks.py) + [`static/app.css`](./static/app.css), far past the stock Gradio look. |
| 🐜 **Tiny Titan** | The platform's one and only model is **Gemma E4B — ~4B *effective* parameters** (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B" — judges' call whether that's tiny enough. |
| 🎬 **Best Demo** | App + demo video + social post as one package — storyboard with every quest named on-camera in [`docs/demo-script.md`](./docs/demo-script.md). |
| 🤖 **Best Agent** | The MiniCPM-planned, MCP-tool-driven agent above — real multi-step tool use, every model under the 32B cap. |
| 🃏 **Judges' Wildcard** | No entry needed — but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you. |

### Collectable quests — all six claimed

| Quest | Evidence |
|---|---|
| 🔌 **Off the Grid** (local-first, no cloud APIs) | All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push. |
| 🎯 **Well-Tuned** (published fine-tune) | [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf) — our QLoRA fine-tune **is the model production serves**, shipped through the eval gate with the [honest scorecard public](./docs/eval-roadmap.md). |
| 🎨 **Off-Brand** (custom UI) | See the Off-Brand Award case above. |
| 🦙 **Llama Champion** (llama.cpp runtime) | The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj ([`Dockerfile`](./Dockerfile), [`scripts/start_space.sh`](./scripts/start_space.sh)). |
| 📡 **Sharing is Caring** (open trace on the Hub) | Redacted agent traces published to [`ParetoOptimal/offgridschedula-traces`](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces) — one click from the Activity tab. |
| 📓 **Field Notes** (write-up) | [`FIELD_NOTES.md`](./FIELD_NOTES.md) + the [eval-gated fine-tuning post-mortem](./docs/blog-eval-gated-finetuning.md) + [project blog](https://huggingface.co/blog/build-small-hackathon/offgridschedula). |

## Fine-tune on Modal (GPU)

`training/modal_train.py` runs the whole fine-tune on a serverless GPU and publishes the GGUF to
HF — no local GPU needed. It's a thin wrapper that ships this repo to Modal and runs the existing
pipeline (`make_dataset.py` → `train_qlora.py` → `export_gguf.sh`) on an A100/H100, then uploads the
quantized GGUF + `mmproj` to your HF repo. This is all *offline* prep, so **Off the Grid** is
untouched (the rule applies to the running app's inference, not dataset/training prep).

```bash
pip install modal
modal token new
modal secret create huggingface HF_TOKEN=hf_xxxxxxxx     # your HF *write* token

# Validate the full pipeline cheaply first (small edge model, roughly a couple of dollars):
modal run training/modal_train.py --base-model google/gemma-4-E4B-it

# Then the real run (default A100-80GB; --gpu H100 for speed):
modal run training/modal_train.py
modal run training/modal_train.py --gpu H100 --num-epochs 3
```

On finish it prints the `MODEL_REPO` / `MODEL_FILE` / `MMPROJ_FILE` to set on the Space. Two
persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on
`training/data/dataset.jsonl` only re-pays for the training itself.

> Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is
> ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the
> first real 31B run — the seeds in `make_dataset.py` are a smoke test, not a training set.

### Publish your fine-tune & point the Space at it

The training run is the one step that spends **your** GPU/Modal credits — it's not done for you.
Once you've run it, the path is turnkey:

1. **Recommended:** `python training/gated_retrain.py` — train → staging upload → 60-example eval →
   **promote only if it beats the gate**. A regressed model cannot reach production. (Raw
   `modal run training/modal_train.py` is the ungated equivalent for experiments.)
2. Point the Space at *your* model via **Space variables** (`scripts/start_space.sh` reads them at
   launch; set in *Settings → Variables* or with `HfApi().add_space_variable`):
   ```
   MODEL_HF_REPO = <you>/gemma-cal-gguf
   MODEL_FILE    = gemma-cal-e4b-Q4_K_M.gguf   # explicit file — repo may hold several quants/tiers
   MMPROJ_REPO   = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
   MMPROJ_FILE   = mmproj-F16.gguf             # enables screenshot/vision input
   ```
   The deploy workflow stays a plain git mirror — the model is pulled at runtime, never committed.
3. Push to `main` → CI deploys → the Space now serves your fine-tune (**Well-Tuned**).
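
Step 2 can also be scripted. The sketch below sets the four variables with `HfApi().add_space_variable` from `huggingface_hub`; the two helper names are hypothetical, and it assumes you are already logged in (or have `HF_TOKEN` exported) with write access to the Space.

```python
def space_variables(model_repo: str, model_file: str,
                    mmproj_repo: str, mmproj_file: str) -> dict[str, str]:
    """The four variables that step 2 above lists for the Space."""
    return {
        "MODEL_HF_REPO": model_repo,
        "MODEL_FILE": model_file,
        "MMPROJ_REPO": mmproj_repo,
        "MMPROJ_FILE": mmproj_file,
    }


def apply_space_variables(space_id: str, variables: dict[str, str]) -> None:
    """Write each variable to the Space (network call; needs a write token)."""
    from huggingface_hub import HfApi  # lazy import: pip install huggingface_hub

    api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN
    for key, value in variables.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)
```

Usage would look like `apply_space_variables("<owner>/<space>", space_variables("<you>/gemma-cal-gguf", "gemma-cal-e4b-Q4_K_M.gguf", "unsloth/gemma-4-E4B-it-GGUF", "mmproj-F16.gguf"))`.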

## Share a trace (Sharing is Caring)

Want others to learn from a run? In the **Activity** tab, click **⬇ Download trace (JSON)** — the
trace stays on your device, and the hosted Space holds **no Hub token**. Personal data is redacted by
default (the activity log only carries counts + status; the one chat-name field is stripped). Then
publish it from your own machine, with your own login:

```bash
huggingface-cli login                                   # or export HF_TOKEN=...
python training/share_trace.py trace.json --public      # -> a HF dataset repo of traces
```
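
If you want to double-check a trace before publishing, the redaction described above amounts to dropping a field wherever it appears. A minimal sketch, assuming the chat-name field is keyed `chat_name` (a hypothetical name; inspect your downloaded `trace.json` for the real one):

```python
SENSITIVE_KEYS = {"chat_name"}  # hypothetical field name


def redact(node):
    """Return a copy of a JSON-like structure with sensitive keys removed."""
    if isinstance(node, dict):
        return {k: redact(v) for k, v in node.items() if k not in SENSITIVE_KEYS}
    if isinstance(node, list):
        return [redact(item) for item in node]
    return node  # strings, numbers, booleans, None pass through unchanged
```

Running it on an already-redacted trace is a no-op, so it is safe to apply as a last check.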

## Field notes

[**FIELD_NOTES.md**](./FIELD_NOTES.md) is the build retrospective — the iOS→`chat.db` pivot, the
`attributedBody` trap, why conflict math is deterministic, stub-first architecture, the
reframe-around-one-person lesson, and the Off-the-Grid trade-offs.

## Remote automation (runs without an interactive session)

| Workflow | Trigger | What it does | Needs |
|---|---|---|---|
| `.github/workflows/ci.yml` → **test** | push / PR | compile + `pytest` (stub mode, no GPU) | nothing |
| `.github/workflows/ci.yml` → **deploy** | push to `main`, after tests pass | `huggingface-cli upload` the repo to the HF Space (Docker SDK; model excluded, pulled at runtime) | secret `HF_TOKEN`, var `SPACE_ID` |
| `.github/workflows/maintenance.yml` | daily + manual | ping the Space `/health`, audit outdated deps → open/update a GitHub issue | var `SPACE_HEALTH_URL` |

One-time setup for deploy + monitoring:

```bash
gh secret set HF_TOKEN                       # HF write token
gh variable set SPACE_ID -b "<owner>/<space>"
gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"
```

CI installs `requirements-ci.txt` (excludes `llama-cpp-python` and the Google libs — both are
imported lazily and not needed for the stub-mode tests). A weekly Claude `/schedule` routine handles
the judgment work (grow `training/data/dataset.jsonl` → PR, triage CI failures).