---
title: OffGridSchedula
emoji: 🗓️
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
tags:
- track:backyard
- sponsor:openbmb
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
- achievement:offbrand
- achievement:llama
- achievement:sharing
- achievement:fieldnotes
models:
- build-small-hackathon/gemma-4-cal-gguf
- openbmb/MiniCPM5-1B-GGUF
demo_video:
- https://youtu.be/m-o0u9X3tI4
social_posts:
- https://x.com/nate_mauer/status/2065973341651882386
- https://x.com/nate_mauer/status/2064920352845709419
- https://x.com/nate_mauer/status/2065661878441750916
- https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
blog_post:
- https://huggingface.co/blog/build-small-hackathon/offgridschedula
made_by:
- ParetoOptimal - a.k.a., Nate Mauer
---
# 🗓️ Message Scheduling Agent
**OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts, and drafts the reply, right from your phone: no app, no account,
no setup. iOS allows neither background iMessage access nor a persistent on-device LLM server, so there's no autonomous on-device agent to install; instead,
a foreground Shortcut ([docs/automations.md](./docs/automations.md)) hands a thread or screenshot to the agent in two taps (optionally using a remote model via `INFERENCE_BASE_URL`).**

The model runs on **your own server or even on the phone itself**, not on a cloud AI service. Your chats aren't shipped off to a third-party AI to be read; the agent reads your snippet in memory and
discards it after replying. The run trace you can optionally share is redacted. Your snippet goes only to the agent you control, which turns it into ready-to-add calendar events.

**Hardware-aware.** On under-powered hardware, the app warns users with an upgrade banner rather than hanging; the real model needs a small GPU.
## Build Small submission: the idea & the tech
**The idea.** A busy parent's calendar lives in other people's messages: picture day in the
class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar
events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the
conflicts against your own `.ics`, and a drafted reply, then add to Apple/Google Calendar in a tap.

**The tech.** Two small local models do the work. Extraction is [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf)
(~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated
**ActionPlan** (events · conflicts · reply · clarifying question), served with **vision** through
the official **llama.cpp** server inside this Docker Gradio Space, with no cloud AI APIs. The
fine-tune + its 60-example task eval ran entirely on **Modal** serverless GPUs, behind an
eval gate that rejected eight regressed models before this one shipped. Conflict math is
deterministic Python, the UI is fully custom, the agent doubles as an **MCP tool server**, and
redacted run traces are public on the [Hub](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces).
Click **Run the agents** and a local **OpenBMB MiniCPM** planner (a second local llama-server)
drives this same Space's MCP tools as a multi-step agent (extract → check conflicts → render
`.ics`) with every step visible. Still zero cloud AI; every model under 32B.
**What's new.** Extraction now reads the *logistics*, not just the date (see below): arrival-aware
start times, duration→end conversion, type-based reminders, and calendar-ready titles. Each is
guaranteed by deterministic post-processing even when the model wobbles, and each shipped through
a measured A/B eval ([full result tables](./training/data/ab_results.md): regex vs text-LLM vs
**vision-LLM reading rendered screenshots only**). Calendar out got one-click too: a unified
**Connect your calendar** block (Google OAuth: the token lives in *your* browser, never on the
server; Outlook/Apple need no sign-in) and per-event **Google · Outlook · iCal** links, with the
Google push verified end-to-end (push → readback → delete, 11/11).

**The UX.** One decision, **Offline or Online**, re-themes the whole workflow card and sets the
path: off-grid `.ics` only, or a **one-click "Connect your calendar"** whose Google OAuth token
lives *only in the browser* (server-verified each visit; the client secret never leaves the
server). Results land in a single card: events, conflicts, the drafted reply, and per-event
**Google · Outlook · iCal · .ics** quick-add links. **Activity → This week** tallies events
captured, conflicts caught, and time saved; a per-device **Memory** (localStorage, one-click
samples) feeds names and preferences back into extraction.

**Submission links:** [requirement-by-requirement mapping](./docs/build-small-submission.md) ·
[demo video](https://youtu.be/m-o0u9X3tI4) ·
social posts [1](https://x.com/nate_mauer/status/2064920352845709419) ·
[2](https://x.com/nate_mauer/status/2065661878441750916)
## Who this is for
A busy parent whose kid's school and activity events are buried in a noisy class group chat:
picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once,
mean to add it later, and miss it. With this, they **paste the chat** (or a **screenshot** of a flyer
or invite) from their phone's browser and get back: the events, a **conflict check** against their
calendar, and a **ready-to-send reply**, all surfaced for review before anything is saved. Output is
a local `.ics` they can add to any calendar, with optional Google Calendar push.

No app to install and no account. It reads nothing automatically; the parent pastes only what they
choose. Inference runs **in the Space** via `llama.cpp` (no cloud AI APIs), and works out of the box
with no GPU (see *Accuracy upgrade* below).
## The model: `gemma-cal` E4B, one calendar-native LLM built for exactly this
What makes this platform different isn't a prompt wrapped around a generic chatbot. It's
**[`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf), our own fine-tune of
Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready
structure.** The model doesn't chat. It reads a thread (or a flyer photo) and emits a single
validated **ActionPlan**: events with exact ISO datetimes, conflicts, proposed alternatives, a
drafted reply, and a clarifying question when the plan is too vague to schedule. **It is the one
and only model the platform runs**, everywhere from the production Space to a laptop.

- **Edge-sized by design.** ~5 GB at Q4, so it serves on a **~$0.40/hr 16 GB T4** (vs $4+/hr A100-class
for big models), a gaming GPU, or an Apple-silicon laptop, with full **vision**
(screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
- **Schema-bulletproof.** The fine-tune holds **100% schema validity even with no system prompt**,
with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of
*asking* when a date is TBD. These are the failure modes that actually burn users of generic models.
- **Convention-trained.** It learns *this product's* date semantics ("next Tuesday" means next
week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed
from the internet.
- **Eval-gated, never vibes-shipped.** Every retrain runs a 60-example task eval (start-exact
datetime matching, F1, validity, clarification) and **cannot reach production unless it clears
the gate**; the pipeline has rejected eight regressed models to date. The full, honest scorecard
lives in [`docs/eval-roadmap.md`](./docs/eval-roadmap.md) and the
[post-mortem write-up](./docs/blog-eval-gated-finetuning.md).

**Hackathon size constraint (≤ 32B):** easily met, since E4B is ~4B effective parameters. See the in-app
**🏆 Submission** tab for the full compliance scorecard.
### Reads the logistics, not just the date
A confirmation like *"Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes
early to complete intake forms) · 📍 112A West 72nd Street…"* becomes one correct event:
- **Arrival-aware start**: the event starts at **10:15** (when you must show up), the official
10:30 is preserved in the notes, and the **end is anchored to the stated time + duration**
(11:00), so the calendar block covers the forms *and* the visit.
- **Type-based notifications**: an explicitly stated lead time always wins ("remind me 2 hours
before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school
events 45.
- **Real-world addresses**: multi-line and 📍-emoji locations join into one string;
"(Upper West Side - 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us
at 212-223-0349") don't confuse it.
- **Calendar-ready titles**: an action+subject summary ("Pick up Priya - Terminal 4"), not a
quote of the message.

The model is *taught* these conventions (prompt + fine-tune data), but the load-bearing ones are
also **guaranteed by deterministic post-processing** (`apply_text_rules` in
[`server/agent.py`](./server/agent.py)). This is the same philosophy as the conflict engine: must-hold
logistics are never left to model temperament. Every behavior above shipped through a measured
A/B eval (regex baseline vs text-LLM vs **vision-LLM reading rendered chat screenshots only**),
with the full tables in [`training/data/ab_results.md`](./training/data/ab_results.md)
(headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the
screenshot-only vision arm lands within a point of text).
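
The arrival and reminder rules above can be illustrated with a short sketch. This is not the repo's `apply_text_rules`; the function names, the category keys, and the choice to return `None` for unlisted event types are assumptions made for illustration. Only the minute values and the 10:30 example come from this README.

```python
from datetime import datetime, timedelta

# Reminder defaults quoted in this README; the dictionary keys are hypothetical labels.
REMINDER_MINUTES = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def apply_logistics(stated_start, duration_min, arrive_early_min=0):
    """Return (start, end, note) for one event.

    The start moves earlier by the arrival lead, while the end stays anchored
    to the stated time plus the duration.
    """
    start = stated_start - timedelta(minutes=arrive_early_min)
    end = stated_start + timedelta(minutes=duration_min)
    note = ""
    if arrive_early_min:
        # Keep the official time so it is not lost when the start shifts.
        note = f"Official start {stated_start:%H:%M}; arrive {arrive_early_min} min early."
    return start, end, note


def reminder_minutes(event_type, explicit_lead=None):
    """An explicitly stated lead time wins; otherwise fall back to the type default."""
    if explicit_lead is not None:
        return explicit_lead
    return REMINDER_MINUTES.get(event_type)  # None for unlisted types (assumption)


start, end, note = apply_logistics(datetime(2025, 1, 9, 10, 30), 30, arrive_early_min=15)
print(start.time(), end.time(), note)
```

With the README's example (10:30 stated, 15 minutes early, and the lower bound of the 30–45 minute range), the sketch yields a 10:15 start and an 11:00 end.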
## Try it in 30 seconds
Open the Space in your phone's browser → **Schedule** tab → tap **Try a sample** (or paste your own
group chat, and optionally a screenshot or your `.ics`) → review the detected events → **Download
.ics**. The **Activity → This week** panel then shows what you've captured and the time it saved.
## How it works
```
Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
      (phone browser)              │                                  │
              custom Gradio UI ◀── review ──┐                   ┌─────┘
                                            ▼                   ▼
                              .ics download / optional Google Calendar
```
The **primary path needs nothing but a browser**: paste text and/or attach a screenshot in the
Schedule tab. (Power users can also auto-feed messages from a Mac; see *Optional: Mac collector*.)
For the full solution-architecture view (every workflow and which LLM, if any, it calls,
plus the eval-gated fine-tuning loop), see **[docs/architecture.md](./docs/architecture.md)**.
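
This README describes the conflict check as deterministic Python rather than model output. A minimal sketch of that idea is a plain interval-overlap test; the function names and the `(title, start, end)` tuple shape are assumptions, not the actual API of `calendar_out/freebusy.py`.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share any time."""
    return a_start < b_end and b_start < a_end


def find_conflicts(new_events, busy):
    """Return (new_title, busy_title) pairs for every overlapping pair.

    Both arguments are lists of (title, start, end) tuples.
    """
    hits = []
    for title, start, end in new_events:
        for busy_title, busy_start, busy_end in busy:
            if overlaps(start, end, busy_start, busy_end):
                hits.append((title, busy_title))
    return hits


def at(hour, minute=0):
    """Helper for the demo: a time on one fixed example day."""
    return datetime(2025, 1, 9, hour, minute)


new_events = [("Picture day", at(9), at(10)), ("Practice", at(16), at(17))]
busy = [("Dentist", at(9, 30), at(10, 30)), ("Pickup", at(17), at(17, 30))]
print(find_conflicts(new_events, busy))
```

Treating intervals as half-open means back-to-back events (one ending at 17:00, the next starting at 17:00) are not reported as conflicts, which is a design choice of this sketch.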
## Can it process multiple invites at once?
**Yes. Multiple invites in one paste is the designed path** (on the live Space, where the real
model runs). `ActionPlan.events` is a *list*, and the extraction prompt explicitly tells the model
that one thread often holds several events: a drop-off AND a pickup, or two appointments, are
separate events (`server/agent.py`). Everything downstream is built for N events: the results card
shows "*N events found*" with one card per invite, the editable table gets one row each, the `.ics`
contains one `VEVENT` per event, each event carries its own Google/Outlook/Apple quick-add links,
and the conflict check runs across all of them. Screenshot input is multi-file too: attach several
flyers and they're all read in one run.
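
As an illustration of "one `VEVENT` per event", here is a minimal sketch that renders a list of events into a single calendar. It is not the repo's `calendar_out/ics.py`; the function names, the event dict keys, and the `PRODID` value are assumptions, and datetimes are assumed to be in UTC already.

```python
from datetime import datetime


def _escape(text):
    """Escape the characters RFC 5545 reserves inside TEXT values."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def events_to_ics(events, stamp):
    """Render event dicts (uid, title, start, end) as one VCALENDAR string."""
    fmt = "%Y%m%dT%H%M%SZ"
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//sketch//multi-event//EN"]
    for ev in events:
        # One VEVENT block per extracted event.
        lines += [
            "BEGIN:VEVENT",
            f"UID:{ev['uid']}",
            f"DTSTAMP:{stamp.strftime(fmt)}",
            f"DTSTART:{ev['start'].strftime(fmt)}",
            f"DTEND:{ev['end'].strftime(fmt)}",
            f"SUMMARY:{_escape(ev['title'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


demo_events = [
    {"uid": "dropoff@example", "title": "Drop-off, gate B",
     "start": datetime(2025, 1, 9, 8, 0), "end": datetime(2025, 1, 9, 8, 30)},
    {"uid": "pickup@example", "title": "Pickup",
     "start": datetime(2025, 1, 9, 15, 0), "end": datetime(2025, 1, 9, 15, 30)},
]
ics_text = events_to_ics(demo_events, stamp=datetime(2025, 1, 1))
print(ics_text.count("BEGIN:VEVENT"))
```

The sketch skips line folding for long values and time-zone handling, both of which a production generator would need.
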
Two caveats:
- **Stub mode extracts only the first invite.** The local-dev heuristic (`_stub_plan` in
`server/agent.py`, enabled by `USE_STUB_EXTRACTOR=1`) works with no model and no GPU, and it's
now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations,
durations, arrival-early shifts, type-based reminders), but it still returns at most **one**
event. If you paste a multi-invite thread locally and get one event back, that's the stub, not
the product; the deployed Space uses the multi-event model path.
- **Simultaneous *runs* are serialized, not parallel.** If two users (or two tabs) hit *Run the
agents* at once, both complete, but inference executes one request at a time: `server/model.py`
holds the llama.cpp instance behind a `threading.Lock`, and Gradio queues the events. On a
single-GPU Space that's intentional (one model copy in memory); the second run simply waits its
turn, then streams its own pipeline progress.
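
The serialization described in the second caveat can be sketched with a module-level lock. This is an illustration of the pattern, not the code in `server/model.py`; `run_serialized`, `peak_concurrency`, and the fake model are hypothetical names.

```python
import threading
import time

# One lock guards the single model instance.
_model_lock = threading.Lock()


def run_serialized(model_call, prompt):
    """Run model_call(prompt) while holding the lock, so calls never overlap."""
    with _model_lock:
        return model_call(prompt)


def peak_concurrency(n_threads=4):
    """Fire n_threads requests at once and report the peak number inside the model."""
    active = 0
    peak = 0
    counter_lock = threading.Lock()

    def fake_model(prompt):
        nonlocal active, peak
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for inference time
        with counter_lock:
            active -= 1
        return prompt.upper()

    threads = [
        threading.Thread(target=run_serialized, args=(fake_model, f"request {i}"))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(peak_concurrency())
```

All requests complete, but the peak stays at one because each caller waits for the lock before touching the model.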
## Repo layout
```
app.py                     # Gradio + FastAPI entrypoint (the Space)
server/
  agent.py                 # thread (+images) -> validated ActionPlan
  orchestrator.py          # Run the agents: MiniCPM planner driving our own MCP tools
  schema.py                # Event / Conflict / ActionPlan pydantic models
  model.py                 # llama.cpp load: GGUF + vision mmproj, constrained JSON
  imageutil.py             # image -> base64 data URI
ui/blocks.py               # custom Gradio Blocks (reasoning, events, conflicts, reply)
static/app.css             # custom CSS (Off-Brand)
calendar_out/
  ics.py                   # .ics generation (off-grid default)
  freebusy.py              # parse existing .ics + deterministic conflict detection
  gcal.py                  # optional Google Calendar push
collector/collector.py     # Mac-side iMessage collector (text + image attachments)
training/                  # dataset build + QLoRA fine-tune + GGUF/mmproj export
Dockerfile                 # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
requirements-docker.txt    # runtime deps for the Docker image (llama.cpp built separately)
PLAN.md                    # full design + build plan
```
## Quick start (local dev): no GPU needed
```bash
pip install -r requirements.txt
# Runs the whole app with the built-in heuristic agent (no model, no GPU):
export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
python app.py # http://localhost:7860
```
Open it, go to the **Schedule** tab, and tap **Try a sample**, or paste a thread, attach chat
**screenshots**, and optionally upload your current calendar **`.ics`** for conflict checks.
(Heads-up: the stub agent extracts only the **first** invite in a thread; multi-invite extraction
needs the real model. See *Can it process multiple invites at once?* above.) Tip for
self-hosted installs: set `CAL_ICS_PATH=/path/to/calendar.ics` and conflict checks use that file
automatically whenever no `.ics` is uploaded, so step 4 completes itself, fully offline. Review
the detected events, conflicts, proposed times, and the suggested reply, then add any event with
its **Add to: Google · Outlook · iCal · .ics** links (iCal and .ics both download the event's
`.ics` file; with 2+ events an **iCal - all N events** link grabs everything at once).
The **Activity → This week** panel shows what you've captured.
## This week (impact)
The Activity tab has a **This week** panel that persists across restarts: **events captured**,
**conflicts caught**, and **estimated time saved**. A "capture" is counted when a run surfaces
events for review (adding to a calendar happens through the per-event links, which the server
can't observe).
`minutes_saved` is a deliberately conservative, **configurable estimate, not a measurement**:
`IMPACT_MIN_PER_EVENT` (default **8** min per captured event) + `IMPACT_MIN_PER_CONFLICT` (default
**15** min per conflict caught). Override either via env. State persists to `IMPACT_PATH`
(default `/tmp/impact_weeks.json`; point it at a persistent disk on a Space to survive rebuilds).
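
The estimate above is simple arithmetic. A minimal sketch, assuming both variables are parsed as whole minutes and that the function name and `env` parameter are illustrative rather than the app's actual code:

```python
import os


def minutes_saved(events_captured, conflicts_caught, env=None):
    """Estimate time saved using the documented defaults (8 and 15 minutes)."""
    env = os.environ if env is None else env
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events_captured * per_event + conflicts_caught * per_conflict


# Three captured events and one conflict, with the defaults: 3 * 8 + 1 * 15
print(minutes_saved(3, 1, env={}))
```

Passing `env={}` forces the defaults; omitting it reads the real environment, so overrides set on the Space take effect.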
## Accuracy upgrade (optional): serve the real `gemma-cal` LLM
The stub agent above makes the demo work with **no GPU**. The production Space serves our
fine-tuned **`gemma-cal` E4B** through `llama-server`, with no cloud AI APIs either way. The same
config works anywhere llama.cpp runs:
```bash
export USE_STUB_EXTRACTOR=0
export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf" # ~5 GB edge fine-tune (what the Space serves)
export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF" # the E4B's own vision projector
export MMPROJ_FILE="mmproj-F16.gguf" # enables screenshot/vision input
bash scripts/start_space.sh
```
This is the platform's **only** model: the same ~5 GB GGUF serves the production Space (16 GB
T4), a gaming GPU, or a laptop. (`MODEL_FILE` is explicit on purpose: the model repo also stores
legacy training artifacts, so the `-hf repo:Q4_K_M` shorthand is ambiguous.)
## Optional: Mac collector (power users)
The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in
automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message
content, so a Mac is the only auto-feed source):
```bash
cd collector && cp .env.example .env # edit SPACE_URL + INGEST_TOKEN
python collector.py
```
> ⚠️ The collector needs **Full Disk Access** (System Settings → Privacy & Security) to read `chat.db`.
## Autonomous & on a phone
There's a single backend endpoint, **`POST /agent`** (bearer `INGEST_TOKEN`), that takes a thread
(or messages, + optional screenshot/`.ics`) and returns the extracted events, conflicts, and reply as
JSON (optionally an `.ics` or a Google Calendar push). Every front-end calls it:
- **Fully autonomous (Mac), set-and-forget:** `INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf
scripts/setup_mac.sh` installs three launchd jobs (Hermes `llama-server` + autonomous backend +
collector). New iMessages **you send or accept** become calendar events automatically, deduped per
chat. Triggers on outgoing messages by default (`TRIGGER_ON=outgoing`; `any` to widen).
- **Hermes "grows-with-you" brain:** point `INFERENCE_BASE_URL` at a Hermes `llama-server`; its
personal **memory** (people→roles, "you decline Mondays") improves extraction over time and is shown
in the dashboard **Memory** tab. See **[docs/hermes.md](./docs/hermes.md)**.
- **iPhone, one tap:** an iOS **Shortcut** shares a thread/screenshot to `/agent` and adds the events
to Apple Calendar natively, with no `.ics` import.
- **Android, hands-off:** a Tasker/MacroDroid rule on a notification/SMS calls `/agent` and inserts
events. See **[docs/android-tasker.md](./docs/android-tasker.md)**.
- **On-device model:** set `INFERENCE_BASE_URL` to a local `llama-server` (e.g. Gemma **E4B** or a
small Hermes in Termux) so inference runs *on the phone*: same agent, env-selected.

> **iOS can't read iMessage in the background** (no message API), so fully-autonomous iMessage needs
> the Mac collector; the iPhone path is one-gesture. See **[docs/automations.md](./docs/automations.md)**
> and **[docs/on-device.md](./docs/on-device.md)**.
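
For a custom front-end, a call to `POST /agent` might be assembled as below. The path and the bearer token come from this README; the JSON field name `thread`, the helper name, and the example URL are assumptions, since the request schema is not spelled out here. Check the server code for the actual body shape.

```python
import json
import urllib.request


def build_agent_request(base_url, token, thread_text):
    """Build (but do not send) a bearer-authenticated POST to /agent."""
    # "thread" is a hypothetical field name used only for illustration.
    body = json.dumps({"thread": thread_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_agent_request("http://localhost:7860/", "dev-secret", "Practice moved to Tuesday 4pm")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(req)` and decoding the JSON response.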
## Build Small: prizes & quests
**Track: 🏡 Backyard AI** (`track:backyard`), a practical app for a specific real person: a busy
parent whose family calendar is buried in a noisy class group chat.
### Sponsor awards we compete for

| Award | Why this submission qualifies |
|---|---|
| 🟢 **Modal Awards** (best Modal-powered apps) | **Modal powered the development of the platform's model end-to-end** (required note, gladly given): [`training/modal_train.py`](./training/modal_train.py) (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), [`training/modal_eval.py`](./training/modal_eval.py) + [`modal_quant_eval.py`](./training/modal_quant_eval.py) (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and [`training/gated_retrain.py`](./training/gated_retrain.py) (train → staging → eval → promote *only past the gate*; eight regressed models rejected, every run a Modal job). |
| 🌱 **OpenBMB Awards** (standout MiniCPM builds, per track) | The **agent is planned by OpenBMB MiniCPM** (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible multi-step agent ([`server/orchestrator.py`](./server/orchestrator.py)). MiniCPM is the agent's brain, not a garnish. |

*(Not claimed: the OpenAI Track, since there are no Codex-attributed commits, and the NVIDIA Nemotron Quest,
since it needs a different model family. We'd rather be honest than eligible.)*
### Special awards: our case

| Award | Our case |
|---|---|
| 🎖️ **Bonus Quest Champion** | All **six** collectable quests claimed with evidence: the full sash (table below). |
| 🎨 **Off-Brand Award** | Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard ([`ui/blocks.py`](./ui/blocks.py) + [`static/app.css`](./static/app.css)), far past the stock Gradio look. |
| 🐜 **Tiny Titan** | The platform's one and only model is **Gemma E4B, ~4B *effective* parameters** (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B", so it's the judges' call whether that's tiny enough. |
| 🎬 **Best Demo** | App + demo video + social post as one package, with a storyboard that names every quest on-camera in [`docs/demo-script.md`](./docs/demo-script.md). |
| 🤖 **Best Agent** | The MiniCPM-planned, MCP-tool-driven agent above: real multi-step tool use, every model under the 32B cap. |
| 🃏 **Judges' Wildcard** | No entry needed, but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you. |

### Collectable quests: all six claimed

| Quest | Evidence |
|---|---|
| 🔌 **Off the Grid** (local-first, no cloud APIs) | All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push. |
| 🎯 **Well-Tuned** (published fine-tune) | [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf): our QLoRA fine-tune **is the model production serves**, shipped through the eval gate with the [honest scorecard public](./docs/eval-roadmap.md). |
| 🎨 **Off-Brand** (custom UI) | See the Off-Brand Award case above. |
| 🦙 **Llama Champion** (llama.cpp runtime) | The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj ([`Dockerfile`](./Dockerfile), [`scripts/start_space.sh`](./scripts/start_space.sh)). |
| 📡 **Sharing is Caring** (open trace on the Hub) | Redacted agent traces published to [`ParetoOptimal/offgridschedula-traces`](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces), one click from the Activity tab. |
| 📓 **Field Notes** (write-up) | [`FIELD_NOTES.md`](./FIELD_NOTES.md) + the [eval-gated fine-tuning post-mortem](./docs/blog-eval-gated-finetuning.md) + [project blog](https://huggingface.co/blog/build-small-hackathon/offgridschedula). |
## Fine-tune on Modal (GPU)
`training/modal_train.py` runs the whole fine-tune on a serverless GPU and publishes the GGUF to
HF, so no local GPU is needed. It's a thin wrapper that ships this repo to Modal and runs the existing
pipeline (`make_dataset.py` → `train_qlora.py` → `export_gguf.sh`) on an A100/H100, then uploads the
quantized GGUF + `mmproj` to your HF repo. This is all *offline* prep, so **Off the Grid** is
untouched (the rule applies to the running app's inference, not dataset/training prep).
```bash
pip install modal
modal token new
modal secret create huggingface HF_TOKEN=hf_xxxxxxxx # your HF *write* token
# Validate the full pipeline cheaply first (cheap edge model, ~a couple $):
modal run training/modal_train.py --base-model google/gemma-4-E4B-it
# Then the real run (default A100-80GB; --gpu H100 for speed):
modal run training/modal_train.py
modal run training/modal_train.py --gpu H100 --num-epochs 3
```
On finish it prints the `MODEL_REPO` / `MODEL_FILE` / `MMPROJ_FILE` to set on the Space. Two
persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on
`training/data/dataset.jsonl` only re-pays for the training itself.
> Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is
> ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the
> first real 31B run; the seeds in `make_dataset.py` are a smoke test, not a training set.
### Publish your fine-tune & point the Space at it
The training run is the one step that spends **your** GPU/Modal credits; it's not done for you.
Once you've run it, the path is turnkey:
1. **Recommended:** `python training/gated_retrain.py` runs train → staging upload → 60-example eval →
**promote only if it beats the gate**. A regressed model cannot reach production. (Raw
`modal run training/modal_train.py` is the ungated equivalent for experiments.)
2. Point the Space at *your* model via **Space variables** (`scripts/start_space.sh` reads them at
launch; set in *Settings → Variables* or with `HfApi().add_space_variable`):
```
MODEL_HF_REPO = <you>/gemma-cal-gguf
MODEL_FILE = gemma-cal-e4b-Q4_K_M.gguf # explicit file: repo may hold several quants/tiers
MMPROJ_REPO = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
MMPROJ_FILE = mmproj-F16.gguf # enables screenshot/vision input
```
The deploy workflow stays a plain git mirror: the model is pulled at runtime, never committed.
3. Push to `main` → CI deploys → the Space now serves your fine-tune (**Well-Tuned**).
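
The "promote only if it beats the gate" step in item 1 can be pictured as a per-metric threshold check. This sketch is not `training/gated_retrain.py`: the metric names, the floor values, and the meets-or-exceeds rule are all assumptions, and the numbers are illustrative rather than real eval results.

```python
def gate_failures(candidate, floors):
    """Return {metric: (score, floor)} for every metric below its floor.

    A metric missing from the candidate's scores counts as 0.0 and therefore fails.
    """
    return {
        name: (candidate.get(name, 0.0), floor)
        for name, floor in floors.items()
        if candidate.get(name, 0.0) < floor
    }


def should_promote(candidate, floors):
    """Promote only when no gated metric falls below its floor."""
    return not gate_failures(candidate, floors)


# Illustrative thresholds and scores, not the project's actual gate.
floors = {"start_exact": 0.80, "event_f1": 0.85, "schema_validity": 1.0}
regressed = {"start_exact": 0.82, "event_f1": 0.70, "schema_validity": 1.0}
print(should_promote(regressed, floors), gate_failures(regressed, floors))
```

Returning the failing metrics alongside the decision makes a rejected run easy to explain in a post-mortem.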
## Share a trace (Sharing is Caring)
Want others to learn from a run? In the **Activity** tab, click **⬇ Download trace (JSON)**. The
trace stays on your device, and the hosted Space holds **no Hub token**. Personal data is redacted by
default (the activity log only carries counts + status; the one chat-name field is stripped). Then
publish it from your own machine, with your own login:
```bash
huggingface-cli login # or export HF_TOKEN=...
python training/share_trace.py trace.json --public # -> a HF dataset repo of traces
```
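
To make the redaction idea concrete, here is a sketch that strips a named field from a trace at any nesting depth before sharing. The key `chat_name` and the trace layout are hypothetical; the real trace schema is not documented in this README.

```python
def redact_trace(trace, drop_keys=("chat_name",)):
    """Return a new structure with the named keys removed at every depth."""
    if isinstance(trace, dict):
        return {
            key: redact_trace(value, drop_keys)
            for key, value in trace.items()
            if key not in drop_keys
        }
    if isinstance(trace, list):
        return [redact_trace(item, drop_keys) for item in trace]
    return trace  # scalars pass through unchanged


# Hypothetical trace shape, for illustration only.
raw = {
    "runs": [
        {"status": "ok", "events": 2, "conflicts": 1, "chat_name": "Class 3B parents"},
        {"status": "ok", "events": 1, "conflicts": 0, "chat_name": "Soccer carpool"},
    ]
}
print(redact_trace(raw))
```

The function builds a new structure instead of mutating its input, so the original trace on disk is left untouched.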
## Field notes
[**FIELD_NOTES.md**](./FIELD_NOTES.md) is the build retrospective — the iOS→`chat.db` pivot, the
`attributedBody` trap, why conflict math is deterministic, stub-first architecture, the
reframe-around-one-person lesson, and the Off-the-Grid trade-offs.
## Remote automation (runs without an interactive session)
| Workflow | Trigger | What it does | Needs |
|---|---|---|---|
| `.github/workflows/ci.yml` → **test** | push / PR | compile + `pytest` (stub mode, no GPU) | nothing |
| `.github/workflows/ci.yml` → **deploy** | push to `main`, after tests pass | `huggingface-cli upload` the repo to the HF Space (Gradio SDK; model excluded, pulled at runtime) | secret `HF_TOKEN`, var `SPACE_ID` |
| `.github/workflows/maintenance.yml` | daily + manual | ping the Space `/health`, audit outdated deps → open/update a GitHub issue | var `SPACE_HEALTH_URL` |
One-time setup for deploy + monitoring:
```bash
gh secret set HF_TOKEN # HF write token
gh variable set SPACE_ID -b "<owner>/<space>"
gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"
```
CI installs `requirements-ci.txt` (excludes `llama-cpp-python` and the Google libs; both are
imported lazily and not needed for the stub-mode tests). A weekly Claude `/schedule` routine handles
the judgment work (grow `training/data/dataset.jsonl` → PR, triage CI failures).