OffGridSchedula

ParetoOptimal

Initial Commit

0366d65 18 days ago

metadata

title: OffGridSchedula
emoji: 🗓️
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
models:
  - build-small-hackathon/gemma-4-cal-gguf
  - openbmb/MiniCPM5-1B-GGUF
demo_video:
  - https://youtu.be/m-o0u9X3tI4
social_posts:
  - https://x.com/nate_mauer/status/2065973341651882386
  - https://x.com/nate_mauer/status/2064920352845709419
  - https://x.com/nate_mauer/status/2065661878441750916
  - https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
blog_post:
  - https://huggingface.co/blog/build-small-hackathon/offgridschedula
made_by:
  - ParetoOptimal - a.k.a., Nate Mauer

🗓️ Message Scheduling Agent

OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts, and drafts the reply — right from your phone, no app, no account, no setup. iOS allows neither background iMessage access nor a persistent on-device LLM server, so there's no autonomous on-device agent to install; instead, a foreground Shortcut (docs/automations.md) hands a thread or screenshot to the agent in two taps (optionally using a remote model via INFERENCE_BASE_URL).

The model runs on your own server, or even on the phone itself, not on a cloud AI service. Your chats aren't shipped off to a third-party AI to be read; the agent reads your snippet in memory and discards it after replying. The run trace you can optionally share is redacted, and the snippet goes only to the agent you control, which turns it into ready-to-add calendar events.

Hardware-aware. On under-powered hardware, the app shows an upgrade banner rather than hanging; the real model needs a small GPU.

Build Small submission — the idea & the tech

The idea. A busy parent's calendar lives in other people's messages — picture day in the class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the conflicts against your own .ics, and a drafted reply — then add to Apple/Google Calendar in a tap.

The tech. Two small local models do the work. Extraction is gemma-cal E4B (~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated ActionPlan (events · conflicts · reply · clarifying question), served with vision through the official llama.cpp server inside this Docker Gradio Space — no cloud AI APIs. The fine-tune + its 60-example task eval ran entirely on Modal serverless GPUs, behind an eval gate that rejected eight regressed models before this one shipped. Conflict math is deterministic Python, the UI is fully custom, the agent doubles as an MCP tool server, and redacted run traces are public on the Hub. Click Run the agents and a local OpenBMB MiniCPM planner (a second local llama-server) drives this same Space's MCP tools as a multi-step agent — extract → check conflicts → render .ics — with every step visible. Still zero cloud AI; every model under 32B.

What's new. Extraction now reads the logistics, not just the date (see below): arrival-aware start times, duration→end conversion, type-based reminders, and calendar-ready titles. Each is guaranteed by deterministic post-processing even when the model wobbles, and each shipped through a measured A/B eval (full result tables: regex vs text-LLM vs vision-LLM reading rendered screenshots only). Calendar out got one-click too: a unified Connect your calendar block (Google OAuth, where the token lives in your browser, never on the server; Outlook/Apple need no sign-in) and per-event Google · Outlook · iCal links, with the Google push verified end-to-end (push → readback → delete, 11/11).

The UX. One decision, Offline or Online, re-themes the whole workflow card and sets the path: off-grid .ics only, or a one-click "Connect your calendar" whose Google OAuth token lives only in the browser (server-verified each visit; the client secret never leaves the server). Results land in a single card: events, conflicts, the drafted reply, and per-event Google · Outlook · iCal · .ics quick-add links. Activity → This week tallies events captured, conflicts caught, and time saved; a per-device Memory (localStorage, one-click samples) feeds names and preferences back into extraction.

Submission links: requirement-by-requirement mapping · demo video · social posts 1 · 2

Who this is for

A busy parent whose kid's school and activity events are buried in a noisy class group chat — picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once, mean to add it later, and miss it. With this, they paste the chat (or a screenshot of a flyer or invite) from their phone's browser and get back: the events, a conflict check against their calendar, and a ready-to-send reply — all surfaced for review before anything is saved. Output is a local .ics they can add to any calendar, with optional Google Calendar push.

No app to install and no account. It reads nothing automatically; the parent pastes only what they choose. Inference runs in the Space via llama.cpp (no cloud AI APIs), and the app works out of the box with no GPU (see Accuracy upgrade below).

The model: `gemma-cal` E4B — one calendar-native LLM, built for exactly this

What makes this platform different isn't a prompt wrapped around a generic chatbot — it's gemma-cal E4B, our own fine-tune of Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready structure. The model doesn't chat. It reads a thread (or a flyer photo) and emits a single validated ActionPlan — events with exact ISO datetimes, conflicts, proposed alternatives, a drafted reply, and a clarifying question when the plan is too vague to schedule. It is the one and only model the platform runs, everywhere from the production Space to a laptop.

Edge-sized by design. 5 GB at Q4 — serves on a **$0.40/hr 16 GB T4** (vs $4+/hr A100-class for big models), a gaming GPU, or an Apple-silicon laptop, with full vision (screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
Schema-bulletproof. The fine-tune holds 100% schema validity even with no system prompt, with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of asking when a date is TBD — the failure modes that actually burn users of generic models.
Convention-trained. It learns this product's date semantics ("next Tuesday" means next week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed from the internet.
Eval-gated, never vibes-shipped. Every retrain runs a 60-example task eval (start-exact datetime matching, F1, validity, clarification) and cannot reach production unless it clears the gate — the pipeline has rejected eight regressed models to date. The full, honest scorecard lives in docs/eval-roadmap.md and the post-mortem write-up.
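
The ActionPlan shape described above (events with ISO datetimes, conflicts, a drafted reply, and an optional clarifying question) can be pictured with a small sketch. This is not the project's `server/schema.py`, which the repo layout says uses pydantic; it is a standard-library approximation, and every field name here is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    title: str
    start: str  # ISO 8601 datetime string, e.g. "2025-03-06T10:15:00"
    end: str

    def __post_init__(self) -> None:
        # Reject anything that is not a parseable ISO datetime.
        start = datetime.fromisoformat(self.start)
        end = datetime.fromisoformat(self.end)
        if end <= start:
            raise ValueError("event must end after it starts")


@dataclass
class ActionPlan:
    # Field names are hypothetical; the real schema may differ.
    events: List[Event] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)
    reply: str = ""
    clarifying_question: Optional[str] = None


def parse_plan(raw: dict) -> ActionPlan:
    """Build a validated plan from already-decoded model JSON."""
    events = [Event(**e) for e in raw.get("events", [])]
    return ActionPlan(
        events=events,
        conflicts=list(raw.get("conflicts", [])),
        reply=raw.get("reply", ""),
        clarifying_question=raw.get("clarifying_question"),
    )
```

A "thanks!" message would map to a plan with an empty `events` list, and a thread with a TBD date to a plan whose `clarifying_question` is set; both are valid instances of the same structure.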

Hackathon size constraint (≤ 32B): met easily, since E4B is ~4B effective parameters. See the in-app 🏆 Submission tab for the full compliance scorecard.

Reads the logistics, not just the date

A confirmation like "Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes early to complete intake forms) · 📍 112A West 72nd Street…" becomes one correct event:

Arrival-aware start — the event starts at 10:15 (when you must show up), the official 10:30 is preserved in the notes, and the end is anchored to the stated time + duration (11:00), so the calendar block covers the forms and the visit.
Type-based notifications — an explicitly stated lead time always wins ("remind me 2 hours before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school events 45.
Real-world addresses — multi-line and 📍-emoji locations join into one string; "(Upper West Side — 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us at 212-223-0349") don't confuse it.
Calendar-ready titles — an action+subject summary ("Pick up Priya — Terminal 4"), not a quote of the message.

The model is taught these conventions (prompt + fine-tune data), but the load-bearing ones are also guaranteed by deterministic post-processing (apply_text_rules in server/agent.py) — same philosophy as the conflict engine: must-hold logistics are never left to model temperament. Every behavior above shipped through a measured A/B eval — regex baseline vs text-LLM vs vision-LLM reading rendered chat screenshots only — with the full tables in training/data/ab_results.md (headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the screenshot-only vision arm lands within a point of text).
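
The arrival-aware start, duration-anchored end, and reminder defaults described above can be sketched in a few lines. This is not the actual `apply_text_rules`; the function names, the event-type keys, and the choice to use the lower bound of "30–45 min" (which matches the 11:00 end in the example) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default reminder lead times in minutes, taken from the list above.
DEFAULT_REMINDERS = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def apply_logistics(stated_start: datetime, duration_min: int,
                    arrive_early_min: int = 0) -> dict:
    """Shift the start earlier for arrival; anchor the end to stated time + duration."""
    start = stated_start - timedelta(minutes=arrive_early_min)
    end = stated_start + timedelta(minutes=duration_min)
    notes = ""
    if arrive_early_min:
        # Preserve the official time in the notes, as the README describes.
        notes = f"Official start {stated_start:%H:%M}; arrive {arrive_early_min} min early."
    return {"start": start, "end": end, "notes": notes}


def reminder_minutes(event_type: str, explicit_min: Optional[int] = None) -> Optional[int]:
    """An explicitly stated lead time always wins over the type default."""
    if explicit_min is not None:
        return explicit_min
    return DEFAULT_REMINDERS.get(event_type)
```

With the confirmation above (10:30 stated, arrive 15 minutes early, 30 minutes minimum), `apply_logistics(datetime(2025, 3, 6, 10, 30), 30, 15)` gives a block from 10:15 to 11:00; the date is arbitrary.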

Try it in 30 seconds

Open the Space in your phone's browser → Schedule tab → tap Try a sample (or paste your own group chat, and optionally a screenshot or your .ics) → review the detected events → Download .ics. The Activity → This week panel then shows what you've captured and the time it saved.

How it works

Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
   (phone browser)                  │                              │
                              custom Gradio UI ◀── review ──┐  ┌────┘
                                                            ▼  ▼
                                          .ics download / optional Google Calendar

The primary path needs nothing but a browser: paste text and/or attach a screenshot in the Schedule tab. (Power users can also auto-feed messages from a Mac — see Optional: Mac collector.)

For the full solution-architecture view — every workflow and which LLM (if any) it calls, plus the eval-gated fine-tuning loop — see docs/architecture.md.
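
This README describes the conflict step as deterministic Python (`calendar_out/freebusy.py`). A minimal sketch of that kind of check, assuming events are already parsed into start/end datetimes and treating intervals as half-open so back-to-back events do not collide, might look like the following. The names and the half-open convention are assumptions, not the project's code.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[str, datetime, datetime]  # (title, start, end)


def overlaps(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> bool:
    # Half-open intervals: an event ending at 11:00 does not clash
    # with one starting at 11:00.
    return a_start < b_end and b_start < a_end


def find_conflicts(new_events: List[Interval],
                   busy: List[Interval]) -> List[Tuple[str, str]]:
    """Return (new title, existing title) pairs whose time ranges overlap."""
    conflicts = []
    for title, start, end in new_events:
        for busy_title, b_start, b_end in busy:
            if overlaps(start, end, b_start, b_end):
                conflicts.append((title, busy_title))
    return conflicts
```

Because the comparison is plain datetime arithmetic, the same inputs always produce the same conflicts, regardless of what the model emitted around them.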

Can it process multiple invites at once?

Yes — multiple invites in one paste is the designed path (on the live Space, where the real model runs). ActionPlan.events is a list, and the extraction prompt explicitly tells the model that one thread often holds several events — a drop-off AND a pickup, or two appointments, are separate events (server/agent.py). Everything downstream is built for N events: the results card shows "N events found" with one card per invite, the editable table gets one row each, the .ics contains one VEVENT per event, each event carries its own Google/Outlook/Apple quick-add links, and the conflict check runs across all of them. Screenshot input is multi-file too — attach several flyers and they're all read in one run.
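
The one-VEVENT-per-event layout mentioned above can be sketched with the standard library alone. This is an illustration of the iCalendar structure, not the project's `calendar_out/ics.py`; the PRODID, the UID domain, and the use of naive (floating local time) datetimes are assumptions, and line folding for long values is omitted.

```python
import uuid
from datetime import datetime, timezone
from typing import Iterable, Tuple


def _escape(text: str) -> str:
    # iCalendar TEXT escaping for backslash, semicolon, comma, and newline.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def make_ics(events: Iterable[Tuple[str, datetime, datetime]]) -> str:
    """Render (title, start, end) tuples as one calendar with one VEVENT each."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//sketch//EN"]
    for title, start, end in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uuid.uuid4()}@example.invalid",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{start:%Y%m%dT%H%M%S}",
            f"DTEND:{end:%Y%m%dT%H%M%S}",
            f"SUMMARY:{_escape(title)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

A drop-off and a pickup from the same thread would go in as two tuples and come out as two VEVENT blocks in a single file.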

Two caveats:

Stub mode extracts only the first invite. The local-dev heuristic (_stub_plan in server/agent.py, enabled by USE_STUB_EXTRACTOR=1) works with no model and no GPU — and it's now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations, durations, arrival-early shifts, type-based reminders) — but it still returns at most one event. If you paste a multi-invite thread locally and get one event back, that's the stub, not the product; the deployed Space uses the multi-event model path.
Simultaneous runs are serialized, not parallel. If two users (or two tabs) hit Run the agents at once, both complete, but inference executes one request at a time — server/model.py holds the llama.cpp instance behind a threading.Lock, and Gradio queues the events. On a single-GPU Space that's intentional (one model copy in memory); the second run simply waits its turn, then streams its own pipeline progress.
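
The serialization described in the second caveat can be illustrated with a lock around a stand-in model call. This is a sketch only: the class name is hypothetical, `time.sleep` substitutes for the real llama.cpp call, and the counters exist purely to show that no two calls ever run at once.

```python
import threading
import time


class SerializedModel:
    """Wraps a single model instance so only one request runs at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of simultaneous calls observed

    def infer(self, prompt: str) -> str:
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
            time.sleep(0.01)  # placeholder for the actual inference call
            self._active -= 1
            return f"plan for: {prompt}"
```

If several threads call `infer` together, each still gets its own result, but `peak_active` stays at 1 because later callers block on the lock until the current one finishes.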

Repo layout

app.py                 # Gradio + FastAPI entrypoint (the Space)
server/
  agent.py             # thread (+images) -> validated ActionPlan
  orchestrator.py      # Run the agents: MiniCPM planner driving our own MCP tools
  schema.py            # Event / Conflict / ActionPlan pydantic models
  model.py             # llama.cpp load: GGUF + vision mmproj, constrained JSON
  imageutil.py         # image -> base64 data URI
ui/blocks.py           # custom Gradio Blocks (reasoning, events, conflicts, reply)
static/app.css         # custom CSS (Off-Brand)
calendar_out/
  ics.py               # .ics generation (off-grid default)
  freebusy.py          # parse existing .ics + deterministic conflict detection
  gcal.py              # optional Google Calendar push
collector/collector.py # Mac-side iMessage collector (text + image attachments)
training/              # dataset build + QLoRA fine-tune + GGUF/mmproj export
Dockerfile             # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
requirements-docker.txt # runtime deps for the Docker image (llama.cpp built separately)
PLAN.md                # full design + build plan

Quick start (local dev) — no GPU needed

pip install -r requirements.txt

# Runs the whole app with the built-in heuristic agent — no model, no GPU:
export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
python app.py            # http://localhost:7860

Open it, go to the Schedule tab, and tap Try a sample, or paste a thread, attach chat screenshots, and optionally upload your current calendar .ics for conflict checks. (Heads-up: the stub agent extracts only the first invite in a thread; multi-invite extraction needs the real model. See Can it process multiple invites at once? above.)

Tip for self-hosted installs: set CAL_ICS_PATH=/path/to/calendar.ics and conflict checks use that file automatically whenever no .ics is uploaded, so step 4 completes itself, fully offline.

Review the detected events, conflicts, proposed times, and the suggested reply, then add any event with its Add to: Google · Outlook · iCal · .ics links. iCal and .ics both download the event's .ics file; with 2+ events, a combined iCal link covering all N events grabs everything at once. The Activity → This week panel shows what you've captured.

This week (impact)

The Activity tab has a This week panel that persists across restarts: events captured, conflicts caught, and estimated time saved. A "capture" is counted when a run surfaces events for review (adding to a calendar happens through the per-event links, which the server can't observe).

minutes_saved is a deliberately conservative, configurable estimate — not a measurement: IMPACT_MIN_PER_EVENT (default 8 min per captured event) + IMPACT_MIN_PER_CONFLICT (default 15 min per conflict caught). Override either via env. State persists to IMPACT_PATH (default /tmp/impact_weeks.json; point it at a persistent disk on a Space to survive rebuilds).
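
The estimate above is simple arithmetic, sketched here under the assumption that the two environment variables hold whole minutes. The function signature is hypothetical; only the variable names and defaults come from this README.

```python
import os
from typing import Mapping


def minutes_saved(events_captured: int, conflicts_caught: int,
                  env: Mapping[str, str] = os.environ) -> int:
    """Estimated minutes saved: per-event and per-conflict weights, env-overridable."""
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events_captured * per_event + conflicts_caught * per_conflict
```

With the defaults, three captured events and one caught conflict come to 3 × 8 + 1 × 15 = 39 minutes.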

Accuracy upgrade (optional) — serve the real `gemma-cal` LLM

The stub agent above makes the demo work with no GPU. The production Space serves our fine-tuned gemma-cal E4B through llama-server — no cloud AI APIs either way. The same config works anywhere llama.cpp runs:

export USE_STUB_EXTRACTOR=0
export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf"     # ~5 GB edge fine-tune (what the Space serves)
export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF"  # the E4B's own vision projector
export MMPROJ_FILE="mmproj-F16.gguf"              # enables screenshot/vision input
bash scripts/start_space.sh

This is the platform's only model — the same ~5 GB GGUF serves the production Space (16 GB T4), a gaming GPU, or a laptop. (MODEL_FILE is explicit on purpose: the model repo also stores legacy training artifacts, so the -hf repo:Q4_K_M shorthand is ambiguous.)

Optional: Mac collector (power users)

The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message content, so a Mac is the only auto-feed source):

cd collector && cp .env.example .env   # edit SPACE_URL + INGEST_TOKEN
python collector.py

⚠️ The collector needs Full Disk Access (System Settings → Privacy & Security) to read chat.db.

Autonomous & on a phone

There's a single backend endpoint — POST /agent (bearer INGEST_TOKEN) — that takes a thread (or messages, + optional screenshot/.ics) and returns the extracted events, conflicts, and reply as JSON (optionally an .ics or a Google Calendar push). Every front-end calls it:

Fully autonomous (Mac) — set-and-forget: INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf scripts/setup_mac.sh installs three launchd jobs (Hermes llama-server + autonomous backend + collector). New iMessages you send or accept become calendar events automatically, deduped per chat. Triggers on outgoing messages by default (TRIGGER_ON=outgoing; any to widen).
Hermes "grows-with-you" brain: point INFERENCE_BASE_URL at a Hermes llama-server; its personal memory (people→roles, "you decline Mondays") improves extraction over time and is shown in the dashboard Memory tab. See docs/hermes.md.
iPhone, one tap: an iOS Shortcut shares a thread/screenshot to /agent and adds the events to Apple Calendar natively — no .ics import.
Android, hands-off: a Tasker/MacroDroid rule on a notification/SMS calls /agent and inserts events. See docs/android-tasker.md.
On-device model: set INFERENCE_BASE_URL to a local llama-server (e.g. Gemma E4B or a small Hermes in Termux) so inference runs on the phone — same agent, env-selected.

iOS can't read iMessage in the background (no message API), so fully-autonomous iMessage needs the Mac collector; the iPhone path is one-gesture. See docs/automations.md and docs/on-device.md.
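
For front-ends not listed above, a call to `POST /agent` could be assembled along these lines. The endpoint path and bearer `INGEST_TOKEN` come from this README; the JSON body shape (a single `thread` field) is an assumption, so check the server code for the actual request schema.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, thread: str) -> urllib.request.Request:
    """Prepare (but do not send) a bearer-authenticated POST to /agent."""
    # Body shape is hypothetical: the real endpoint may expect other fields.
    body = json.dumps({"thread": thread}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then `urllib.request.urlopen(req)` followed by `json.load` on the response, which per the description above should contain the extracted events, conflicts, and reply.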

Build Small — prizes & quests

Track: 🏡 Backyard AI (track:backyard) — a practical app for a specific real person: a busy parent whose family calendar is buried in a noisy class group chat.

Sponsor awards we compete for

Award	Why this submission qualifies
🟢 Modal Awards (best Modal-powered apps)	Modal powered the development of the platform's model end-to-end — required note, gladly given: `training/modal_train.py` (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), `training/modal_eval.py` + `modal_quant_eval.py` (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and `training/gated_retrain.py` (train → staging → eval → promote only past the gate — eight regressed models rejected, every run a Modal job).
🌱 OpenBMB Awards (standout MiniCPM builds, per track)	The agent is planned by OpenBMB MiniCPM (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible multi-step agent (`server/orchestrator.py`). MiniCPM is the agent's brain, not a garnish.

(Not claimed: the OpenAI Track — no Codex-attributed commits — and the NVIDIA Nemotron Quest — different model family. We'd rather be honest than eligible.)

Special awards — our case

Award	Our case
🎖️ Bonus Quest Champion	All six collectable quests claimed with evidence — the full sash (table below).
🎨 Off-Brand Award	Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard — `ui/blocks.py` + `static/app.css`, far past the stock Gradio look.
🐜 Tiny Titan	The platform's one and only model is **Gemma E4B, ~4B effective parameters** (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B", so it is the judges' call whether that's tiny enough.
🎬 Best Demo	App + demo video + social post as one package — storyboard with every quest named on-camera in `docs/demo-script.md`.
🤖 Best Agent	The MiniCPM-planned, MCP-tool-driven agent above — real multi-step tool use, every model under the 32B cap.
🃏 Judges' Wildcard	No entry needed — but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you.

Collectable quests — all six claimed

Quest	Evidence
🔌 Off the Grid (local-first, no cloud APIs)	All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push.
🎯 Well-Tuned (published fine-tune)	`gemma-cal` E4B — our QLoRA fine-tune is the model production serves, shipped through the eval gate with the honest scorecard public.
🎨 Off-Brand (custom UI)	See the Off-Brand Award case above.
🦙 Llama Champion (llama.cpp runtime)	The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj (`Dockerfile`, `scripts/start_space.sh`).
📡 Sharing is Caring (open trace on the Hub)	Redacted agent traces published to `ParetoOptimal/offgridschedula-traces` — one click from the Activity tab.
📓 Field Notes (write-up)	`FIELD_NOTES.md` + the eval-gated fine-tuning post-mortem + project blog.

Fine-tune on Modal (GPU)

training/modal_train.py runs the whole fine-tune on a serverless GPU and publishes the GGUF to HF — no local GPU needed. It's a thin wrapper that ships this repo to Modal and runs the existing pipeline (make_dataset.py → train_qlora.py → export_gguf.sh) on an A100/H100, then uploads the quantized GGUF + mmproj to your HF repo. This is all offline prep, so Off the Grid is untouched (the rule applies to the running app's inference, not dataset/training prep).

pip install modal
modal token new
modal secret create huggingface HF_TOKEN=hf_xxxxxxxx     # your HF *write* token

# Validate the full pipeline cheaply first (cheap edge model, ~a couple $):
modal run training/modal_train.py --base-model google/gemma-4-E4B-it

# Then the real run (default A100-80GB; --gpu H100 for speed):
modal run training/modal_train.py
modal run training/modal_train.py --gpu H100 --num-epochs 3

On finish it prints the MODEL_REPO / MODEL_FILE / MMPROJ_FILE to set on the Space. Two persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on training/data/dataset.jsonl only re-pays for the training itself.

Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the first real 31B run — the seeds in make_dataset.py are a smoke test, not a training set.

Publish your fine-tune & point the Space at it

The training run is the one step that spends your GPU/Modal credits — it's not done for you. Once you've run it, the path is turnkey:

Recommended: python training/gated_retrain.py — train → staging upload → 60-example eval → promote only if it beats the gate. A regressed model cannot reach production. (Raw modal run training/modal_train.py is the ungated equivalent for experiments.)

Point the Space at your model via Space variables (scripts/start_space.sh reads them at launch; set in Settings → Variables or with HfApi().add_space_variable):

MODEL_HF_REPO = <you>/gemma-cal-gguf
MODEL_FILE    = gemma-cal-e4b-Q4_K_M.gguf   # explicit file — repo may hold several quants/tiers
MMPROJ_REPO   = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
MMPROJ_FILE   = mmproj-F16.gguf             # enables screenshot/vision input
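
For the `HfApi().add_space_variable` route, a small helper could set all four variables in one pass. This is a sketch: the helper name is hypothetical, the values are the placeholders from the block above, and `api` is expected to be a `huggingface_hub.HfApi` instance created by the caller (it is passed in so the helper can be dry-run with a stand-in object).

```python
from typing import Dict, List

# Placeholder values copied from the variables listed above.
MODEL_VARIABLES: Dict[str, str] = {
    "MODEL_HF_REPO": "<you>/gemma-cal-gguf",
    "MODEL_FILE": "gemma-cal-e4b-Q4_K_M.gguf",
    "MMPROJ_REPO": "unsloth/gemma-4-E4B-it-GGUF",
    "MMPROJ_FILE": "mmproj-F16.gguf",
}


def set_space_variables(api, space_id: str, variables: Dict[str, str]) -> List[str]:
    """Set each variable on the Space and return the keys that were written."""
    written = []
    for key, value in variables.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)
        written.append(key)
    return written


# Typical use, with a write token configured:
#   from huggingface_hub import HfApi
#   set_space_variables(HfApi(), "<owner>/<space>", MODEL_VARIABLES)
```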

The deploy workflow stays a plain git mirror — the model is pulled at runtime, never committed.

Push to main → CI deploys → the Space now serves your fine-tune (Well-Tuned).

Share a trace (Sharing is Caring)

Want others to learn from a run? In the Activity tab, click ⬇ Download trace (JSON) — the trace stays on your device, and the hosted Space holds no Hub token. Personal data is redacted by default (the activity log only carries counts + status; the one chat-name field is stripped). Then publish it from your own machine, with your own login:

huggingface-cli login                                   # or export HF_TOKEN=...
python training/share_trace.py trace.json --public      # -> a HF dataset repo of traces

Field notes

FIELD_NOTES.md is the build retrospective — the iOS→chat.db pivot, the attributedBody trap, why conflict math is deterministic, stub-first architecture, the reframe-around-one-person lesson, and the Off-the-Grid trade-offs.

Remote automation (runs without an interactive session)

Workflow	Trigger	What it does	Needs
`.github/workflows/ci.yml` → test	push / PR	compile + `pytest` (stub mode, no GPU)	nothing
`.github/workflows/ci.yml` → deploy	push to `main`, after tests pass	`huggingface-cli upload` the repo to the HF Space (Docker SDK; model excluded, pulled at runtime)	secret `HF_TOKEN`, var `SPACE_ID`
`.github/workflows/maintenance.yml`	daily + manual	ping the Space `/health`, audit outdated deps → open/update a GitHub issue	var `SPACE_HEALTH_URL`

One-time setup for deploy + monitoring:

gh secret set HF_TOKEN                       # HF write token
gh variable set SPACE_ID -b "<owner>/<space>"
gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"

CI installs requirements-ci.txt (excludes llama-cpp-python and the Google libs — both are imported lazily and not needed for the stub-mode tests). A weekly Claude /schedule routine handles the judgment work (grow training/data/dataset.jsonl → PR, triage CI failures).