title: OffGridSchedula
emoji: ποΈ
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
tags:
- track:backyard
- sponsor:openbmb
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
- achievement:offbrand
- achievement:llama
- achievement:sharing
- achievement:fieldnotes
models:
- build-small-hackathon/gemma-4-cal-gguf
- openbmb/MiniCPM5-1B-GGUF
demo_video:
- https://youtu.be/m-o0u9X3tI4
social_posts:
- https://x.com/nate_mauer/status/2065973341651882386
- https://x.com/nate_mauer/status/2064920352845709419
- https://x.com/nate_mauer/status/2065661878441750916
- https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
blog_post:
- https://huggingface.co/blog/build-small-hackathon/offgridschedula
made_by:
- ParetoOptimal - a.k.a., Nate Mauer
Message Scheduling Agent
OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts, and drafts the reply, right from your phone: no app, no account,
no setup. iOS allows neither background iMessage access nor a persistent on-device LLM server, so there's no autonomous on-device agent to install; instead,
a foreground Shortcut (docs/automations.md) hands a thread or screenshot to the agent in two taps (optionally using a remote model via INFERENCE_BASE_URL).
The model runs on your own server, or even on the phone itself, not on a cloud AI service. Your chats aren't shipped off to a third-party AI to be read: the agent reads your snippet in memory and discards it after replying. The snippet goes only to the agent you control, which turns it into ready-to-add calendar events, and the run trace you can optionally share is redacted.
Hardware-aware. On under-powered hardware, the app shows an upgrade banner rather than hanging; the real model needs only a small GPU.
Build Small submission: the idea & the tech
The idea. A busy parent's calendar lives in other people's messages: picture day in the
class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar
events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the
conflicts against your own .ics, and a drafted reply, then add to Apple/Google Calendar in a tap.
The tech. Two small local models do the work. Extraction is gemma-cal E4B
(~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated
ActionPlan (events · conflicts · reply · clarifying question), served with vision through
the official llama.cpp server inside this Docker Gradio Space, with no cloud AI APIs. The
fine-tune + its 60-example task eval ran entirely on Modal serverless GPUs, behind an
eval gate that rejected eight regressed models before this one shipped. Conflict math is
deterministic Python, the UI is fully custom, the agent doubles as an MCP tool server, and
redacted run traces are public on the Hub.
Click Run the agents and a local OpenBMB MiniCPM planner (a second local llama-server)
drives this same Space's MCP tools as a multi-step agent (extract → check conflicts → render
.ics) with every step visible. Still zero cloud AI; every model under 32B.
What's new. Extraction now reads the logistics, not just the date (see below): arrival-aware
start times, duration→end conversion, type-based reminders, and calendar-ready titles, each
guaranteed by deterministic post-processing even when the model wobbles, and each shipped through
a measured A/B eval (full result tables: regex vs text-LLM vs
vision-LLM reading rendered screenshots only). Calendar out got one-click too: a unified
Connect your calendar block (Google OAuth, where the token lives in your browser, never on the
server; Outlook/Apple need no sign-in) and per-event Google · Outlook · iCal links, with the
Google push verified end-to-end (push → readback → delete, 11/11).
The UX. One decision, Offline or Online, re-themes the whole workflow card and sets the
path: off-grid .ics only, or a one-click "Connect your calendar" whose Google OAuth token
lives only in the browser (server-verified each visit; the client secret never leaves the
server). Results land in a single card: events, conflicts, the drafted reply, and per-event
Google · Outlook · iCal · .ics quick-add links. Activity → This week tallies events
captured, conflicts caught, and time saved; a per-device Memory (localStorage, one-click
samples) feeds names and preferences back into extraction.
Submission links: requirement-by-requirement mapping · demo video · social posts 1 · 2
Who this is for
A busy parent whose kid's school and activity events are buried in a noisy class group chat:
picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once,
mean to add it later, and miss it. With this, they paste the chat (or a screenshot of a flyer
or invite) from their phone's browser and get back the events, a conflict check against their
calendar, and a ready-to-send reply, all surfaced for review before anything is saved. Output is
a local .ics they can add to any calendar, with optional Google Calendar push.
No app to install and no account. It reads nothing automatically; the parent pastes only what they
choose. Inference runs in the Space via llama.cpp (no cloud AI APIs), and works out of the box
with no GPU (see Accuracy upgrade below).
The model: gemma-cal E4B, one calendar-native LLM built for exactly this
What makes this platform different isn't a prompt wrapped around a generic chatbot; it's
gemma-cal E4B, our own fine-tune of
Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready
structure. The model doesn't chat. It reads a thread (or a flyer photo) and emits a single
validated ActionPlan: events with exact ISO datetimes, conflicts, proposed alternatives, a
drafted reply, and a clarifying question when the plan is too vague to schedule. It is the one
and only model the platform runs, everywhere from the production Space to a laptop.
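As a rough illustration of what "a single validated ActionPlan" could look like, here is a minimal sketch using standard-library dataclasses. The repo's real models live in server/schema.py and use pydantic; the field names below (title, start, end, proposed_alternatives, clarifying_question, and so on) are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical field names; the real Event model is defined in server/schema.py.
    title: str
    start: str  # ISO 8601 datetime string, e.g. "2025-05-15T10:15:00"
    end: str
    location: Optional[str] = None

    def __post_init__(self) -> None:
        # Validation step: both values must parse as ISO datetimes,
        # and the event must end after it starts.
        start = datetime.fromisoformat(self.start)
        end = datetime.fromisoformat(self.end)
        if end <= start:
            raise ValueError("event must end after it starts")


@dataclass
class ActionPlan:
    # One plan per run: zero or more events plus the supporting fields.
    events: List[Event] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)
    proposed_alternatives: List[str] = field(default_factory=list)
    reply: str = ""
    clarifying_question: Optional[str] = None
```

An empty events list paired with a clarifying_question is how a "too vague to schedule" thread might be represented under these assumptions.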
- Edge-sized by design. ~5 GB at Q4: serves on a **$0.40/hr 16 GB T4** (vs $4+/hr A100-class for big models), a gaming GPU, or an Apple-silicon laptop, with full vision (screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
- Schema-bulletproof. The fine-tune holds 100% schema validity even with no system prompt, with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of asking when a date is TBD: the failure modes that actually burn users of generic models.
- Convention-trained. It learns this product's date semantics ("next Tuesday" means next week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed from the internet.
- Eval-gated, never vibes-shipped. Every retrain runs a 60-example task eval (start-exact datetime matching, F1, validity, clarification) and cannot reach production unless it clears the gate; the pipeline has rejected eight regressed models to date. The full, honest scorecard lives in docs/eval-roadmap.md and the post-mortem write-up.
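The gate idea in the last bullet can be sketched in a few lines. This is an illustrative sketch only: the real logic lives in training/gated_retrain.py, and the metric names, the "no metric may regress" rule, and the tolerance parameter here are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    # Hypothetical metric names mirroring the eval described above.
    event_f1: float
    start_exact: float
    schema_validity: float
    clarification_rate: float


METRICS = ("event_f1", "start_exact", "schema_validity", "clarification_rate")


def passes_gate(candidate: EvalScores, production: EvalScores, tolerance: float = 0.0) -> bool:
    """Return True only if the candidate does not regress on any metric.

    Assumed rule: every candidate metric must be at least the production
    value minus `tolerance`. The project's actual gate may differ.
    """
    return all(
        getattr(candidate, name) >= getattr(production, name) - tolerance
        for name in METRICS
    )
```

Under this assumed rule a candidate that improves F1 but loses schema validity is rejected, which matches the "a regressed model cannot reach production" intent.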
Hackathon size constraint (≤ 32B): easily met, since E4B is ~4B effective parameters. See the in-app Submission tab for the full compliance scorecard.
Reads the logistics, not just the date
A confirmation like "Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes early to complete intake forms) · 📍 112A West 72nd Street…" becomes one correct event:
- Arrival-aware start: the event starts at 10:15 (when you must show up), the official 10:30 is preserved in the notes, and the end is anchored to the stated time + duration (11:00), so the calendar block covers the forms and the visit.
- Type-based notifications: an explicitly stated lead time always wins ("remind me 2 hours before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school events 45.
- Real-world addresses: multi-line and 📍-emoji locations join into one string; "(Upper West Side – 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us at 212-223-0349") don't confuse it.
- Calendar-ready titles: an action+subject summary ("Pick up Priya – Terminal 4"), not a quote of the message.
The model is taught these conventions (prompt + fine-tune data), but the load-bearing ones are
also guaranteed by deterministic post-processing (apply_text_rules in
server/agent.py). It's the same philosophy as the conflict engine: must-hold
logistics are never left to model temperament. Every behavior above shipped through a measured
A/B eval (regex baseline vs text-LLM vs vision-LLM reading rendered chat screenshots only),
with the full tables in training/data/ab_results.md
(headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the
screenshot-only vision arm lands within a point of text).
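To make the arrival-aware and reminder rules above concrete, here is a small sketch of how such deterministic post-processing might look. It is not the project's apply_text_rules; the function names, the event-type keys, and the choice to use the lower bound of "30–45 min" (which reproduces the 11:00 end in the example) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Default reminder lead times in minutes, taken from the rules listed above.
# The dictionary keys are hypothetical type labels.
DEFAULT_REMINDER_MIN = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def reminder_minutes(event_type: str, stated_lead_min: Optional[int] = None) -> Optional[int]:
    """An explicitly stated lead time always wins; otherwise fall back by type."""
    if stated_lead_min is not None:
        return stated_lead_min
    return DEFAULT_REMINDER_MIN.get(event_type)


def arrival_aware_block(
    official_start: datetime, duration_min: int, arrive_early_min: int = 0
) -> Tuple[datetime, datetime]:
    """Start when you must show up; end at the official start plus the duration."""
    start = official_start - timedelta(minutes=arrive_early_min)
    end = official_start + timedelta(minutes=duration_min)
    return start, end


# The example from the text: 10:30 appointment, arrive 15 minutes early,
# duration taken as 30 minutes (assumed lower bound of "30-45 min").
start, end = arrival_aware_block(datetime(2025, 5, 15, 10, 30), 30, 15)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 10:15 11:00
```

Keeping these rules as plain functions, separate from the model call, is what lets them hold even when the model output varies.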
Try it in 30 seconds
Open the Space in your phone's browser → Schedule tab → tap Try a sample (or paste your own
group chat, and optionally a screenshot or your .ics) → review the detected events → Download
.ics. The Activity → This week panel then shows what you've captured and the time it saved.
How it works
Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
      (phone browser)              │                                   │
                          custom Gradio UI ◀── review ◀────────────────┘
                              ▼          ▼
                   .ics download / optional Google Calendar
The primary path needs nothing but a browser: paste text and/or attach a screenshot in the Schedule tab. (Power users can also auto-feed messages from a Mac; see Optional: Mac collector.)
For the full solution-architecture view (every workflow and which LLM, if any, it calls, plus the eval-gated fine-tuning loop), see docs/architecture.md.
Can it process multiple invites at once?
Yes: multiple invites in one paste is the designed path (on the live Space, where the real
model runs). ActionPlan.events is a list, and the extraction prompt explicitly tells the model
that one thread often holds several events: a drop-off AND a pickup, or two appointments, are
separate events (server/agent.py). Everything downstream is built for N events: the results card
shows "N events found" with one card per invite, the editable table gets one row each, the .ics
contains one VEVENT per event, each event carries its own Google/Outlook/Apple quick-add links,
and the conflict check runs across all of them. Screenshot input is multi-file too: attach several
flyers and they're all read in one run.
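A conflict check that runs across all N extracted events can be sketched as a plain interval-overlap test. This is a minimal illustration, not the code in calendar_out/freebusy.py; the function names and the half-open interval convention (back-to-back events do not conflict) are assumptions.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), treated as half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap when each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def find_conflicts(new_events: List[Interval], busy: List[Interval]) -> List[Tuple[int, int]]:
    """Return (new_event_index, busy_index) pairs for every overlapping pair."""
    return [
        (i, j)
        for i, event in enumerate(new_events)
        for j, slot in enumerate(busy)
        if overlaps(event, slot)
    ]
```

Because the check is pure arithmetic on datetimes, the same inputs always produce the same conflicts, independent of any model output.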
Two caveats:
- Stub mode extracts only the first invite. The local-dev heuristic (_stub_plan in server/agent.py, enabled by USE_STUB_EXTRACTOR=1) works with no model and no GPU, and it's now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations, durations, arrival-early shifts, type-based reminders), but it still returns at most one event. If you paste a multi-invite thread locally and get one event back, that's the stub, not the product; the deployed Space uses the multi-event model path.
- Simultaneous runs are serialized, not parallel. If two users (or two tabs) hit Run the agents at once, both complete, but inference executes one request at a time: server/model.py holds the llama.cpp instance behind a threading.Lock, and Gradio queues the events. On a single-GPU Space that's intentional (one model copy in memory); the second run simply waits its turn, then streams its own pipeline progress.
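The serialization described in the second caveat can be pictured with a small wrapper. This is a sketch of the general pattern, assuming a callable that stands in for the llama.cpp instance; the class and method names are hypothetical and not taken from server/model.py.

```python
import threading
from typing import Callable


class SerializedModel:
    """Wrap a single model instance so only one inference runs at a time."""

    def __init__(self, infer: Callable[[str], str]) -> None:
        self._infer = infer            # stand-in for the real llama.cpp call
        self._lock = threading.Lock()  # one lock guarding the one model copy

    def run(self, prompt: str) -> str:
        # A second caller blocks here until the first request finishes.
        with self._lock:
            return self._infer(prompt)
```

With one model copy in memory, the lock trades throughput for predictable memory use: concurrent callers all complete, just one after another.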
Repo layout
app.py                     # Gradio + FastAPI entrypoint (the Space)
server/
  agent.py                 # thread (+images) -> validated ActionPlan
  orchestrator.py          # Run the agents: MiniCPM planner driving our own MCP tools
  schema.py                # Event / Conflict / ActionPlan pydantic models
  model.py                 # llama.cpp load: GGUF + vision mmproj, constrained JSON
  imageutil.py             # image -> base64 data URI
ui/blocks.py               # custom Gradio Blocks (reasoning, events, conflicts, reply)
static/app.css             # custom CSS (Off-Brand)
calendar_out/
  ics.py                   # .ics generation (off-grid default)
  freebusy.py              # parse existing .ics + deterministic conflict detection
  gcal.py                  # optional Google Calendar push
collector/collector.py     # Mac-side iMessage collector (text + image attachments)
training/                  # dataset build + QLoRA fine-tune + GGUF/mmproj export
Dockerfile                 # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
requirements-docker.txt    # runtime deps for the Docker image (llama.cpp built separately)
PLAN.md                    # full design + build plan
Quick start (local dev): no GPU needed
pip install -r requirements.txt
# Runs the whole app with the built-in heuristic agent (no model, no GPU):
export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
python app.py # http://localhost:7860
Open it, go to the Schedule tab, and tap Try a sample, or paste a thread, attach chat
screenshots, and optionally upload your current calendar .ics for conflict checks.
(Heads-up: the stub agent extracts only the first invite in a thread; multi-invite extraction
needs the real model. See Can it process multiple invites at once? above.) Tip for
self-hosted installs: set CAL_ICS_PATH=/path/to/calendar.ics and conflict checks use that file
automatically whenever no .ics is uploaded, so step 4 completes itself, fully offline. Review
the detected events, conflicts, proposed times, and the suggested reply, then add any event with
its Add to: Google · Outlook · iCal · .ics links (iCal and .ics both download the event's
.ics file; with 2+ events an "iCal – all N events" link grabs everything at once).
The Activity → This week panel shows what you've captured.
This week (impact)
The Activity tab has a This week panel that persists across restarts: events captured, conflicts caught, and estimated time saved. A "capture" is counted when a run surfaces events for review (adding to a calendar happens through the per-event links, which the server can't observe).
minutes_saved is a deliberately conservative, configurable estimate, not a measurement:
IMPACT_MIN_PER_EVENT (default 8 min per captured event) + IMPACT_MIN_PER_CONFLICT (default
15 min per conflict caught). Override either via env. State persists to IMPACT_PATH
(default /tmp/impact_weeks.json; point it at a persistent disk on a Space to survive rebuilds).
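The estimate above is simple arithmetic, which a short sketch makes explicit. The environment variable names and defaults come from the text; the function name, the integer parsing, and the injectable env mapping are assumptions made for illustration.

```python
import os
from typing import Mapping


def minutes_saved(
    events_captured: int,
    conflicts_caught: int,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Estimated minutes saved: per-event minutes plus per-conflict minutes."""
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events_captured * per_event + conflicts_caught * per_conflict


# With the defaults, 3 captured events and 1 caught conflict:
print(minutes_saved(3, 1, env={}))  # 3 * 8 + 1 * 15 = 39
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.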
Accuracy upgrade (optional): serve the real gemma-cal LLM
The stub agent above makes the demo work with no GPU. The production Space serves our
fine-tuned gemma-cal E4B through llama-server, with no cloud AI APIs either way. The same
config works anywhere llama.cpp runs:
export USE_STUB_EXTRACTOR=0
export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf" # ~5 GB edge fine-tune (what the Space serves)
export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF" # the E4B's own vision projector
export MMPROJ_FILE="mmproj-F16.gguf" # enables screenshot/vision input
bash scripts/start_space.sh
This is the platform's only model: the same ~5 GB GGUF serves the production Space (16 GB
T4), a gaming GPU, or a laptop. (MODEL_FILE is explicit on purpose: the model repo also stores
legacy training artifacts, so the -hf repo:Q4_K_M shorthand is ambiguous.)
Optional: Mac collector (power users)
The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message content, so a Mac is the only auto-feed source):
cd collector && cp .env.example .env # edit SPACE_URL + INGEST_TOKEN
python collector.py
⚠️ The collector needs Full Disk Access (System Settings → Privacy & Security) to read
chat.db.
Autonomous & on a phone
There's a single backend endpoint, POST /agent (bearer INGEST_TOKEN), that takes a thread
(or messages, + optional screenshot/.ics) and returns the extracted events, conflicts, and reply as
JSON (optionally an .ics or a Google Calendar push). Every front-end calls it:
- Fully autonomous (Mac), set-and-forget: INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf scripts/setup_mac.sh installs three launchd jobs (Hermes llama-server + autonomous backend + collector). New iMessages you send or accept become calendar events automatically, deduped per chat. Triggers on outgoing messages by default (TRIGGER_ON=outgoing; any to widen).
- Hermes "grows-with-you" brain: point INFERENCE_BASE_URL at a Hermes llama-server; its personal memory (people→roles, "you decline Mondays") improves extraction over time and is shown in the dashboard Memory tab. See docs/hermes.md.
- iPhone, one tap: an iOS Shortcut shares a thread/screenshot to /agent and adds the events to Apple Calendar natively, with no .ics import.
- Android, hands-off: a Tasker/MacroDroid rule on a notification/SMS calls /agent and inserts events. See docs/android-tasker.md.
- On-device model: set INFERENCE_BASE_URL to a local llama-server (e.g. Gemma E4B or a small Hermes in Termux) so inference runs on the phone: same agent, env-selected.
iOS can't read iMessage in the background (no message API), so fully-autonomous iMessage needs the Mac collector; the iPhone path is one-gesture. See docs/automations.md and docs/on-device.md.
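For a sense of what a front-end call to POST /agent might look like, here is a minimal standard-library sketch that only builds the request. The "thread" JSON key, the Authorization: Bearer header format, and the function name are assumptions; the text above specifies only the path, the bearer INGEST_TOKEN, and the kinds of input accepted.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, thread: str) -> urllib.request.Request:
    """Build (but do not send) a POST /agent request carrying a pasted thread."""
    # Hypothetical payload shape: a single "thread" field with the pasted text.
    body = json.dumps({"thread": thread}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token header
            "Content-Type": "application/json",
        },
    )


request = build_agent_request("http://localhost:7860", "dev-secret", "Picture day is Thursday at 9")
print(request.get_method(), request.full_url)  # POST http://localhost:7860/agent
```

To actually send it, something like `with urllib.request.urlopen(request) as resp: plan = json.load(resp)` would return the JSON response, assuming the server accepts this payload shape.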
Build Small: prizes & quests
Track: Backyard AI (track:backyard), a practical app for a specific real person: a busy
parent whose family calendar is buried in a noisy class group chat.
Sponsor awards we compete for
| Award | Why this submission qualifies |
|---|---|
| Modal Awards (best Modal-powered apps) | Modal powered the development of the platform's model end-to-end (required note, gladly given): training/modal_train.py (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), training/modal_eval.py + modal_quant_eval.py (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and training/gated_retrain.py (train → staging → eval → promote only past the gate; eight regressed models rejected, every run a Modal job). |
| OpenBMB Awards (standout MiniCPM builds, per track) | The agent is planned by OpenBMB MiniCPM (openbmb/MiniCPM4.1-8B-GGUF, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (extract_events → check_conflicts → make_ics) as a visible multi-step agent (server/orchestrator.py). MiniCPM is the agent's brain, not a garnish. |
(Not claimed: the OpenAI Track (no Codex-attributed commits) and the NVIDIA Nemotron Quest (different model family). We'd rather be honest than eligible.)
Special awards β our case
| Award | Our case |
|---|---|
| Bonus Quest Champion | All six collectable quests claimed with evidence: the full sash (table below). |
| Off-Brand Award | Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard (ui/blocks.py + static/app.css), far past the stock Gradio look. |
| Tiny Titan | The platform's one and only model is Gemma E4B: ~4B effective parameters (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B"; judges' call whether that's tiny enough. |
| Best Demo | App + demo video + social post as one package; storyboard with every quest named on-camera in docs/demo-script.md. |
| Best Agent | The MiniCPM-planned, MCP-tool-driven agent above: real multi-step tool use, every model under the 32B cap. |
| Judges' Wildcard | No entry needed, but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you. |
Collectable quests: all six claimed
| Quest | Evidence |
|---|---|
| Off the Grid (local-first, no cloud APIs) | All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push. |
| Well-Tuned (published fine-tune) | gemma-cal E4B: our QLoRA fine-tune is the model production serves, shipped through the eval gate with the honest scorecard public. |
| Off-Brand (custom UI) | See the Off-Brand Award case above. |
| Llama Champion (llama.cpp runtime) | The official ghcr.io/ggml-org/llama.cpp server image runs the GGUF + vision mmproj (Dockerfile, scripts/start_space.sh). |
| Sharing is Caring (open trace on the Hub) | Redacted agent traces published to ParetoOptimal/offgridschedula-traces, one click from the Activity tab. |
| Field Notes (write-up) | FIELD_NOTES.md + the eval-gated fine-tuning post-mortem + project blog. |
Fine-tune on Modal (GPU)
training/modal_train.py runs the whole fine-tune on a serverless GPU and publishes the GGUF to
HF, with no local GPU needed. It's a thin wrapper that ships this repo to Modal and runs the existing
pipeline (make_dataset.py → train_qlora.py → export_gguf.sh) on an A100/H100, then uploads the
quantized GGUF + mmproj to your HF repo. This is all offline prep, so Off the Grid is
untouched (the rule applies to the running app's inference, not dataset/training prep).
pip install modal
modal token new
modal secret create huggingface HF_TOKEN=hf_xxxxxxxx # your HF *write* token
# Validate the full pipeline cheaply first (cheap edge model, ~a couple $):
modal run training/modal_train.py --base-model google/gemma-4-E4B-it
# Then the real run (default A100-80GB; --gpu H100 for speed):
modal run training/modal_train.py
modal run training/modal_train.py --gpu H100 --num-epochs 3
On finish it prints the MODEL_REPO / MODEL_FILE / MMPROJ_FILE to set on the Space. Two
persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on
training/data/dataset.jsonl only re-pays for the training itself.
Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the first real 31B run: the seeds in
make_dataset.py are a smoke test, not a training set.
Publish your fine-tune & point the Space at it
The training run is the one step that spends your GPU/Modal credits; it's not done for you. Once you've run it, the path is turnkey:
- Recommended: python training/gated_retrain.py (train → staging upload → 60-example eval → promote only if it beats the gate). A regressed model cannot reach production. (Raw modal run training/modal_train.py is the ungated equivalent for experiments.)
- Point the Space at your model via Space variables (scripts/start_space.sh reads them at launch; set in Settings → Variables or with HfApi().add_space_variable):
MODEL_HF_REPO = <you>/gemma-cal-gguf
MODEL_FILE = gemma-cal-e4b-Q4_K_M.gguf # explicit file; repo may hold several quants/tiers
MMPROJ_REPO = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
MMPROJ_FILE = mmproj-F16.gguf # enables screenshot/vision input
The deploy workflow stays a plain git mirror: the model is pulled at runtime, never committed.
- Push to main → CI deploys → the Space now serves your fine-tune (Well-Tuned).
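Setting the Space variables programmatically could look roughly like the sketch below. HfApi.add_space_variable(repo_id, key, value) is part of huggingface_hub; the helper name apply_space_variables and the placeholder repo and Space IDs are assumptions, and the real call is left as a comment so the snippet runs without network access or a token.

```python
from typing import Dict, List

# The four variables from the step above; "<you>" is a placeholder to replace.
SPACE_VARIABLES: Dict[str, str] = {
    "MODEL_HF_REPO": "<you>/gemma-cal-gguf",
    "MODEL_FILE": "gemma-cal-e4b-Q4_K_M.gguf",
    "MMPROJ_REPO": "unsloth/gemma-4-E4B-it-GGUF",
    "MMPROJ_FILE": "mmproj-F16.gguf",
}


def apply_space_variables(api, space_id: str, variables: Dict[str, str]) -> List[str]:
    """Set each variable on the Space and return the keys that were written."""
    for key, value in variables.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)
    return sorted(variables)


# Real usage (needs `pip install huggingface_hub` and a write token):
#   from huggingface_hub import HfApi
#   apply_space_variables(HfApi(), "<owner>/<space>", SPACE_VARIABLES)
```

Passing the API object in as a parameter keeps the helper testable with a stand-in object before pointing it at a live Space.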
Share a trace (Sharing is Caring)
Want others to learn from a run? In the Activity tab, click ⬇ Download trace (JSON); the trace stays on your device, and the hosted Space holds no Hub token. Personal data is redacted by default (the activity log only carries counts + status; the one chat-name field is stripped). Then publish it from your own machine, with your own login:
huggingface-cli login # or export HF_TOKEN=...
python training/share_trace.py trace.json --public # -> a HF dataset repo of traces
Field notes
FIELD_NOTES.md is the build retrospective: the iOS→chat.db pivot, the
attributedBody trap, why conflict math is deterministic, stub-first architecture, the
reframe-around-one-person lesson, and the Off-the-Grid trade-offs.
Remote automation (runs without an interactive session)
| Workflow | Trigger | What it does | Needs |
|---|---|---|---|
| .github/workflows/ci.yml → test | push / PR | compile + pytest (stub mode, no GPU) | nothing |
| .github/workflows/ci.yml → deploy | push to main, after tests pass | huggingface-cli upload the repo to the HF Space (Gradio SDK; model excluded, pulled at runtime) | secret HF_TOKEN, var SPACE_ID |
| .github/workflows/maintenance.yml | daily + manual | ping the Space /health, audit outdated deps → open/update a GitHub issue | var SPACE_HEALTH_URL |
One-time setup for deploy + monitoring:
gh secret set HF_TOKEN # HF write token
gh variable set SPACE_ID -b "<owner>/<space>"
gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"
CI installs requirements-ci.txt (excludes llama-cpp-python and the Google libs; both are
imported lazily and not needed for the stub-mode tests). A weekly Claude /schedule routine handles
the judgment work (grow training/data/dataset.jsonl → PR, triage CI failures).