OffGridSchedula


ParetoOptimal committed 17 days ago

Commit

0366d65

0 Parent(s):

Initial Commit

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +4 -0
.gitignore +43 -0
Dockerfile +41 -0
FIELD_NOTES.md +145 -0
PLAN.md +182 -0
README.md +435 -0
app.py +298 -0
calendar_out/__init__.py +0 -0
calendar_out/freebusy.py +141 -0
calendar_out/gcal.py +313 -0
calendar_out/ics.py +60 -0
calendar_out/tzconfig.py +46 -0
collector/.env.example +13 -0
collector/collector.py +175 -0
deploy/launchd/com.offgrid.backend.plist +33 -0
deploy/launchd/com.offgrid.collector.plist +28 -0
deploy/launchd/com.offgrid.hermes.plist +23 -0
docs/android-tasker.md +37 -0
docs/architecture.md +121 -0
docs/automations.md +83 -0
docs/blog-eval-gated-finetuning.md +187 -0
docs/build-small-submission.md +68 -0
docs/eval-roadmap.md +337 -0
docs/gcal-verify.md +73 -0
docs/hermes.md +48 -0
docs/on-device.md +54 -0
requirements-ci.txt +16 -0
requirements-docker.txt +24 -0
requirements.txt +47 -0
scripts/setup_mac.sh +60 -0
scripts/start_space.sh +85 -0
scripts/verify_gcal_e2e.py +159 -0
server/__init__.py +0 -0
server/agent.py +475 -0
server/dedup.py +84 -0
server/events.py +116 -0
server/health.py +54 -0
server/imageutil.py +61 -0
server/impact.py +87 -0
server/mcp_tools.py +117 -0
server/memory.py +174 -0
server/model.py +317 -0
server/orchestrator.py +191 -0
server/pipeline.py +98 -0
server/schema.py +43 -0
server/threads.py +59 -0
server/tools.py +81 -0
server/trace.py +98 -0
static/app.css +961 -0
static/logo.png +0 -0

.gitattributes ADDED

	@@ -0,0 +1,4 @@

+# Shell scripts must use LF so bash on Linux (HF Space, Modal) doesn't choke on
+# carriage returns (e.g. "set: pipefail: invalid option name") when the repo is
+# committed/edited from Windows.
+*.sh text eol=lf

.gitignore ADDED

	@@ -0,0 +1,43 @@

+# Secrets / config
+.env
+*.env
+!*.env.example
+collector/.env
+# Models & data (large; live on HF, not git)
+*.gguf
+*.bin
+*.safetensors
+models/
+training/data/*
+!training/data/dataset.jsonl
+!training/data/eval.jsonl
+!training/data/eval_unstructured.jsonl
+!training/data/ab_results.md
+# screenshots/ stays ignored: regenerate with training/render_screenshots.py
+training/outputs/
+checkpoints/
+# Generated calendar files
+*.ics
+out/
+# Google OAuth
+token.json
+credentials.json
+client_secret*.json
+# Python
+__pycache__/
+*.py[cod]
+.venv/
+venv/
+.ipynb_checkpoints/
+*.egg-info/
+# OS / editor
+.DS_Store
+Thumbs.db
+.vscode/
+.idea/
+tok.json

Dockerfile ADDED

	@@ -0,0 +1,41 @@

+# Dedicated paid-GPU Space (Docker SDK) — real Gemma 4 on the OFFICIAL llama.cpp.
+# Compiling llama.cpp in the HF build exceeds the build time limit, so we base on the
+# llama.cpp project's own prebuilt CUDA image (trusted, current → supports Gemma 4).
+# It runs `llama-server`; our app (UI + /agent) calls it via INFERENCE_BASE_URL.
+# Pick a CUDA GPU in Space settings (e.g. 1x A100). Llama Champion = the llama.cpp server.
+FROM ghcr.io/ggml-org/llama.cpp:server-cuda
+ENV PYTHONUNBUFFERED=1 \
+    DEBIAN_FRONTEND=noninteractive \
+    PORT=7860 \
+    SERVE=uvicorn \
+    HF_HOME=/tmp/hf \
+    LLAMA_CACHE=/tmp/llama-cache \
+    INFERENCE_BASE_URL="http://127.0.0.1:8080/v1" \
+    INFERENCE_MODEL="gemma-4" \
+    MODEL_HF_REPO="ParetoOptimal/gemma-4-cal-gguf" \
+    MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf" \
+    MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF" \
+    MMPROJ_FILE="mmproj-F16.gguf"
+# Agent-tab planner (OFF by default — set as Space variables to enable):
+#   PLANNER_HF_REPO="openbmb/MiniCPM4.1-8B-GGUF"  PLANNER_FILE="MiniCPM4.1-8B-Q4_K_M.gguf"
+#   (tiny <=4B variant: openbmb/MiniCPM5-1B-GGUF / MiniCPM5-1B-Q4_K_M.gguf)
+#   PLANNER_PORT=8081  PLANNER_NGL=999  PLANNER_BASE_URL=http://127.0.0.1:8081/v1
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 python3-pip curl ca-certificates && \
+    rm -rf /var/lib/apt/lists/*
+# Keep our app out of the image's /app (where the llama-server binary lives).
+WORKDIR /srv
+COPY requirements-docker.txt .
+# --break-system-packages: the base image's Python is PEP 668 externally-managed.
+RUN pip3 install --no-cache-dir --break-system-packages -r requirements-docker.txt
+COPY . .
+# The base image's entrypoint is llama-server; we run our launcher instead.
+ENTRYPOINT []
+EXPOSE 7860
+CMD ["bash", "scripts/start_space.sh"]

FIELD_NOTES.md ADDED

	@@ -0,0 +1,145 @@

+# Field Notes — building the iMessage → Calendar agent
+What I set out to build, where reality bent the plan, and what I'd do next. This is
+the "what I learned" companion to the product docs ([README](./README.md)) and the
+design doc ([PLAN](./PLAN.md)).
+## The goal in one line
+Turn the calendar logistics buried in a chat thread — *"picture day moved to
+Thursday 9am", "soccer is Tuesday now"* — into reviewed calendar events, from a
+phone, with the data staying private.
+## 1. "Read my iMessages" is impossible as literally asked — and that shaped everything
+iOS exposes **no API** for iMessage/SMS *content*. There is no on-device path. The
+only place the messages exist in a queryable form is a Mac, where they sync to
+`~/Library/Messages/chat.db`. So the architecture forked early:
+- A Mac-side collector ([`collector/collector.py`](./collector/collector.py)) reads
+  `chat.db` (read-only, `mode=ro`) and POSTs new rows to the Space.
+- "On my phone" was reinterpreted honestly as **used from** a phone browser — the
+  Space is hosted, the UI is mobile-friendly, but the model runs in the Space.
+The biggest adoption lesson came later: requiring a Mac collector + Full Disk
+Access is a wall for a non-technical user. The fix was to make **paste-from-phone**
+the hero path (the collector is now strictly optional) — no install, no DB, no
+permissions. Most of that capability already existed in the Schedule tab; it was just
+framed as secondary.
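
The read-only collector described in this section can be sketched roughly as follows. This is a minimal illustration, not the repository's `collector.py`: the function name `fetch_new_rows` and the two-column query are assumptions; only the `mode=ro` open and the `text` column come from the notes above.

```python
import sqlite3
from pathlib import Path


def fetch_new_rows(db_path, last_rowid=0):
    """Return (rowid, text) pairs newer than last_rowid, opening the DB read-only."""
    # mode=ro makes SQLite refuse any write, so the Messages database is never modified.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT ROWID, text FROM message "
            "WHERE ROWID > ? AND text IS NOT NULL ORDER BY ROWID",
            (last_rowid,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```

A caller would remember the largest `ROWID` it has seen and pass it back in on the next poll. Rows whose `text` is NULL are skipped here, which is the same gap the next section describes for `attributedBody`-only messages.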
+## 2. `attributedBody` is the iMessage parsing trap
+Modern Messages often stores the body in `attributedBody` (an `NSAttributedString`
+binary blob), **not** the `text` column. The collector reads `text` directly for
+simplicity and **skips messages that only have `attributedBody`**
+([`collector/collector.py:88-94`](./collector/collector.py)) — a deliberate, called-out
+gap. The right move for production is to not hand-roll this: use `imessage-exporter`
+(ReagentX) or `imessage_reader`. Noting the limitation in code beat pretending the
+naive SQL was complete.
+## 3. Relative dates are the real accuracy battleground
+The hard part isn't "is there an event" — it's *when*. "Next Thursday", "the 14th",
+"in two weeks" only resolve against a reference time. Two design responses:
+- The system prompt pins **"Current datetime"** into every request and instructs the
+  model to resolve relative dates from it ([`server/agent.py:21-34`](./server/agent.py)).
+- **Conflict math is deterministic, not model-driven.** Overlap/adjacent/tight
+  detection and alternative-time proposals live in
+  [`calendar_out/freebusy.py`](./calendar_out/freebusy.py), because once you have ISO
+  datetimes, interval math should never be left to an LLM. The model decides *what*;
+  code decides *when-it-clashes*.
+The stub extractor's naive "match a time → 1h event tomorrow"
+([`server/agent.py:152-175`](./server/agent.py)) is intentionally dumb — it exists to
+prove the pipeline, and its dumbness is a good reminder of exactly how much the
+fine-tune has to get right.
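
To make the "code decides when it clashes" point concrete, here is a minimal sketch of deterministic interval classification. It is not the code in `calendar_out/freebusy.py`: the function name `classify`, the half-open interval convention, and the 15-minute "tight" threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def classify(a_start, a_end, b_start, b_end, tight=timedelta(minutes=15)):
    """Classify two half-open intervals [start, end) as overlap, adjacent, tight, or clear."""
    if a_start < b_end and b_start < a_end:
        return "overlap"
    # Gap between the end of the earlier event and the start of the later one.
    gap = b_start - a_end if a_end <= b_start else a_start - b_end
    if gap == timedelta(0):
        return "adjacent"
    if gap <= tight:
        return "tight"
    return "clear"
```

Because the inputs are already resolved datetimes, the result is the same on every run, which is the property the notes argue an LLM cannot guarantee.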
+## 4. Stub-first was the best architectural call
+`USE_STUB_EXTRACTOR=1` swaps the model for a regex heuristic
+([`server/agent.py:85,124`](./server/agent.py)), forced on in tests
+([`tests/conftest.py`](./tests/conftest.py)). Payoffs:
+- The whole app — paste → events → conflicts → `.ics` download → impact panel —
+  **works end-to-end with no GPU**, so a demo (and CI) never depends on a model load.
+- `llama_cpp` and the Google libs are **lazy-imported**, so `requirements-ci.txt` can
+  exclude them and the test suite runs in seconds, offline.
+Lesson: make the expensive dependency optional from day one and the cheap path
+becomes your test harness, your demo, and your free tier all at once.
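
The stub-first pattern can be sketched as below. This is an illustrative reconstruction, not the code in `server/agent.py`: the regex, the event dict shape, and the function names are assumptions; the `USE_STUB_EXTRACTOR` flag, the "match a time, make a 1h event tomorrow" heuristic, and the lazy `llama_cpp` import are taken from these notes.

```python
import os
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"\b(\d{1,2})(?::([0-5]\d))?\s*(am|pm)\b", re.IGNORECASE)


def stub_extract(text, now):
    """Naive heuristic: first clock time found becomes a one-hour event tomorrow."""
    m = TIME_RE.search(text)
    if not m:
        return []
    hour = int(m.group(1)) % 12 + (12 if m.group(3).lower() == "pm" else 0)
    start = (now + timedelta(days=1)).replace(
        hour=hour, minute=int(m.group(2) or 0), second=0, microsecond=0
    )
    end = start + timedelta(hours=1)
    return [{"title": text.strip()[:40], "start": start.isoformat(), "end": end.isoformat()}]


def extract(text, now):
    if os.environ.get("USE_STUB_EXTRACTOR") == "1":
        return stub_extract(text, now)
    # Lazy import: the heavy dependency is only needed on the model path,
    # so CI and the no-GPU demo never have to install it.
    import llama_cpp  # noqa: F401
    raise NotImplementedError("model path is out of scope for this sketch")
```

Note that the stub ignores the word "Thursday" entirely and always picks tomorrow, which is exactly the kind of error the fine-tune is meant to remove.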
+## 5. Reframing around one person changed the scope more than any feature
+The project started as a four-track hackathon checklist. Rewriting it around a single
+named person — a **busy parent** whose kid's events are buried in a class group chat —
+forced three concrete changes: phone-paste as the default, a one-tap **Try a sample**
+class-chat ([`ui/blocks.py`](./ui/blocks.py)), and a **"This week"** impact panel.
+On measurement: `minutes_saved` ([`server/impact.py`](./server/impact.py)) is a
+**configurable estimate, not a measurement** (default 8 min/event + 15 min/conflict).
+Saying that plainly — in the UI, the README, and here — matters more than a
+bigger-looking number. A capture is only counted when the parent *accepts* events by
+exporting them, so the metric tracks value taken, not previews shown.
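
The estimate is simple enough to state as arithmetic. The sketch below assumes the environment variable names given in PLAN.md (`IMPACT_MIN_PER_EVENT`, `IMPACT_MIN_PER_CONFLICT`); the function signature is hypothetical and is not necessarily how `server/impact.py` is written.

```python
import os


def minutes_saved(events, conflicts, env=os.environ):
    """Estimated minutes saved: a configurable per-event and per-conflict allowance."""
    per_event = float(env.get("IMPACT_MIN_PER_EVENT", 8))
    per_conflict = float(env.get("IMPACT_MIN_PER_CONFLICT", 15))
    return events * per_event + conflicts * per_conflict
```

With the defaults, three accepted events and one caught conflict give 3 × 8 + 1 × 15 = 39 minutes.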
+## 6. Fine-tuning economics: Modal credits + honest scope
+QLoRA on a 31B needs an 80GB GPU. [`training/modal_train.py`](./training/modal_train.py)
+wraps the existing `train_qlora.py` + `export_gguf.sh` to run on a serverless
+A100/H100 and publish the GGUF to HF — roughly **$5–15 per run**, so ~$250 of credit
+is 15–40 iterations. The "Well-Tuned" track went the distance: the eval-gated **E4B** fine-tune is
+published and is what production serves —
+[`build-small-hackathon/gemma-4-cal-gguf`](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf)
+— after clearing the gate over six runs at zero quality cost vs. stock E4B. (Re-running the pipeline
+still spends your own Modal credits; the turnkey path is there whenever you want to retrain.)
+A small rule that paid off: training-data generation can use *any* offline tooling —
+the "no cloud AI API" rule applies only to the **running app's inference**, not to
+dataset prep.
+## 7. Two models, not one — a 1B planner over the same tools
+What shipped is two small local models, not one. The fine-tuned **gemma-cal E4B** does the
+*reading* (thread → validated `ActionPlan`); a 1B **OpenBMB MiniCPM** does the *orchestrating*.
+Clicking **Run the agents** hands the job to MiniCPM, which drives the Space's own MCP tools —
+`extract_events → check_conflicts → make_ics` — as a visible multi-step agent
+([`server/orchestrator.py`](./server/orchestrator.py)), consuming the *public* tool contract
+instead of calling internals. Two things I'd underline: keep the planner **optional** (a
+deterministic scripted plan is the fallback, so the agentic path never hard-depends on a second
+model load), and don't let "agent" become a separate destination — the same **Run the agents**
+action drives both the home workflow and the orchestrated trace, so it stays one engine, not a
+second UI to keep in sync.
+## 8. The Off-the-Grid tension
+"No cloud AI APIs" and "serve a 31B" pull against each other: a Q4 31B GGUF is
+~18–20GB and needs a GPU. Keeping inference **in the Space via `llama.cpp`** preserves
+the privacy story but costs GPU. The honest compromise is the **E4B edge variant** for
+the free tier, with the 31B as the headline. I deliberately did **not** offload
+inference to a third-party endpoint, because "your own Modal GPU" and "a cloud AI API"
+are easy to conflate and a purist judge would be right to dock it.
+The same principle drove the trace-sharing design (below): the hosted Space holds **no
+HF token** — it only offers a **local download**, and a separate local CLI does the
+upload with your own auth.
+## 9. What I'd do next
+- **Durable trace/metrics store.** The activity bus is an 800-entry in-memory ring
+  buffer ([`server/events.py`](./server/events.py)) — runs are lost on restart, so only
+  recent runs are exportable. A small append-only store (the impact log already shows
+  the pattern) would fix it.
+- **Decode `attributedBody`** (or adopt `imessage-exporter`) so text-less messages stop
+  being dropped.
+- **A real eval set** from the expanded dataset — measure JSON validity + field
+  accuracy, especially relative-date resolution and empty-list-on-chitchat.
+- **Trace redaction as a tested invariant.** Today it's an allowlist over current emit
+  sites ([`server/trace.py`](./server/trace.py)); a lint/test that fails when a new
+  `emit(...)` puts free text on a non-`ingest` stage would keep it honest as the code
+  grows.
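
The ring-buffer limitation in the first bullet is easy to see in a sketch. The class and method names here are hypothetical stand-ins for whatever `server/events.py` defines; only the 800-entry in-memory capacity comes from the notes.

```python
from collections import deque


class ActivityBus:
    """In-memory activity log that keeps only the most recent entries."""

    def __init__(self, capacity=800):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._entries = deque(maxlen=capacity)

    def emit(self, stage, detail):
        self._entries.append({"stage": stage, "detail": detail})

    def recent(self):
        return list(self._entries)
```

Nothing here touches disk, so a restart empties the buffer, and entry 801 evicts entry 1. An append-only file written alongside `emit` would be one way to keep older runs exportable.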
+## Publishing these notes
+This file is linked from the README. It can also be pasted into the Space's README
+(the Space card renders Markdown) or posted to the model/dataset repo's **Community**
+tab on the Hub so others can learn from the build.

PLAN.md ADDED

	@@ -0,0 +1,182 @@

+# Plan: Local-First iMessage → Calendar Agent (Gradio + llama.cpp + fine-tuned Gemma + an OpenBMB MiniCPM-planned agent)
+## Who this is for
+One named person: **a busy parent** whose kid's school/activity events are buried in a noisy class
+group chat (picture day, the practice that moved, the RSVP). They read it once, mean to add it later,
+and miss it. Success = *their* day measurably improves — events captured from the chat, conflicts
+caught against their calendar, minutes saved — with **zero setup**: paste the thread or a screenshot
+from a phone browser. The local-LLM / fine-tune work below is a **means** to better extraction, not
+the point; the app must deliver value with no GPU (stub agent) first.
+## Context
+You want an agent that reads iMessage-style threads, understands the conversation, and turns them
+into calendar events/reminders — exposed through a custom Gradio UI deployed as a Hugging Face Space.
+Two local models share the work: our fine-tuned Gemma does the *reading* (thread → validated
+ActionPlan), and an **OpenBMB MiniCPM** planner does the *orchestrating* — the brain behind **Run the agents**,
+driving the Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible
+multi-step agent. The build competes in the **Backyard AI** track (general and OpenBMB prizes are
+awarded per track) and satisfies the quests secondary to the user story above: **Off the Grid** (no
+cloud AI APIs, local-first), **Well-Tuned** (a fine-tuned model on HF), **Off-Brand** (custom UI),
+and **Llama Champion** (both Gemma and MiniCPM are served through llama.cpp).
+### Feasibility verdict: YES, with one re-architecture
+The request as *literally* worded has two impossibilities, both solvable:
+1. **No app or cloud can read iMessage on iOS.** Apple exposes no API for iMessage/SMS content.
+   → **Solved:** you have a Mac. iMessages sync to `~/Library/Messages/chat.db`; a small local
+   collector reads it. This is the *only* supported path and it keeps data local ("off the grid").
+2. **A model cannot "run on your phone," and a HF Space runs in the cloud, not on-device.**
+   → **Solved:** "on my phone" = *used from* your phone's browser. The Space does its own llama.cpp
+   inference and calls no external AI service, so "hosted Space" and "off the grid" reconcile.
+Confirmed decisions:
+- **Ingestion:** Mac collector reading `chat.db`.
+- **Calendar output:** local `.ics` files first (strictly off-grid), with an *optional* Google
+  Calendar push toggle as a bonus.
+- **Extraction model:** fine-tune Gemma, serve as GGUF via llama.cpp (production serves the
+  **E4B** edge fine-tune, `build-small-hackathon/gemma-4-cal-gguf`).
+- **Agent planner:** **OpenBMB MiniCPM** (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a
+  config switch) on a second llama-server — it plans, the MCP tools execute, every step visible.
+---
+## Architecture
+```
+┌────────── Your Mac (local) ──────────┐         ┌──────── Hugging Face Space (Docker) ────────┐
+│  collector.py (Full Disk Access)      │  HTTPS  │  Gradio (custom theme/CSS)  ── Off-Brand     │
+│  • polls chat.db for new messages     │ +token  │        │                                     │
+│  • parses text / attributedBody       ├────────▶│  FastAPI /ingest  ──▶  extraction pipeline   │
+│  • POSTs new msgs to Space /ingest    │         │        │                                     │
+└───────────────────────────────────────┘         │  llama.cpp (llama-cpp-python) ── Llama Champ │
+                                                   │   running YOUR fine-tuned gemma-4-31B GGUF   │
+   View/approve from phone browser ───────────────▶│        │            ── Off the Grid (local) │
+                                                   │  JSON events → pydantic validate            │
+                                                   │        ├──▶ .ics file (download)            │
+                                                   │        └──▶ optional Google Calendar push   │
+                                                   └──────────────────────────────────────────────┘
+```
+Flow: messages → extraction prompt → model emits structured JSON of candidate events →
+validated → shown in UI for review → user approves → `.ics` generated (and/or pushed to GCal).
+**Run the agents** runs the same flow agentically: an **OpenBMB MiniCPM** planner (second local
+llama-server, OpenAI-compatible) consumes the Space's own **MCP tool surface** —
+`extract_events → check_conflicts → make_ics` — through smolagents, so the pipeline above is
+demonstrated as multi-step tool use over the public tool contract, with the planner's trace on
+screen (`server/orchestrator.py`). Stub/CI falls back to a scripted planner so the tab always works.
+---
+## Components
+### 1. Mac-side iMessage collector  (`collector/collector.py`)
+- **Reuse, don't reinvent the DB parsing.** Modern macOS stores message text in the
+  `attributedBody` (NSAttributedString) blob, not always the `text` column. Use the battle-tested
+  **`imessage-exporter`** (ReagentX, Rust) or the Python **`imessage_reader`** lib rather than
+  hand-rolling SQL. If hand-querying: join `message` ⨝ `handle` ⨝ `chat_message_join` ⨝ `chat`,
+  track last seen `ROWID`, poll on an interval.
+- Requires **Full Disk Access** for the running process (System Settings → Privacy & Security).
+- Sends only new messages to the Space `/ingest` endpoint over HTTPS with a shared bearer token.
+- Config: which chats to watch, poll interval, Space URL, token (`.env`, never committed).
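
As a rough sketch of the upload step, the collector could build its request like this. The JSON payload shape (`{"messages": [...]}`) and the function name are assumptions; the plan only specifies HTTPS, the `/ingest` path, and a shared bearer token.

```python
import json
import urllib.request


def build_ingest_request(space_url, token, messages):
    """Build (but do not send) a POST to the Space's /ingest endpoint."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=space_url.rstrip("/") + "/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a single `urllib.request.urlopen(req, timeout=...)` call, with the token read from `.env` rather than hard-coded.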
+### 2. HF Space backend  (`app.py`, `server/`)
+- **Docker SDK Space** (`README.md` frontmatter: `sdk: docker`, `app_port: 7860`).
+- **llama.cpp** loads the fine-tuned GGUF and serves chat completions — satisfies
+  *Llama Champion*; no external AI call satisfies *Off the Grid*.
+- **Agent orchestrator** (`server/orchestrator.py`): the **OpenBMB MiniCPM** planner behind
+  **Run the agents** (its own llama-server) drives the Space's MCP tools as a multi-step agent — the OpenBMB
+  per-track prize case, and the same extraction pipeline exercised through the public tool
+  contract rather than private imports.
+- `/ingest` (FastAPI, mounted alongside Gradio) receives messages, runs the extraction prompt,
+  returns candidate events; results surface in the Gradio UI for review.
+- **Compute:** Q4_K_M GGUF of a 31B ≈ 18–20 GB → does **not** fit the free CPU tier (16 GB / 2 cores).
+  Serve on a GPU: **ZeroGPU** (free, H200/70 GB — but cold GGUF load per acquisition; document the
+  caveat) or a **paid GPU Space** (e.g. L4/L40S) for a smooth always-warm demo. See Fallback.
+### 3. Fine-tuning pipeline  (`training/`)
+- **Task:** conversation snippet → strict JSON list of events
+  `{title, start, end, location, attendees, reminder_minutes, notes}`.
+- **Data:** build a synthetic instruction dataset (~500–2000 examples) of realistic chat threads
+  paired with the target JSON. Generation/augmentation for *training data* can use any tooling
+  offline — the "no cloud API" rule applies to the *running app's inference*, not dataset prep.
+  Include hard cases: relative dates ("next Thurs"), ranges, no-event chitchat (empty list),
+  timezones, multiple events per thread.
+- **Method:** QLoRA via **Unsloth** (Qwen3-0.6B GRPO experience applies), 4-bit, r=16,
+  1–3 epochs. 31B QLoRA needs an A100/H100 80 GB (Colab Pro+/RunPod/Lambda, ~hours).
+- **Export:** merge LoRA → `convert_hf_to_gguf.py` (llama.cpp) → `llama-quantize` to Q4_K_M →
+  **publish GGUF to your HF repo** (satisfies *Well-Tuned*). Space downloads it at startup via
+  `huggingface_hub`.
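
The target schema above can be checked with a few lines of validation. The plan uses pydantic for this; the sketch below substitutes a standard-library dataclass so it runs without extra dependencies. Which fields are optional, and the `end >= start` check, are assumptions rather than the project's actual rules.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    title: str
    start: str
    end: Optional[str] = None
    location: Optional[str] = None
    attendees: List[str] = field(default_factory=list)
    reminder_minutes: Optional[int] = None
    notes: Optional[str] = None

    def __post_init__(self):
        if not self.title:
            raise ValueError("title is required")
        # fromisoformat raises ValueError on anything that is not an ISO datetime.
        start = datetime.fromisoformat(self.start)
        if self.end is not None and datetime.fromisoformat(self.end) < start:
            raise ValueError("end precedes start")


def parse_events(raw):
    """Parse model output; an empty JSON list is the valid 'no events' answer."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of events")
    return [Event(**item) for item in data]
```

Rejecting malformed output at this boundary is what lets the review queue assume every candidate event has a usable title and start time.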
+### 4. Custom Gradio UI  (`ui/`, `static/`)  — *Off-Brand*
+- `gr.Blocks` with a custom `gr.themes.Base(...)` palette + injected `css=` (custom fonts, layout,
+  cards) to push well past the default look.
+- Screens: connection/status, incoming-message feed, **review queue** (edit candidate events
+  inline, approve/reject), download `.ics`, optional "Push to Google Calendar" toggle, settings.
+### 5. Calendar output  (`calendar_out/`)
+- **`.ics` (default, off-grid):** generate with the `icalendar` lib; offer as a download in the UI.
+- **Google Calendar (optional bonus):** `google-api-python-client` OAuth; behind a toggle so the
+  off-grid demo path stays pure. Clearly labeled as the one optional cloud touchpoint.
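
For orientation, this is roughly what the `.ics` output contains. The plan calls for the `icalendar` library; the hand-written version below is a dependency-free stand-in that assumes timezone-aware datetimes and emits a single event. The `PRODID` value and function name are placeholders, and long lines are not folded as a full implementation would.

```python
from datetime import datetime, timezone


def to_ics(uid, title, start, end, now=None):
    """Render one event as a minimal iCalendar document with CRLF line endings."""
    fmt = "%Y%m%dT%H%M%SZ"
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc).strftime(fmt)
    # Escape the characters that iCalendar TEXT values reserve.
    summary = (
        title.replace("\\", "\\\\").replace(";", "\\;")
        .replace(",", "\\,").replace("\n", "\\n")
    )
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start.astimezone(timezone.utc).strftime(fmt)}",
        f"DTEND:{end.astimezone(timezone.utc).strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Times are normalised to UTC so the file imports the same way regardless of the calendar app's local timezone settings.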
+---
+## Hackathon requirement mapping
+| Track | How it's satisfied |
+|---|---|
+| Off the Grid (local-first, no cloud AI APIs) | All inference is local llama.cpp in the Space; data originates on your Mac; `.ics` is the default output. |
+| Well-Tuned (fine-tuned model on HF) | QLoRA fine-tune of `gemma-4-31B-it`, GGUF published to your HF repo. |
+| Off-Brand (custom UI) | Custom Gradio theme + CSS, not the stock look. |
+| Llama Champion (llama.cpp) | Inference via `llama-cpp-python`. |
+| Gradio app on HF Space | Docker Space serving Gradio + FastAPI `/ingest`. |
+---
+## Build phases
+1. **Hero path (no GPU):** Docker Space with custom-themed Gradio + the *stub* extractor → paste /
+   "Try a sample" / screenshot → review → `.ics` download, working end-to-end on a phone browser.
+   This is the parent's whole experience and must stand alone with no model.
+2. **Measure impact:** persisted **This week** panel (events captured, conflicts caught, minutes
+   saved) via `server/impact.py`, recorded when the parent exports. Proves *their* day got better.
+3. **Accuracy upgrade (optional):** wire `llama-cpp-python` with a community `gemma-4-31B-it` GGUF on
+   a GPU Space; swap the stub for the model + JSON-schema prompt + pydantic validation.
+4. **Fine-tune (optional):** dataset → Unsloth QLoRA → GGUF → publish to HF → point the Space at it.
+5. **Optional auto-feed:** Mac `collector.py` reading `chat.db` → POST `/ingest` (power users only).
+---
+## Verification
+- **End-to-end (stub, phase 1):** open Space in phone browser → tap **Try a sample** (or paste a
+  chat) → event appears in review queue → download `.ics` → import to a calendar, confirm date/time.
+- **Impact (phase 2):** after exporting, **Activity → This week** shows events captured and time
+  saved > 0; restart the app (same `IMPACT_PATH`) and confirm the weekly numbers persist while the
+  live tiles reset. `minutes_saved` is a stated estimate (`IMPACT_MIN_PER_EVENT`=8,
+  `IMPACT_MIN_PER_CONFLICT`=15, env-overridable), not a measurement.
+- **Collector (phase 5):** send yourself a test iMessage ("lunch Tuesday 1pm") → confirm it reaches
+  `/ingest` and surfaces in the feed.
+- **Model (phase 3+):** curated eval set of chats with known expected events; measure JSON validity
+  rate + field accuracy (esp. relative-date resolution); confirm empty-list on non-event chats.
+- **llama.cpp:** confirm the Space logs show llama.cpp loading *your* GGUF, no external AI calls.
+---
+## Risks & fallbacks
+- **31B serving cost/latency.** Q4 31B needs a GPU; ZeroGPU has cold-load + quota friction, paid GPU
+  has cost. **Fallback:** fine-tune **Gemma 4 E4B** (edge variant) — runs on free CPU tier / fast on
+  small GPU, far cheaper to fine-tune, and arguably *more* on-theme for "local-first." Keep 31B as
+  the headline, E4B as the safety net for a reliable live demo.
+- **`chat.db` schema / `attributedBody`.** Mitigated by using `imessage-exporter`/`imessage_reader`.
+- **Full Disk Access** must be granted to the collector's process or reads return empty.
+- **Privacy:** the autonomous Mac-collector path sends messages to the Space (token-gated); the
+  hero phone-paste path keeps data client-side (calendar tokens live in the browser, nothing
+  persists server-side). The Space now lives in the public **`build-small-hackathon`** submission
+  org, so the *source* is public — but user data still never lands on the server.
+- **Relative-date accuracy** is the main quality risk — pass the current datetime into the prompt
+  and weight the dataset toward relative-date examples.

README.md ADDED

	@@ -0,0 +1,435 @@

+---
+title: OffGridSchedula
+emoji: 🗓️
+colorFrom: indigo
+colorTo: purple
+sdk: docker
+app_port: 7860
+pinned: false
+license: apache-2.0
+short_description: Local-first chat-to-calendar agent (Gemma-4 E4B + MiniCPM)
+tags:
+  - track:backyard
+  - sponsor:openbmb
+  - sponsor:modal
+  - achievement:offgrid
+  - achievement:welltuned
+  - achievement:offbrand
+  - achievement:llama
+  - achievement:sharing
+  - achievement:fieldnotes
+models:
+  - build-small-hackathon/gemma-4-cal-gguf
+  - openbmb/MiniCPM5-1B-GGUF
+demo_video:
+  - https://youtu.be/m-o0u9X3tI4
+social_posts:
+  - https://x.com/nate_mauer/status/2065973341651882386
+  - https://x.com/nate_mauer/status/2064920352845709419
+  - https://x.com/nate_mauer/status/2065661878441750916
+  - https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545
+blog_post:
+  - https://huggingface.co/blog/build-small-hackathon/offgridschedula
+made_by:
+  - ParetoOptimal - a.k.a., Nate Mauer
+---
+# 🗓️ Message Scheduling Agent
+ **OffGridSchedula turns a pasted chat (or a flyer screenshot) into calendar events, catches conflicts, and drafts the reply — right from your phone, no app, no account,
+no setup. iOS allows neither background iMessage access nor a persistent on-device LLM server, so there's no autonomous on-device agent to install; instead,
+a foreground Shortcut ([docs/automations.md](./docs/automations.md)) hands a thread or screenshot to the agent in two taps (optionally using a remote model via `INFERENCE_BASE_URL`).**
+The model runs on **your own server or even on the phone itself**, not on a cloud AI service. Your chats aren't shipped off to a third-party AI to be read; the agent reads your snippet in memory and
+discards it after replying. The run trace you can optionally share is redacted, and your snippet goes only to the agent you control, which turns it into ready-to-add calendar events.
+**Hardware-aware.** On under-powered hardware, the app shows an upgrade banner rather than hanging; the real model needs a tiny GPU.
+## Build Small submission — the idea & the tech
+**The idea.** A busy parent's calendar lives in other people's messages — picture day in the
+class chat, the practice that moved, the party flyer. OffGridSchedula turns those into calendar
+events: paste the chat (or snap the flyer) from a phone browser, review the extracted events, the
+conflicts against your own `.ics`, and a drafted reply — then add to Apple/Google Calendar in a tap.
+**The tech.** Two small local models do the work. Extraction is [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf)
+(~4B effective params), our QLoRA fine-tune of Gemma-4 E4B that emits a single validated
+**ActionPlan** (events · conflicts · reply · clarifying question), served with **vision** through
+the official **llama.cpp** server inside this Docker Gradio Space — no cloud AI APIs. The
+fine-tune + its 60-example task eval ran entirely on **Modal** serverless GPUs, behind an
+eval gate that rejected eight regressed models before this one shipped. Conflict math is
+deterministic Python, the UI is fully custom, the agent doubles as an **MCP tool server**, and
+redacted run traces are public on the [Hub](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces).
+Click **Run the agents** and a local **OpenBMB MiniCPM** planner (a second local llama-server)
+drives this same Space's MCP tools as a multi-step agent — extract → check conflicts → render
+`.ics` — with every step visible. Still zero cloud AI; every model under 32B.
+**What's new.** Extraction now reads the *logistics*, not just the date (see below): arrival-aware
+start times, duration→end conversion, type-based reminders, and calendar-ready titles — each
+guaranteed by deterministic post-processing even when the model wobbles, and each shipped through
+a measured A/B eval ([full result tables](./training/data/ab_results.md): regex vs text-LLM vs
+**vision-LLM reading rendered screenshots only**). Calendar out got one-click too: a unified
+**Connect your calendar** block (Google OAuth — the token lives in *your* browser, never on the
+server; Outlook/Apple need no sign-in) and per-event **Google · Outlook · iCal** links, with the
+Google push verified end-to-end (push → readback → delete, 11/11).
+**The UX.** One decision — **Offline or Online** — re-themes the whole workflow card and sets the
+path: off-grid `.ics` only, or a **one-click "Connect your calendar"** whose Google OAuth token
+lives *only in the browser* (server-verified each visit; the client secret never leaves the
+server). Results land in a single card: events, conflicts, the drafted reply, and per-event
+**Google · Outlook · iCal · .ics** quick-add links. **Activity → This week** tallies events
+captured, conflicts caught, and time saved; a per-device **Memory** (localStorage, one-click
+samples) feeds names and preferences back into extraction.
+**Submission links:** [requirement-by-requirement mapping](./docs/build-small-submission.md) ·
+[demo video](https://youtu.be/m-o0u9X3tI4) ·
+social posts [1](https://x.com/nate_mauer/status/2064920352845709419) ·
+[2](https://x.com/nate_mauer/status/2065661878441750916)
+## Who this is for
+A busy parent whose kid's school and activity events are buried in a noisy class group chat —
+picture day Thursday, the practice that moved to Tuesday, the birthday-party RSVP. They read it once,
+mean to add it later, and miss it. With this, they **paste the chat** (or a **screenshot** of a flyer
+or invite) from their phone's browser and get back: the events, a **conflict check** against their
+calendar, and a **ready-to-send reply** — all surfaced for review before anything is saved. Output is
+a local `.ics` they can add to any calendar, with optional Google Calendar push.
+No app to install and no account. It reads nothing automatically — the parent pastes only what they
+choose. Inference runs **in the Space** via `llama.cpp` (no cloud AI APIs), and works out of the box
+with no GPU (see *Accuracy upgrade* below).
+## The model: `gemma-cal` E4B — one calendar-native LLM, built for exactly this
+What makes this platform different isn't a prompt wrapped around a generic chatbot — it's
+**[`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf), our own fine-tune of
+Gemma-4 E4B purpose-built for one job: turning messy human conversation into calendar-ready
+structure.** The model doesn't chat. It reads a thread (or a flyer photo) and emits a single
+validated **ActionPlan** — events with exact ISO datetimes, conflicts, proposed alternatives, a
+drafted reply, and a clarifying question when the plan is too vague to schedule. **It is the one
+and only model the platform runs**, everywhere from the production Space to a laptop.
+- **Edge-sized by design.** ~5 GB at Q4 — serves on a **~$0.40/hr 16 GB T4** (vs $4+/hr A100-class
+  for big models), a gaming GPU, or an Apple-silicon laptop, with full **vision**
+  (screenshots/flyers) via its mmproj. Local-first isn't a tagline; it's the parameter count.
+- **Schema-bulletproof.** The fine-tune holds **100% schema validity even with no system prompt**,
+  with stronger no-event discipline (doesn't invent events from "thanks!") and a higher rate of
+  *asking* when a date is TBD — the failure modes that actually burn users of generic models.
+- **Convention-trained.** It learns *this product's* date semantics ("next Tuesday" means next
+  week's Tuesday; weekday-anchored relative dates) instead of whatever a base model absorbed
+  from the internet.
+- **Eval-gated, never vibes-shipped.** Every retrain runs a 60-example task eval (start-exact
+  datetime matching, F1, validity, clarification) and **cannot reach production unless it clears
+  the gate** — the pipeline has rejected eight regressed models to date. The full, honest scorecard
+  lives in [`docs/eval-roadmap.md`](./docs/eval-roadmap.md) and the
+  [post-mortem write-up](./docs/blog-eval-gated-finetuning.md).
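The gate in the last bullet can be pictured with a small sketch. It is an illustration only, not the logic in `training/gated_retrain.py`: the `EvalScores` type, its field names, and the rule "no metric may regress against the current production model" are assumptions, and the scores below are invented placeholders rather than measured results.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical scorecard for one model on the task eval (values in [0, 1])."""
    start_exact: float    # share of events whose start datetime matches exactly
    event_f1: float       # event-level F1
    validity: float       # share of outputs that validate against the schema
    clarification: float  # share of vague threads where the model asks a question


def passes_gate(candidate: EvalScores, production: EvalScores, tolerance: float = 0.0) -> bool:
    """Assumed rule: reject if any metric falls below production by more than `tolerance`."""
    current = asdict(production)
    return all(score >= current[name] - tolerance
               for name, score in asdict(candidate).items())


# Placeholder numbers, chosen only to exercise both branches.
production = EvalScores(start_exact=0.80, event_f1=0.90, validity=1.00, clarification=0.70)
regressed = EvalScores(start_exact=0.85, event_f1=0.92, validity=0.97, clarification=0.70)
improved = EvalScores(start_exact=0.85, event_f1=0.92, validity=1.00, clarification=0.75)

print(passes_gate(regressed, production))  # validity dropped, so this candidate is rejected
print(passes_gate(improved, production))
```

A per-metric floor like this means a candidate cannot trade schema validity for a better F1, which matches the spirit of "cannot reach production unless it clears the gate"; the real thresholds may differ.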
+**Hackathon size constraint (≤ 32B):** easily — E4B is ~4B effective parameters. See the in-app
+**🏆 Submission** tab for the full compliance scorecard.
+### Reads the logistics, not just the date
+A confirmation like *"Time: 10:30 AM · Duration: approx. 30–45 min · (Please arrive 15 minutes
+early to complete intake forms) · 📍 112A West 72nd Street…"* becomes one correct event:
+- **Arrival-aware start** — the event starts at **10:15** (when you must show up), the official
+  10:30 is preserved in the notes, and the **end is anchored to the stated time + duration**
+  (10:30 plus the 30-minute lower bound = 11:00), so the calendar block covers the forms *and* the visit.
+- **Type-based notifications** — an explicitly stated lead time always wins ("remind me 2 hours
+  before" → 120); otherwise doctor/medical visits get 60 minutes, parties 30, carpools and school
+  events 45.
+- **Real-world addresses** — multi-line and 📍-emoji locations join into one string;
+  "(Upper West Side — 72nd & Columbus)" glosses and SMS footers ("Reply C to confirm… call us
+  at 212-223-0349") don't confuse it.
+- **Calendar-ready titles** — an action+subject summary ("Pick up Priya — Terminal 4"), not a
+  quote of the message.
+The model is *taught* these conventions (prompt + fine-tune data), but the load-bearing ones are
+also **guaranteed by deterministic post-processing** (`apply_text_rules` in
+[`server/agent.py`](./server/agent.py)) — same philosophy as the conflict engine: must-hold
+logistics are never left to model temperament. Every behavior above shipped through a measured
+A/B eval — regex baseline vs text-LLM vs **vision-LLM reading rendered chat screenshots only** —
+with the full tables in [`training/data/ab_results.md`](./training/data/ab_results.md)
+(headline: text-LLM event F1 0.96 structured / 0.89 unstructured vs regex 0.60/0.67; the
+screenshot-only vision arm lands within a point of text).
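The arrival and reminder conventions described in this section can be sketched as two small deterministic rules. This is a hedged illustration, not the repo's `apply_text_rules`: the function names, the event-type labels, and the regular expression are assumptions, and the date is arbitrary. Only the minute values and the 10:30 example come from the text above.

```python
import re
from datetime import datetime, timedelta

# Default reminder lead times in minutes, as listed in the bullets above.
DEFAULT_REMINDER_MIN = {"medical": 60, "party": 30, "carpool": 45, "school": 45}


def arrival_aware_block(official_start: datetime, duration_min: int, arrive_early_min: int = 0):
    """Start when you must show up; end at the official start plus the stated duration."""
    start = official_start - timedelta(minutes=arrive_early_min)
    end = official_start + timedelta(minutes=duration_min)
    return start, end


def reminder_minutes(text: str, event_type: str):
    """An explicit 'remind me N hours/minutes before' wins; otherwise use the type default."""
    m = re.search(r"remind me (\d+)\s*(hour|hr|minute|min)s?\s+before", text, re.IGNORECASE)
    if m:
        amount = int(m.group(1))
        return amount * 60 if m.group(2).lower() in ("hour", "hr") else amount
    return DEFAULT_REMINDER_MIN.get(event_type)  # None for types with no default


# Official time 10:30, "arrive 15 minutes early", 30-minute visit.
start, end = arrival_aware_block(datetime(2025, 1, 9, 10, 30), duration_min=30, arrive_early_min=15)
print(start.time(), end.time())
print(reminder_minutes("remind me 2 hours before", "party"))
print(reminder_minutes("see you there", "medical"))
```

Keeping these rules outside the model means the calendar block and reminder stay the same even when the model's raw output varies between runs.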
+## Try it in 30 seconds
+Open the Space in your phone's browser → **Schedule** tab → tap **Try a sample** (or paste your own
+group chat, and optionally a screenshot or your `.ics`) → review the detected events → **Download
+.ics**. The **Activity → This week** panel then shows what you've captured and the time it saved.
+## How it works
+```
+Paste a thread / screenshot ──▶ HF Space ──▶ llama.cpp ──▶ events + conflicts + reply
+   (phone browser)                  │                              │
+                              custom Gradio UI ◀── review ──┐  ┌────┘
+                                                            ▼  ▼
+                                          .ics download / optional Google Calendar
+```
+The **primary path needs nothing but a browser**: paste text and/or attach a screenshot in the
+Schedule tab. (Power users can also auto-feed messages from a Mac — see *Optional: Mac collector*.)
+For the full solution-architecture view — every workflow and which LLM (if any) it calls,
+plus the eval-gated fine-tuning loop — see **[docs/architecture.md](./docs/architecture.md)**.
+## Can it process multiple invites at once?
+**Yes — multiple invites in one paste is the designed path** (on the live Space, where the real
+model runs). `ActionPlan.events` is a *list*, and the extraction prompt explicitly tells the model
+that one thread often holds several events — a drop-off AND a pickup, or two appointments, are
+separate events (`server/agent.py`). Everything downstream is built for N events: the results card
+shows "*N events found*" with one card per invite, the editable table gets one row each, the `.ics`
+contains one `VEVENT` per event, each event carries its own Google/Outlook/Apple quick-add links,
+and the conflict check runs across all of them. Screenshot input is multi-file too — attach several
+flyers and they're all read in one run.
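As a rough picture of "one `VEVENT` per event", the sketch below renders a list of events into a minimal iCalendar document. It is not the code in `calendar_out/ics.py`: the `to_ics` name, the dict keys, and the UID scheme are assumptions, and it leaves out time zones and text escaping that a real exporter would need.

```python
from datetime import datetime, timezone


def to_ics(events: list[dict]) -> str:
    """Render a minimal iCalendar document with one VEVENT per event (floating local times)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//sketch//EN"]
    for i, ev in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            f"UID:sketch-{i}@example.invalid",  # hypothetical UID scheme
            f"DTSTAMP:{stamp}",
            f"DTSTART:{ev['start']:%Y%m%dT%H%M%S}",
            f"DTEND:{ev['end']:%Y%m%dT%H%M%S}",
            f"SUMMARY:{ev['title']}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end in CRLF


# Two invites from one pasted thread become two VEVENT blocks.
ics = to_ics([
    {"title": "Drop off at practice", "start": datetime(2025, 1, 7, 16, 0), "end": datetime(2025, 1, 7, 16, 15)},
    {"title": "Pick up from practice", "start": datetime(2025, 1, 7, 17, 30), "end": datetime(2025, 1, 7, 17, 45)},
])
print(ics.count("BEGIN:VEVENT"))
```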
+Two caveats:
+- **Stub mode extracts only the first invite.** The local-dev heuristic (`_stub_plan` in
+  `server/agent.py`, enabled by `USE_STUB_EXTRACTOR=1`) works with no model and no GPU — and it's
+  now a decent parser in its own right (labeled times, explicit dates, multi-line/📍 locations,
+  durations, arrival-early shifts, type-based reminders) — but it still returns at most **one**
+  event. If you paste a multi-invite thread locally and get one event back, that's the stub, not
+  the product; the deployed Space uses the multi-event model path.
+- **Simultaneous *runs* are serialized, not parallel.** If two users (or two tabs) hit *Run the
+  agents* at once, both complete, but inference executes one request at a time — `server/model.py`
+  holds the llama.cpp instance behind a `threading.Lock`, and Gradio queues the events. On a
+  single-GPU Space that's intentional (one model copy in memory); the second run simply waits its
+  turn, then streams its own pipeline progress.
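The serialization in the second caveat can be illustrated with a plain `threading.Lock`. This is a sketch of the pattern only: `run_inference` and the counters are hypothetical stand-ins, not the code in `server/model.py`.

```python
import threading
import time

_model_lock = threading.Lock()  # one model copy in memory, so one inference at a time
_active = 0                     # callers currently inside the critical section
max_active = 0                  # highest concurrency ever observed


def run_inference(prompt: str) -> str:
    """Stand-in for a model call; the lock makes concurrent callers wait their turn."""
    global _active, max_active
    with _model_lock:
        _active += 1
        max_active = max(max_active, _active)
        time.sleep(0.01)  # pretend to generate
        _active -= 1
        return f"plan for: {prompt}"


results = []
workers = [threading.Thread(target=lambda p=p: results.append(run_inference(p)))
           for p in ("tab one", "tab two")]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(sorted(results), max_active)
```

Both requests complete, but the observed concurrency never exceeds one, which is the behavior the caveat describes for two tabs hitting *Run the agents* at once.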
+## Repo layout
+```
+app.py                 # Gradio + FastAPI entrypoint (the Space)
+server/
+  agent.py             # thread (+images) -> validated ActionPlan
+  orchestrator.py      # Run the agents: MiniCPM planner driving our own MCP tools
+  schema.py            # Event / Conflict / ActionPlan pydantic models
+  model.py             # llama.cpp load: GGUF + vision mmproj, constrained JSON
+  imageutil.py         # image -> base64 data URI
+ui/blocks.py           # custom Gradio Blocks (reasoning, events, conflicts, reply)
+static/app.css         # custom CSS (Off-Brand)
+calendar_out/
+  ics.py               # .ics generation (off-grid default)
+  freebusy.py          # parse existing .ics + deterministic conflict detection
+  gcal.py              # optional Google Calendar push
+collector/collector.py # Mac-side iMessage collector (text + image attachments)
+training/              # dataset build + QLoRA fine-tune + GGUF/mmproj export
+Dockerfile             # dedicated-GPU Space: builds llama.cpp (0.3.28) WITH CUDA
+requirements-docker.txt # runtime deps for the Docker image (llama.cpp built separately)
+PLAN.md                # full design + build plan
+```
+## Quick start (local dev) — no GPU needed
+```bash
+pip install -r requirements.txt
+# Runs the whole app with the built-in heuristic agent — no model, no GPU:
+export USE_STUB_EXTRACTOR=1 INGEST_TOKEN="dev-secret"
+python app.py            # http://localhost:7860
+```
+Open it, go to the **Schedule** tab, and tap **Try a sample** — or paste a thread, attach chat
+**screenshots**, and optionally upload your current calendar **`.ics`** for conflict checks.
+(Heads-up: the stub agent extracts only the **first** invite in a thread — multi-invite extraction
+needs the real model; see *Can it process multiple invites at once?* above.) Tip for
+self-hosted installs: set `CAL_ICS_PATH=/path/to/calendar.ics` and conflict checks use that file
+automatically whenever no `.ics` is uploaded — step 4 completes itself, fully offline. Review
+the detected events, conflicts, proposed times, and the suggested reply, then add any event with
+its **Add to: Google · Outlook · iCal · .ics** links (iCal and .ics both download the event's
+`.ics` file; with 2+ events an **iCal — all N events** link grabs everything at once).
+The **Activity → This week** panel shows what you've captured.
+## This week (impact)
+The Activity tab has a **This week** panel that persists across restarts: **events captured**,
+**conflicts caught**, and **estimated time saved**. A "capture" is counted when a run surfaces
+events for review (adding to a calendar happens through the per-event links, which the server
+can't observe).
+`minutes_saved` is a deliberately conservative, **configurable estimate — not a measurement**:
+`IMPACT_MIN_PER_EVENT` (default **8** min per captured event) + `IMPACT_MIN_PER_CONFLICT` (default
+**15** min per conflict caught). Override either via env. State persists to `IMPACT_PATH`
+(default `/tmp/impact_weeks.json`; point it at a persistent disk on a Space to survive rebuilds).
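The estimate is simple arithmetic, sketched here under stated assumptions: the `minutes_saved` function name and the `env` parameter are illustrative and not necessarily how `server/impact.py` is written. The variable names and defaults are the ones given above.

```python
import os
from collections.abc import Mapping


def minutes_saved(events: int, conflicts: int, env: Mapping[str, str] = os.environ) -> int:
    """Estimated minutes saved: per-event and per-conflict weights, overridable via env."""
    per_event = int(env.get("IMPACT_MIN_PER_EVENT", "8"))
    per_conflict = int(env.get("IMPACT_MIN_PER_CONFLICT", "15"))
    return events * per_event + conflicts * per_conflict


# With the defaults: 5 events * 8 min + 2 conflicts * 15 min = 70 min.
print(minutes_saved(5, 2, env={}))
# Overriding one weight changes only that term: 5 * 10 + 2 * 15 = 80.
print(minutes_saved(5, 2, env={"IMPACT_MIN_PER_EVENT": "10"}))
```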
+## Accuracy upgrade (optional) — serve the real `gemma-cal` LLM
+The stub agent above makes the demo work with **no GPU**. The production Space serves our
+fine-tuned **`gemma-cal` E4B** through `llama-server` — no cloud AI APIs either way. The same
+config works anywhere llama.cpp runs:
+```bash
+export USE_STUB_EXTRACTOR=0
+export MODEL_HF_REPO="build-small-hackathon/gemma-4-cal-gguf"
+export MODEL_FILE="gemma-cal-e4b-Q4_K_M.gguf"     # ~5 GB edge fine-tune (what the Space serves)
+export MMPROJ_REPO="unsloth/gemma-4-E4B-it-GGUF"  # the E4B's own vision projector
+export MMPROJ_FILE="mmproj-F16.gguf"              # enables screenshot/vision input
+bash scripts/start_space.sh
+```
+This is the platform's **only** model — the same ~5 GB GGUF serves the production Space (16 GB
+T4), a gaming GPU, or a laptop. (`MODEL_FILE` is explicit on purpose: the model repo also stores
+legacy training artifacts, so the `-hf repo:Q4_K_M` shorthand is ambiguous.)
+## Optional: Mac collector (power users)
+The phone-paste path above needs nothing installed. If you'd rather have new iMessages fed in
+automatically, run the collector on a Mac where iMessages sync (iOS exposes no API for message
+content, so a Mac is the only auto-feed source):
+```bash
+cd collector && cp .env.example .env   # edit SPACE_URL + INGEST_TOKEN
+python collector.py
+```
+> ⚠️ The collector needs **Full Disk Access** (System Settings → Privacy & Security) to read `chat.db`.
+## Autonomous & on a phone
+There's a single backend endpoint — **`POST /agent`** (bearer `INGEST_TOKEN`) — that takes a thread
+(or messages, + optional screenshot/`.ics`) and returns the extracted events, conflicts, and reply as
+JSON (optionally an `.ics` or a Google Calendar push). Every front-end calls it:
+- **Fully autonomous (Mac) — set-and-forget:** `INGEST_TOKEN=… MODEL_GGUF=~/models/hermes.gguf
+  scripts/setup_mac.sh` installs three launchd jobs (Hermes `llama-server` + autonomous backend +
+  collector). New iMessages **you send or accept** become calendar events automatically, deduped per
+  chat. Triggers on outgoing messages by default (`TRIGGER_ON=outgoing`; `any` to widen).
+- **Hermes "grows-with-you" brain:** point `INFERENCE_BASE_URL` at a Hermes `llama-server`; its
+  personal **memory** (people→roles, "you decline Mondays") improves extraction over time and is shown
+  in the dashboard **Memory** tab. See **[docs/hermes.md](./docs/hermes.md)**.
+- **iPhone, one tap:** an iOS **Shortcut** shares a thread/screenshot to `/agent` and adds the events
+  to Apple Calendar natively — no `.ics` import.
+- **Android, hands-off:** a Tasker/MacroDroid rule on a notification/SMS calls `/agent` and inserts
+  events. See **[docs/android-tasker.md](./docs/android-tasker.md)**.
+- **On-device model:** set `INFERENCE_BASE_URL` to a local `llama-server` (e.g. Gemma **E4B** or a
+  small Hermes in Termux) so inference runs *on the phone* — same agent, env-selected.
+> **iOS can't read iMessage in the background** (no message API), so fully-autonomous iMessage needs
+> the Mac collector; the iPhone path is one-gesture. See **[docs/automations.md](./docs/automations.md)**
+> and **[docs/on-device.md](./docs/on-device.md)**.
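A client call to `POST /agent` might be built like this. It is a sketch under assumptions: `build_agent_request` is a hypothetical helper, and only the `thread` and `push_gcal` fields are used because they appear in `app.py`; the full request and response schema lives in `server/pipeline.py` and may contain more.

```python
import json
import urllib.request


def build_agent_request(base_url: str, token: str, thread: str) -> urllib.request.Request:
    """Build (but do not send) a bearer-authenticated POST to the /agent endpoint."""
    payload = {"thread": thread, "push_gcal": False}  # field names as used in app.py
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_agent_request("http://localhost:7860", "dev-secret", "Picture day is Thursday at 9am")
print(req.get_method(), req.full_url)

# To actually send it (needs a running server with the same INGEST_TOKEN):
# with urllib.request.urlopen(req) as resp:
#     plan = json.loads(resp.read())
```

The server rejects the call with a 401 unless the header is exactly `Bearer <INGEST_TOKEN>`, so the same token must be configured on both sides.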
+## Build Small — prizes & quests
+**Track: 🏡 Backyard AI** (`track:backyard`) — a practical app for a specific real person: a busy
+parent whose family calendar is buried in a noisy class group chat.
+### Sponsor awards we compete for
+| Award | Why this submission qualifies |
+|---|---|
+| 🟢 **Modal Awards** (best Modal-powered apps) | **Modal powered the development of the platform's model end-to-end** — required note, gladly given: [`training/modal_train.py`](./training/modal_train.py) (QLoRA fine-tune on serverless A100/H100s, Volumes caching weights), [`training/modal_eval.py`](./training/modal_eval.py) + [`modal_quant_eval.py`](./training/modal_quant_eval.py) (the task eval served on llama.cpp inside Modal, incl. an f16/Q8_0/Q4_K_M quantization study and the regex/text/vision A/B harness), and [`training/gated_retrain.py`](./training/gated_retrain.py) (train → staging → eval → promote *only past the gate* — eight regressed models rejected, every run a Modal job). |
+| 🌱 **OpenBMB Awards** (standout MiniCPM builds, per track) | The **agent is planned by OpenBMB MiniCPM** (`openbmb/MiniCPM4.1-8B-GGUF`, Q4; the 1B variant is a config switch) on a second local llama-server, driving this Space's own MCP tools (`extract_events → check_conflicts → make_ics`) as a visible multi-step agent ([`server/orchestrator.py`](./server/orchestrator.py)). MiniCPM is the agent's brain, not a garnish. |
+*(Not claimed: the OpenAI Track — no Codex-attributed commits — and the NVIDIA Nemotron Quest —
+different model family. We'd rather be honest than eligible.)*
+### Special awards — our case
+| Award | Our case |
+|---|---|
+| 🎖️ **Bonus Quest Champion** | All **six** collectable quests claimed with evidence — the full sash (table below). |
+| 🎨 **Off-Brand Award** | Custom landing page, hero + carousel, grouped nav, bespoke results cards and Activity dashboard — [`ui/blocks.py`](./ui/blocks.py) + [`static/app.css`](./static/app.css), far past the stock Gradio look. |
+| 🐜 **Tiny Titan** | The platform's one and only model is **Gemma E4B — ~4B *effective* parameters** (~5 GB at Q4, serves on a 16 GB T4 or a laptop), and a 1B MiniCPM planner variant is a config switch. Honest framing: E4B is a MatFormer "effective-4B" — judges' call whether that's tiny enough. |
+| 🎬 **Best Demo** | App + demo video + social post as one package — storyboard with every quest named on-camera in [`docs/demo-script.md`](./docs/demo-script.md). |
+| 🤖 **Best Agent** | The MiniCPM-planned, MCP-tool-driven agent above — real multi-step tool use, every model under the 32B cap. |
+| 🃏 **Judges' Wildcard** | No entry needed — but if "eval-gated fine-tuning with a public failure post-mortem" fits no category, we know where to find you. |
+### Collectable quests — all six claimed
+| Quest | Evidence |
+|---|---|
+| 🔌 **Off the Grid** (local-first, no cloud APIs) | All inference is llama.cpp inside the Space; the only optional outbound call is the user's own Google Calendar push. |
+| 🎯 **Well-Tuned** (published fine-tune) | [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf) — our QLoRA fine-tune **is the model production serves**, shipped through the eval gate with the [honest scorecard public](./docs/eval-roadmap.md). |
+| 🎨 **Off-Brand** (custom UI) | See the Off-Brand Award case above. |
+| 🦙 **Llama Champion** (llama.cpp runtime) | The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj ([`Dockerfile`](./Dockerfile), [`scripts/start_space.sh`](./scripts/start_space.sh)). |
+| 📡 **Sharing is Caring** (open trace on the Hub) | Redacted agent traces published to [`ParetoOptimal/offgridschedula-traces`](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces) — one click from the Activity tab. |
+| 📓 **Field Notes** (write-up) | [`FIELD_NOTES.md`](./FIELD_NOTES.md) + the [eval-gated fine-tuning post-mortem](./docs/blog-eval-gated-finetuning.md) + [project blog](https://huggingface.co/blog/build-small-hackathon/offgridschedula). |
+## Fine-tune on Modal (GPU)
+`training/modal_train.py` runs the whole fine-tune on a serverless GPU and publishes the GGUF to
+HF — no local GPU needed. It's a thin wrapper that ships this repo to Modal and runs the existing
+pipeline (`make_dataset.py` → `train_qlora.py` → `export_gguf.sh`) on an A100/H100, then uploads the
+quantized GGUF + `mmproj` to your HF repo. This is all *offline* prep, so **Off the Grid** is
+untouched (the rule applies to the running app's inference, not dataset/training prep).
+```bash
+pip install modal
+modal token new
+modal secret create huggingface HF_TOKEN=hf_xxxxxxxx     # your HF *write* token
+# Validate the full pipeline first on the small edge model (costs a couple of dollars):
+modal run training/modal_train.py --base-model google/gemma-4-E4B-it
+# Then the real run (default A100-80GB; --gpu H100 for speed):
+modal run training/modal_train.py
+modal run training/modal_train.py --gpu H100 --num-epochs 3
+```
+On finish it prints the `MODEL_HF_REPO` / `MODEL_FILE` / `MMPROJ_FILE` values to set on the Space. Two
+persistent Modal Volumes cache the base-model download and the outputs across runs, so iterating on
+`training/data/dataset.jsonl` only re-pays for the training itself.
+> Cost (A100-80GB ≈ $2.5/hr, per-second billing): a few-hundred-to-2000-example QLoRA run is
+> ~1–3 hr ≈ $5–15, so ~$250 of credit ≈ 15–40 full iterations. Expand the dataset before the
+> first real 31B run — the seeds in `make_dataset.py` are a smoke test, not a training set.
+### Publish your fine-tune & point the Space at it
+The training run is the one step that spends **your** GPU/Modal credits — it's not done for you.
+Once you've run it, the path is turnkey:
+1. **Recommended:** `python training/gated_retrain.py` — train → staging upload → 60-example eval →
+   **promote only if it beats the gate**. A regressed model cannot reach production. (Raw
+   `modal run training/modal_train.py` is the ungated equivalent for experiments.)
+2. Point the Space at *your* model via **Space variables** (`scripts/start_space.sh` reads them at
+   launch; set in *Settings → Variables* or with `HfApi().add_space_variable`):
+   ```
+   MODEL_HF_REPO = <you>/gemma-cal-gguf
+   MODEL_FILE    = gemma-cal-e4b-Q4_K_M.gguf   # explicit file — repo may hold several quants/tiers
+   MMPROJ_REPO   = unsloth/gemma-4-E4B-it-GGUF # projector repo, if different from the LLM's
+   MMPROJ_FILE   = mmproj-F16.gguf             # enables screenshot/vision input
+   ```
+   The deploy workflow stays a plain git mirror — the model is pulled at runtime, never committed.
+3. Push to `main` → CI deploys → the Space now serves your fine-tune (**Well-Tuned**).
+## Share a trace (Sharing is Caring)
+Want others to learn from a run? In the **Activity** tab, click **⬇ Download trace (JSON)** — the
+trace stays on your device, and the hosted Space holds **no Hub token**. Personal data is redacted by
+default (the activity log only carries counts + status; the one chat-name field is stripped). Then
+publish it from your own machine, with your own login:
+```bash
+huggingface-cli login                                   # or export HF_TOKEN=...
+python training/share_trace.py trace.json --public      # -> a HF dataset repo of traces
+```
+## Field notes
+[**FIELD_NOTES.md**](./FIELD_NOTES.md) is the build retrospective — the iOS→`chat.db` pivot, the
+`attributedBody` trap, why conflict math is deterministic, stub-first architecture, the
+reframe-around-one-person lesson, and the Off-the-Grid trade-offs.
+## Remote automation (runs without an interactive session)
+| Workflow | Trigger | What it does | Needs |
+|---|---|---|---|
+| `.github/workflows/ci.yml` → **test** | push / PR | compile + `pytest` (stub mode, no GPU) | nothing |
+| `.github/workflows/ci.yml` → **deploy** | push to `main`, after tests pass | `huggingface-cli upload` the repo to the HF Space (Gradio SDK; model excluded, pulled at runtime) | secret `HF_TOKEN`, var `SPACE_ID` |
+| `.github/workflows/maintenance.yml` | daily + manual | ping the Space `/health`, audit outdated deps → open/update a GitHub issue | var `SPACE_HEALTH_URL` |
+One-time setup for deploy + monitoring:
+```bash
+gh secret set HF_TOKEN                       # HF write token
+gh variable set SPACE_ID -b "<owner>/<space>"
+gh variable set SPACE_HEALTH_URL -b "https://<owner>-<space>.hf.space/health"
+```
+CI installs `requirements-ci.txt` (excludes `llama-cpp-python` and the Google libs — both are
+imported lazily and not needed for the stub-mode tests). A weekly Claude `/schedule` routine handles
+the judgment work (grow `training/data/dataset.jsonl` → PR, triage CI failures).

app.py ADDED

	@@ -0,0 +1,298 @@

+"""Space entrypoint: Gradio UI + FastAPI /ingest, served together on one port."""
+from __future__ import annotations
+import json
+import os
+from pathlib import Path
+import gradio as gr
+import uvicorn
+from fastapi import BackgroundTasks, FastAPI, Header, HTTPException, Request
+from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
+from pydantic import BaseModel
+from server import dedup, events, health, threads
+from server.pipeline import AgentRequest, AgentResponse, run_pipeline
+from ui.blocks import CAROUSEL_JS, CSS, THEME, build_demo
+INGEST_TOKEN = os.environ.get("INGEST_TOKEN", "")
+FEED_PATH = Path(os.environ.get("FEED_PATH", "/tmp/ingest_feed.json"))
+MAX_FEED = 200
+# Opt-in: run the agent automatically on each new message (front-end A). Off by
+# default, so /ingest keeps its store-only behavior unless explicitly enabled.
+AUTONOMOUS = os.environ.get("AUTONOMOUS") == "1"
+# Which message direction triggers autonomous action: "outgoing" = only when YOU
+# send/accept an invite (is_from_me), "any" = any new message in the chat.
+TRIGGER_ON = os.environ.get("TRIGGER_ON", "outgoing").lower()
+app = FastAPI(title="iMessage Calendar Agent")
+class IngestMessage(BaseModel):
+    chat: str
+    sender: str
+    text: str
+    timestamp: str
+    images: list[str] = []  # base64 data URIs of image attachments
+    is_from_me: bool = False  # True when YOU sent it (the send/accept trigger)
+class IngestBatch(BaseModel):
+    messages: list[IngestMessage]
+def _load_feed() -> list[dict]:
+    try:
+        return json.loads(FEED_PATH.read_text())
+    except Exception:  # noqa: BLE001  missing/corrupt -> empty
+        return []
+def _append_feed(items: list[dict]) -> None:
+    feed = (_load_feed() + items)[-MAX_FEED:]
+    FEED_PATH.write_text(json.dumps(feed, indent=2))
+def _require_token(authorization: str) -> None:
+    if not INGEST_TOKEN or authorization != f"Bearer {INGEST_TOKEN}":
+        raise HTTPException(status_code=401, detail="bad token")
+def _run_autonomous(chats: set[str]) -> None:
+    """For each affected chat, run the agent over its rolling thread and deliver
+    only the genuinely-new events (deduped). Used when AUTONOMOUS=1.
+    Order matters: extract WITHOUT pushing, dedup, then push only the fresh
+    events. (Pushing inside the pipeline re-pushed already-captured events on
+    every rolling-window re-run — the exact duplicate-creation dedup exists to
+    prevent.)"""
+    feed = _load_feed()
+    for chat in chats:
+        thread = threads.rolling_thread(feed, chat)
+        if not thread:
+            continue
+        resp = run_pipeline(AgentRequest(thread=thread, push_gcal=False))
+        # Filter WITHOUT recording: events are only marked seen once the push
+        # actually succeeds — recording first turns any transient push failure
+        # into silent, permanent event loss (filtered out on every retry).
+        new_events = dedup.filter_new(resp.plan.events, record=False)
+        if not new_events:
+            continue
+        try:
+            from calendar_out.gcal import push_events  # lazy: google libs optional
+            push_events(new_events)
+        except Exception as e:  # noqa: BLE001  push failure must not kill the loop
+            events.emit("calendar",
+                        f"autonomous push failed (will retry next run): "
+                        f"{type(e).__name__}: {e}",
+                        level="error")
+            continue  # NOT marked seen -> retried on the next trigger
+        dedup.mark_seen(new_events)
+        events.emit(
+            "decision",
+            f"autonomous: {len(new_events)} new event(s) in {chat}",
+            events=len(new_events),
+        )
+@app.post("/agent", response_model=AgentResponse)
+def agent(req: AgentRequest, authorization: str = Header(default="")):
+    """Run the agent on a thread (or messages) and return an ActionPlan.
+    The shared contract every front-end calls (iOS Shortcut, Android Tasker, the
+    Mac collector). Stateless — see server/pipeline.run_pipeline.
+    """
+    _require_token(authorization)
+    return run_pipeline(req)
+@app.post("/ingest")
+def ingest(batch: IngestBatch, background_tasks: BackgroundTasks,
+           authorization: str = Header(default="")):
+    """Receive new messages from the Mac collector (bearer-token protected).
+    Returns immediately — autonomous runs (full LLM inference, potentially
+    minutes per chat) happen in a background task. Running them inline blew
+    the collector's 30s POST timeout, which skipped _save_rowid and re-sent
+    the same batch every poll (duplicate feed entries + duplicate runs)."""
+    _require_token(authorization)
+    items = [m.model_dump() for m in batch.messages]
+    _append_feed(items)
+    n_imgs = sum(len(m.images) for m in batch.messages)
+    chats = sorted({m.chat for m in batch.messages})
+    events.emit("ingest", f"{len(items)} msg(s) from {', '.join(chats) or '—'}", images=n_imgs)
+    if AUTONOMOUS:
+        # Trigger on YOUR sent/accepted messages by default; "any" widens it.
+        if TRIGGER_ON == "any":
+            trigger_chats = set(chats)
+        else:
+            trigger_chats = {m.chat for m in batch.messages if m.is_from_me}
+        if trigger_chats:
+            background_tasks.add_task(_run_autonomous, trigger_chats)
+    return {"received": len(items)}
+@app.get("/health")
+def health_route():
+    # Liveness + hardware-adequacy (device/model/degraded/reason). The on-page
+    # status banner and the maintenance monitor both read this.
+    return health.health_status()
+# --- Per-user Google Calendar OAuth (web flow) ----------------------------- #
+def _oauth_redirect_uri(request: Request) -> str:
+    """Public redirect URI. On a Space, SPACE_HOST is the public host; locally,
+    fall back to the request's base URL. Must match the Google client config."""
+    host = os.environ.get("SPACE_HOST", "").strip()
+    base = f"https://{host}" if host else str(request.base_url).rstrip("/")
+    return base.rstrip("/") + "/oauth2callback"
+@app.get("/oauth2/start")
+def oauth2_start(request: Request):
+    """Kick off the Google consent flow (opened as a popup from the UI)."""
+    from calendar_out import gcal
+    try:
+        url, _state = gcal.auth_url(_oauth_redirect_uri(request))
+    except Exception as e:  # noqa: BLE001  not configured -> friendly page
+        return HTMLResponse(
+            f"<p style='font-family:sans-serif;padding:24px'>Google Calendar isn't "
+            f"configured on this Space.<br><small>{e}</small></p>",
+            status_code=503,
+        )
+    return RedirectResponse(url)
+@app.get("/oauth2callback")
+def oauth2_callback(request: Request):
+    """Google redirects here after consent. Exchange the code for a per-user token,
+    hand it to the opener window (and localStorage), then close. The token is NOT
+    stored server-side."""
+    code = request.query_params.get("code")
+    if request.query_params.get("error") or not code:
+        return HTMLResponse(
+            "<p style='font-family:sans-serif;padding:24px'>Google connection cancelled. "
+            "You can close this window.</p><script>setTimeout(()=>window.close(),500)</script>"
+        )
+    from calendar_out import gcal
+    try:
+        token_json = gcal.exchange_code(
+            _oauth_redirect_uri(request), code, request.query_params.get("state", "")
+        )
+    except Exception as e:  # noqa: BLE001
+        return HTMLResponse(
+            f"<p style='font-family:sans-serif;padding:24px'>Couldn't complete Google "
+            f"sign-in.<br><small>{e}</small></p>"
+        )
+    tok_js = json.dumps(token_json)  # JS string literal of the token JSON
+    return HTMLResponse(
+        "<!doctype html><meta charset=utf-8>"
+        "<body style='font-family:sans-serif;padding:24px'>"
+        "<p>✅ Google Calendar connected. You can close this window.</p>"
+        "<script>try{var t=" + tok_js + ";localStorage.setItem('gcal_token',t);"
+        "if(window.opener)window.opener.postMessage({gcal_token:t},location.origin);}"
+        "catch(e){}setTimeout(function(){window.close();},800);</script></body>"
+    )
+class TokenCheckBody(BaseModel):
+    token: str
+@app.post("/oauth2/check")
+def oauth2_check(body: TokenCheckBody):
+    """Liveness-check a browser-held Google token with one real API call
+    (same-origin fetch from wireGcal on page load). POST so the token never
+    lands in access logs; it is checked and discarded, never stored.
+    200 = definitive verdict; non-200 = indeterminate (client keeps its
+    local shape-check state)."""
+    from calendar_out import gcal
+    try:
+        gcal._client_config()  # mirror /oauth2/start: friendly 503 when env unset
+    except Exception as e:  # noqa: BLE001
+        return JSONResponse(
+            {"ok": False, "transient": True, "reason": str(e)}, status_code=503
+        )
+    res = gcal.check_token(body.token)
+    out: dict = {"ok": res["ok"]}
+    if res["ok"]:
+        if res.get("refreshed_token"):
+            out["token"] = res["refreshed_token"]
+    else:
+        out["reason"] = res.get("reason", "")
+        out["transient"] = bool(res.get("transient"))
+    return out
+# Register the @spaces.GPU functions at startup so ZeroGPU can schedule them.
+import server.model  # noqa: E402,F401
+demo = build_demo()
+# Serving mode, env-selected:
+# - "gradio": the HF *Gradio-SDK* / ZeroGPU platform manages the launch (a self-run
+#   uvicorn gets SIGTERM'd there), so we call demo.launch(). /agent etc. aren't served.
+# - "uvicorn": mount gradio under FastAPI and serve UI + /agent + /ingest on one port.
+#   Used locally and on the *Docker-SDK* GPU Space (Dockerfile sets SERVE=uvicorn).
+# Default: gradio on a Space unless told otherwise, uvicorn locally.
+_default_serve = "gradio" if (os.environ.get("SPACE_ID") or os.environ.get("SYSTEM") == "spaces") else "uvicorn"
+SERVE = os.environ.get("SERVE", _default_serve)
+# Gradio 6 applies theme/css at mount/launch time — the css set on gr.Blocks is
+# IGNORED when mounted, so pass it here or the custom UI renders as default Gradio.
+#
+# The `js=` load-function does NOT reliably execute on a *mounted* (uvicorn) app in
+# Gradio 6 — the carousel then sits on its first slide with dead arrows/dots. So we
+# inject the carousel script as a real inline <script> before </body> via middleware;
+# it self-bootstraps and its MutationObserver wires every .carousel once Gradio
+# client-renders the page. (The launch() path below still passes js= for ZeroGPU.)
+if SERVE == "uvicorn":
+    from starlette.responses import Response as _Response
+    _CAROUSEL_INLINE = f'<script id="cz-inline-js">({CAROUSEL_JS})();</script>'
+    # Status banner: fetch /health on load and reveal #status-banner if degraded
+    # (e.g. real model on CPU-only hardware). Same inline-script pattern as the
+    # carousel, since js= is unreliable on a mounted app; it polls for the element
+    # because Gradio renders it client-side after </body>.
+    _BANNER_JS = (
+        "(function(){fetch('/health').then(function(r){return r.json();})"
+        ".then(function(h){if(!h||!h.degraded){return;}(function s(){"
+        "var b=document.getElementById('status-banner');"
+        "if(!b){return setTimeout(s,400);}"
+        "b.textContent='\\u26a0\\ufe0f '+(h.reason||'This Space needs a GPU.')+' \\u26a0\\ufe0f';"
+        "b.style.display='block';})();}).catch(function(){});})();"
+    )
+    _BANNER_INLINE = f'<script id="cz-banner-js">{_BANNER_JS}</script>'
+    @app.middleware("http")
+    async def _inject_carousel_js(request, call_next):  # noqa: ANN001
+        resp = await call_next(request)
+        if request.url.path != "/" or "text/html" not in resp.headers.get("content-type", ""):
+            return resp
+        body = b"".join([chunk async for chunk in resp.body_iterator])
+        html = body.decode("utf-8", "ignore")
+        if "cz-inline-js" not in html and "</body>" in html:
+            html = html.replace("</body>", _CAROUSEL_INLINE + _BANNER_INLINE + "</body>", 1)
+        headers = dict(resp.headers)
+        headers.pop("content-length", None)  # body length changed; let Starlette recompute
+        return _Response(content=html, status_code=resp.status_code,
+                         headers=headers, media_type="text/html")
+    app = gr.mount_gradio_app(
+        app, demo, path="/", ssr_mode=False, theme=THEME, css=CSS, js=CAROUSEL_JS,
+        mcp_server=True,  # expose extract_events/make_ics/check_conflicts as MCP tools
+    )
+if __name__ == "__main__":
+    if SERVE == "gradio":
+        demo.launch(
+            server_name="0.0.0.0", server_port=7860, ssr_mode=False,
+            theme=THEME, css=CSS, js=CAROUSEL_JS,
+            mcp_server=True,  # expose extract_events/make_ics/check_conflicts as MCP tools
+        )
+    else:
+        uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "7860")))
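The middleware in app.py rewrites only the root HTML response and guards against injecting twice. The string-level step can be sketched on its own, as below. `inject_before_body` and its parameters are hypothetical names used for illustration, not functions from app.py, and the script tag is a stand-in for the real carousel payload.

```python
def inject_before_body(html: str, snippet: str, marker: str) -> str:
    """Insert `snippet` before the first </body>, at most once.

    `marker` is a substring that identifies an earlier injection (the diff
    uses the script id "cz-inline-js" for this purpose).
    """
    if marker in html or "</body>" not in html:
        return html  # already injected, or nothing to anchor on
    return html.replace("</body>", snippet + "</body>", 1)


page = "<html><body><p>hi</p></body></html>"
tag = '<script id="cz-inline-js">/* carousel bootstrap */</script>'

once = inject_before_body(page, tag, "cz-inline-js")
twice = inject_before_body(once, tag, "cz-inline-js")  # no-op: marker present
```

Because the body length changes, the real middleware also drops the stale `content-length` header before building the new response.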

calendar_out/__init__.py ADDED Viewed

File without changes

calendar_out/freebusy.py ADDED Viewed

	@@ -0,0 +1,141 @@

+"""Conflict detection against the user's existing calendar.
+Off-grid by default: the user uploads a current-calendar .ics; we parse it into
+busy intervals and detect clashes deterministically (time math is more reliable
+in code than from the model). The model still writes the reasoning + reply.
+"""
+from __future__ import annotations
+from datetime import datetime, timedelta
+from typing import Optional
+from dateutil import parser as dtparser
+from icalendar import Calendar
+from pydantic import BaseModel
+from server import events as events_bus  # aliased: 'events' is a common param name here
+from server.schema import ActionPlan, Conflict, Event
+TIGHT_GAP = timedelta(minutes=30)
+DEFAULT_DURATION = timedelta(hours=1)
+class Busy(BaseModel):
+    start: datetime
+    end: datetime
+    title: str = ""
+def _naive_local(dt: datetime) -> datetime:
+    """Convert aware datetimes to the local basis BEFORE dropping tzinfo.
+    Blindly stripping tzinfo shifted UTC-exported .ics files (Google's default)
+    by the whole UTC offset relative to the model's local-time events.
+    The conversion target is the SAME configured zone that gcal labels pushed
+    events with (calendar_out/tzconfig), else process-local, so conflict math
+    and calendar pushes share one time basis."""
+    if dt.tzinfo is not None:
+        from calendar_out.tzconfig import zone
+        dt = dt.astimezone(zone())  # None -> process-local
+    return dt.replace(tzinfo=None)
+def _as_dt(value) -> Optional[datetime]:
+    if value is None:
+        return None
+    if isinstance(value, datetime):
+        return _naive_local(value)
+    try:
+        return _naive_local(dtparser.isoparse(str(value)))
+    except (ValueError, TypeError):
+        return None
+def load_ics_busy(data: bytes) -> list[Busy]:
+    """Parse VEVENTs from an .ics into busy intervals (naive local datetimes)."""
+    busy: list[Busy] = []
+    cal = Calendar.from_ical(data)
+    for comp in cal.walk("VEVENT"):
+        start = _as_dt(getattr(comp.get("dtstart"), "dt", None))
+        if start is None:
+            continue
+        end = _as_dt(getattr(comp.get("dtend"), "dt", None)) or (start + DEFAULT_DURATION)
+        busy.append(Busy(start=start, end=end, title=str(comp.get("summary", ""))))
+    return busy
+def _event_interval(ev: Event) -> Optional[tuple[datetime, datetime]]:
+    start = _as_dt(ev.start)
+    if start is None:
+        return None
+    end = _as_dt(ev.end) or (start + DEFAULT_DURATION)
+    return start, end
+def _overlaps(a0, a1, b0, b1) -> bool:
+    return a0 < b1 and b0 < a1
+def _severity(a0, a1, b0, b1) -> Optional[str]:
+    if _overlaps(a0, a1, b0, b1):
+        return "overlap"
+    gap = b0 - a1 if b0 >= a1 else a0 - b1
+    if gap <= timedelta(0):
+        return "adjacent"
+    if gap < TIGHT_GAP:
+        return "tight"
+    return None
+def check_conflicts(events: list[Event], busy: list[Busy]) -> list[Conflict]:
+    conflicts: list[Conflict] = []
+    for idx, ev in enumerate(events):
+        iv = _event_interval(ev)
+        if iv is None:
+            continue
+        a0, a1 = iv
+        for b in busy:
+            sev = _severity(a0, a1, b.start, b.end)
+            if sev:
+                conflicts.append(
+                    Conflict(event_index=idx, clashes_with=b.title or "existing event", severity=sev)
+                )
+    return conflicts
+def propose_times(ev: Event, busy: list[Busy], n: int = 3) -> list[str]:
+    """Suggest up to n nearby start times that don't overlap busy intervals."""
+    iv = _event_interval(ev)
+    if iv is None:
+        return []
+    start, end = iv
+    duration = end - start
+    out: list[str] = []
+    # try later today (+1h..+4h), then same time the next two days
+    candidates = [start + timedelta(hours=h) for h in (1, 2, 3, 4)]
+    candidates += [start + timedelta(days=d) for d in (1, 2)]
+    for c in candidates:
+        if not any(_overlaps(c, c + duration, b.start, b.end) for b in busy):
+            out.append(c.isoformat())
+        if len(out) >= n:
+            break
+    return out
+def annotate_conflicts(plan: ActionPlan, busy: list[Busy]) -> ActionPlan:
+    """Replace model-guessed conflicts with deterministic ones + propose times."""
+    if not busy:
+        return plan
+    plan.conflicts = check_conflicts(plan.events, busy)
+    events_bus.emit(
+        "conflict",
+        f"{len(plan.conflicts)} conflict(s) vs {len(busy)} existing event(s)",
+        conflicts=len(plan.conflicts),
+    )
+    clashing_idx = {c.event_index for c in plan.conflicts}
+    proposals: list[str] = []
+    for idx in sorted(clashing_idx):
+        proposals.extend(propose_times(plan.events[idx], busy))
+    # de-dupe preserving order (dicts keep insertion order)
+    plan.proposed_times = list(dict.fromkeys(proposals))
+    return plan
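The severity rules in freebusy.py treat intervals as half-open, so back-to-back events are "adjacent" rather than "overlap". The following standalone sketch restates that classification with the same 30-minute threshold. The `severity` and `at` names and the sample times are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

TIGHT_GAP = timedelta(minutes=30)  # same threshold as the diff


def severity(a0: datetime, a1: datetime, b0: datetime, b1: datetime) -> Optional[str]:
    """Classify half-open intervals [a0, a1) and [b0, b1)."""
    if a0 < b1 and b0 < a1:
        return "overlap"
    # Disjoint: measure the gap on whichever side b lies.
    gap = b0 - a1 if b0 >= a1 else a0 - b1
    if gap <= timedelta(0):
        return "adjacent"
    if gap < TIGHT_GAP:
        return "tight"
    return None


def at(hour: int, minute: int = 0) -> datetime:
    """Hypothetical helper: a naive local time on one fixed day."""
    return datetime(2026, 6, 10, hour, minute)


results = {
    "overlap": severity(at(13), at(14), at(13, 30), at(15)),
    "adjacent": severity(at(13), at(14), at(14), at(15)),
    "tight": severity(at(13), at(14), at(14, 20), at(15)),
    "clear": severity(at(13), at(14), at(15), at(16)),
}
```

A gap of exactly 30 minutes is not flagged, since the comparison is strict.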

calendar_out/gcal.py ADDED Viewed

	@@ -0,0 +1,313 @@

+"""OPTIONAL Google Calendar push (the one optional cloud touchpoint).
+Disabled unless the user opts in via the UI toggle and provides OAuth creds.
+Keeps the default .ics path strictly off-grid.
+"""
+from __future__ import annotations
+import json
+import os
+import threading
+import time
+from pathlib import Path
+from dateutil import parser as dtparser
+from server import events as events_bus
+from server.schema import Event
+SCOPES = ["https://www.googleapis.com/auth/calendar.events"]
+GOOGLE_TOKEN_URI = "https://oauth2.googleapis.com/token"
+# --------------------------------------------------------------------------- #
+# Per-user OAuth (web flow): each visitor connects their OWN Google account.
+# The OAuth *app* creds (client id/secret) are the owner's, set as Space secrets;
+# the resulting per-user token is held client-side (never stored server-side) and
+# passed back only to perform a push. See app.py /oauth2/start + /oauth2callback.
+# --------------------------------------------------------------------------- #
+def _client_config() -> dict:
+    """OAuth client config from env (Space secrets). Raises if unconfigured."""
+    cid = os.environ.get("GOOGLE_OAUTH_CLIENT_ID", "").strip()
+    csecret = os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET", "").strip()
+    if not (cid and csecret):
+        raise RuntimeError(
+            "Google Calendar isn't configured: set GOOGLE_OAUTH_CLIENT_ID and "
+            "GOOGLE_OAUTH_CLIENT_SECRET (a Google Cloud OAuth 'Web application' client)."
+        )
+    return {"web": {
+        "client_id": cid,
+        "client_secret": csecret,
+        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
+        "token_uri": "https://oauth2.googleapis.com/token",
+    }}
+# PKCE: authorization_url() auto-generates a code_verifier (google-auth-oauthlib
+# >= 1.0) and sends its challenge to Google; the token exchange must then send
+# the SAME verifier or Google rejects it with "(invalid_grant) Missing code
+# verifier". The start and the callback are different HTTP requests, so the
+# verifier is held server-side for a few minutes, keyed by the flow's `state`
+# — which doubles as the CSRF check. Single-use; nothing user-identifying.
+_PENDING_TTL_S = 600
+_PENDING_MAX = 500  # bound memory if /oauth2/start is hammered
+_pending: dict[str, tuple[str, float]] = {}
+_pending_lock = threading.Lock()
+def _remember_verifier(state: str, verifier: str) -> None:
+    now = time.time()
+    with _pending_lock:
+        for k in [k for k, (_, t) in _pending.items() if now - t > _PENDING_TTL_S]:
+            _pending.pop(k, None)
+        while len(_pending) >= _PENDING_MAX:
+            _pending.pop(next(iter(_pending)))
+        _pending[state] = (verifier, now)
+def _pop_verifier(state: str) -> str | None:
+    with _pending_lock:
+        item = _pending.pop(state or "", None)
+    if item is None:
+        return None
+    verifier, t = item
+    return verifier if time.time() - t <= _PENDING_TTL_S else None
+def auth_url(redirect_uri: str) -> tuple[str, str]:
+    """Build the Google consent URL for the calendar-events scope. Returns (url, state)."""
+    from google_auth_oauthlib.flow import Flow
+    flow = Flow.from_client_config(_client_config(), scopes=SCOPES, redirect_uri=redirect_uri)
+    url, state = flow.authorization_url(
+        access_type="offline", include_granted_scopes="true", prompt="consent"
+    )
+    _remember_verifier(state, flow.code_verifier)
+    return url, state
+def exchange_code(redirect_uri: str, code: str, state: str = "") -> str:
+    """Exchange an auth code for a per-user token; returns the token as a JSON string.
+    ``state`` must match a pending auth_url() call — it keys the PKCE verifier
+    and doubles as the CSRF check."""
+    verifier = _pop_verifier(state)
+    if verifier is None:
+        raise RuntimeError(
+            "sign-in session expired or unknown — close this window and click "
+            "Connect Google Calendar again"
+        )
+    from google_auth_oauthlib.flow import Flow
+    flow = Flow.from_client_config(
+        _client_config(), scopes=SCOPES, redirect_uri=redirect_uri, code_verifier=verifier
+    )
+    flow.fetch_token(code=code)
+    return _sanitize_token_json(flow.credentials.to_json())
+def _sanitize_token_json(token_json: str) -> str:
+    """Token JSON as handed to the BROWSER (localStorage): the OAuth app's
+    client_secret has no business there — the server re-injects it from env
+    when it needs to refresh."""
+    info = json.loads(token_json)
+    info.pop("client_secret", None)
+    return json.dumps(info)
+def _with_client_secret(info: dict) -> dict:
+    """Restore the env client_secret into browser-held token info so
+    creds.refresh() works. Older stored tokens that still carry a secret are
+    left untouched. The refresh endpoint is PINNED: the token JSON comes from
+    the browser, and a crafted token_uri would otherwise receive the injected
+    secret on refresh."""
+    info = {**info, "token_uri": GOOGLE_TOKEN_URI}
+    if not info.get("client_secret"):
+        secret = os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET", "").strip()
+        if secret:
+            info["client_secret"] = secret
+    return info
+def _creds_from_token_json(token_json: str):
+    from google.oauth2.credentials import Credentials
+    return Credentials.from_authorized_user_info(
+        _with_client_secret(json.loads(token_json)), SCOPES
+    )
+def _refresh_if_needed(creds) -> str | None:
+    """Refresh expired creds; returns sanitized token JSON to re-store client-side,
+    or None when no refresh happened."""
+    from google.auth.transport.requests import Request
+    if not creds.valid and creds.expired and creds.refresh_token:
+        creds.refresh(Request())
+        return _sanitize_token_json(creds.to_json())
+    return None
+def _probe_events(creds) -> None:
+    """Cheapest real API call permitted by the calendar.events scope (the only
+    scope we request — calendarList/freeBusy would 403)."""
+    from googleapiclient.discovery import build
+    build("calendar", "v3", credentials=creds).events().list(
+        calendarId="primary", maxResults=1, fields="items(id)"
+    ).execute()
+def _is_definitive_auth_failure(e: Exception) -> bool:
+    """True when the token itself is dead (revoked/invalid), False for anything
+    that might heal on its own. Duck-typed by exception name / resp.status so
+    this module never has to import the google libs (absent in CI)."""
+    if type(e).__name__ == "RefreshError":  # revoked / invalid_grant
+        return True
+    status = getattr(getattr(e, "resp", None), "status", None)  # HttpError
+    return status in (401, 403)
+def check_token(token_json: str) -> dict:
+    """Liveness check for a browser-held token: refresh if needed, then one
+    real (scope-compatible) API call. Three-state result so the client only
+    discards a token on a DEFINITIVE failure:
+      {"ok": True, "refreshed_token": <sanitized json> | None}
+      {"ok": False, "reason": str, "transient": bool}
+    """
+    try:
+        info = json.loads(token_json or "")
+        if not isinstance(info, dict) or not (info.get("refresh_token") or info.get("token")):
+            raise ValueError("token JSON missing token/refresh_token")
+    except Exception as e:  # noqa: BLE001  garbage in localStorage -> definitive
+        return {"ok": False, "reason": f"unreadable token: {e}", "transient": False}
+    try:
+        creds = _creds_from_token_json(token_json)
+        refreshed = _refresh_if_needed(creds)
+        _probe_events(creds)
+        return {"ok": True, "refreshed_token": refreshed}
+    except ImportError as e:
+        return {"ok": False, "reason": f"google libs unavailable: {e}", "transient": True}
+    except Exception as e:  # noqa: BLE001
+        return {
+            "ok": False,
+            "reason": f"{type(e).__name__}: {e}",
+            "transient": not _is_definitive_auth_failure(e),
+        }
+def _dt_field(value: str) -> dict:
+    """Calendar API datetime field. The model emits offset-less ISO datetimes
+    (schema: 2026-06-10T13:00:00) and the API 400s on a naive dateTime without
+    a timeZone. Uses the shared zone basis (calendar_out/tzconfig — same one
+    freebusy compares conflicts in); with no configured zone, naive datetimes
+    get the process-local offset attached instead."""
+    from calendar_out.tzconfig import configured_timezone
+    dt = dtparser.isoparse(value)
+    if dt.tzinfo is not None:  # already has an offset — API accepts as-is
+        return {"dateTime": dt.isoformat()}
+    tz = configured_timezone()
+    if tz:
+        return {"dateTime": dt.isoformat(), "timeZone": tz}
+    # interpret as process-local: attach the local offset
+    return {"dateTime": dt.astimezone().isoformat()}
+def _event_body(ev: Event) -> dict:
+    body = {
+        "summary": ev.title,
+        "start": _dt_field(ev.start),
+        "end": _dt_field(ev.end or ev.start),
+    }
+    if ev.location:
+        body["location"] = ev.location
+    if ev.notes:
+        body["description"] = ev.notes
+    if ev.reminder_minutes is not None:
+        body["reminders"] = {
+            "useDefault": False,
+            "overrides": [{"method": "popup", "minutes": ev.reminder_minutes}],
+        }
+    return body
+def push_events_with_token(token_json: str, events: list[Event], calendar_id: str = "primary") -> list[str]:
+    """Push events to the *visitor's* calendar using their per-session OAuth token."""
+    from google.auth.transport.requests import Request
+    from googleapiclient.discovery import build
+    creds = _creds_from_token_json(token_json)
+    if not creds.valid and creds.expired and creds.refresh_token:
+        creds.refresh(Request())
+    svc = build("calendar", "v3", credentials=creds)
+    links = []
+    for ev in events:
+        created = svc.events().insert(calendarId=calendar_id, body=_event_body(ev)).execute()
+        links.append(created.get("htmlLink", ""))
+    events_bus.emit("calendar", f"pushed {len(links)} event(s) to Google Calendar")
+    return links
+def _service():
+    """Build an authorized Calendar service. Requires credentials.json + token.json.
+    Run an OAuth flow once locally to mint token.json; do NOT commit either file.
+    """
+    from google.auth.transport.requests import Request
+    from google.oauth2.credentials import Credentials
+    from google_auth_oauthlib.flow import InstalledAppFlow
+    from googleapiclient.discovery import build
+    creds = None
+    if os.path.exists("token.json"):
+        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
+    if not creds or not creds.valid:
+        if creds and creds.expired and creds.refresh_token:
+            creds.refresh(Request())
+        else:
+            flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
+            creds = flow.run_local_server(port=0)
+        with open("token.json", "w") as f:
+            f.write(creds.to_json())
+    return build("calendar", "v3", credentials=creds)
+def push_events(events: list[Event], calendar_id: str = "primary") -> list[str]:
+    """Create events in Google Calendar; returns created event links."""
+    svc = _service()
+    links = []
+    for ev in events:
+        created = svc.events().insert(calendarId=calendar_id, body=_event_body(ev)).execute()
+        links.append(created.get("htmlLink", ""))
+    events_bus.emit("calendar", f"pushed {len(links)} event(s) to Google Calendar")
+    return links
+def read_recent_facts(calendar_id: str = "primary", max_results: int = 50) -> tuple[list[str], list[str]]:
+    """OPT-IN read: scan recent/upcoming events for recurring attendees and
+    locations to seed memory. Returns (contact_names, locations). Raises if
+    Google libs/creds aren't configured (the caller degrades gracefully)."""
+    from collections import Counter
+    svc = _service()
+    items = (
+        svc.events()
+        .list(calendarId=calendar_id, maxResults=max_results, singleEvents=True, orderBy="startTime")
+        .execute()
+        .get("items", [])
+    )
+    people, places = Counter(), Counter()
+    for ev in items:
+        for a in ev.get("attendees", []) or []:
+            nm = (a.get("displayName") or "").strip()
+            if nm and len(nm) <= 60:
+                people[nm] += 1
+        loc = (ev.get("location") or "").strip()
+        if loc and len(loc) <= 80:
+            places[loc] += 1
+    # keep recurring ones (seen >= 2) so memory stays meaningful
+    names = [n for n, c in people.most_common(20) if c >= 2] or [n for n, _ in people.most_common(10)]
+    locs = [p for p, c in places.most_common(10) if c >= 2]
+    return names, locs
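The PKCE verifier store in gcal.py is a small single-use, time-bounded, size-bounded map keyed by the OAuth `state`. A minimal sketch of that pattern follows. `PendingStore` and the injectable `clock` parameter are hypothetical, added here only so expiry can be exercised without sleeping; the module itself uses module-level globals and `time.time()`.

```python
import threading
import time
from typing import Callable, Optional


class PendingStore:
    """Single-use, TTL-bounded map from OAuth `state` to PKCE verifier."""

    def __init__(self, ttl_s: float = 600, max_items: int = 500,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl_s = ttl_s
        self._max = max_items
        self._clock = clock
        self._items: dict = {}  # state -> (verifier, created_at)
        self._lock = threading.Lock()

    def remember(self, state: str, verifier: str) -> None:
        now = self._clock()
        with self._lock:
            # Drop expired entries first, then evict oldest insertions if full.
            expired = [k for k, (_, t) in self._items.items() if now - t > self._ttl_s]
            for k in expired:
                self._items.pop(k, None)
            while len(self._items) >= self._max:
                self._items.pop(next(iter(self._items)))
            self._items[state] = (verifier, now)

    def pop(self, state: str) -> Optional[str]:
        with self._lock:
            item = self._items.pop(state or "", None)
        if item is None:
            return None
        verifier, created = item
        return verifier if self._clock() - created <= self._ttl_s else None


now = [1000.0]  # fake clock so the example is deterministic
store = PendingStore(ttl_s=600, max_items=2, clock=lambda: now[0])
store.remember("s1", "v1")
first = store.pop("s1")    # the verifier
second = store.pop("s1")   # None: entries are single-use
store.remember("s2", "v2")
now[0] += 601
late = store.pop("s2")     # None: past the TTL
```

Popping before checking the age means an expired entry is still removed, which keeps the map from accumulating dead states.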

calendar_out/ics.py ADDED Viewed

	@@ -0,0 +1,60 @@

+"""Generate .ics files locally (default, off-grid output)."""
+from __future__ import annotations
+import os
+import tempfile
+from datetime import datetime
+from dateutil import parser as dtparser
+from icalendar import Alarm, Calendar
+from icalendar import Event as IcsEvent
+from server import events as events_bus
+from server.schema import Event
+def events_to_ics(events: list[Event]) -> bytes:
+    cal = Calendar()
+    cal.add("prodid", "-//iMessage Calendar Agent//EN")
+    cal.add("version", "2.0")
+    for ev in events:
+        ie = IcsEvent()
+        ie.add("summary", ev.title)
+        ie.add("dtstart", dtparser.isoparse(ev.start))
+        if ev.end:
+            ie.add("dtend", dtparser.isoparse(ev.end))
+        if ev.location:
+            ie.add("location", ev.location)
+        if ev.notes:
+            ie.add("description", ev.notes)
+        if ev.attendees:
+            for a in ev.attendees:
+                ie.add("attendee", a)
+        if ev.reminder_minutes is not None:
+            alarm = Alarm()
+            alarm.add("action", "DISPLAY")
+            alarm.add("description", f"Reminder: {ev.title}")
+            alarm.add("trigger", _minutes_before(ev.reminder_minutes))
+            ie.add_component(alarm)
+        cal.add_component(ie)
+    return cal.to_ical()
+def _minutes_before(minutes: int):
+    from datetime import timedelta
+    return timedelta(minutes=-minutes)
+def write_ics(events: list[Event], path: str | None = None) -> str:
+    """Write events to an .ics file and return the path (for Gradio download)."""
+    data = events_to_ics(events)
+    if path is None:
+        fd, path = tempfile.mkstemp(suffix=".ics", prefix="events_")
+        os.close(fd)
+    with open(path, "wb") as f:
+        f.write(data)
+    events_bus.emit("calendar", f"wrote .ics with {len(events)} event(s)")
+    return path

calendar_out/tzconfig.py ADDED Viewed

	@@ -0,0 +1,46 @@

+"""One timezone basis for the whole calendar path.
+The model emits offset-less local datetimes; gcal labels them with a zone and
+freebusy compares them against .ics busy intervals. Both MUST resolve the zone
+the same way, or conflicts are checked in one zone while events are pushed in
+another (off by the whole UTC offset). Resolution order:
+1. CAL_TIMEZONE / TZ env — any value that validates as an IANA zone, slash or
+   not (UTC, GMT, Japan, America/New_York all count; a leading ':' is stripped).
+2. /etc/timezone (Debian-style containers — i.e. the Space image).
+3. None -> the process-local zone.
+"""
+from __future__ import annotations
+import os
+from pathlib import Path
+from typing import Optional
+from zoneinfo import ZoneInfo
+def _valid(name: str) -> Optional[str]:
+    name = (name or "").strip().lstrip(":")
+    if not name:
+        return None
+    try:
+        ZoneInfo(name)
+        return name
+    except Exception:  # noqa: BLE001  not an IANA key (e.g. a POSIX rule string like TZ=EST5EDT,M3.2.0,M11.1.0)
+        return None
+def configured_timezone() -> Optional[str]:
+    """The configured IANA zone name, or None meaning 'process-local'."""
+    for env in ("CAL_TIMEZONE", "TZ"):
+        v = _valid(os.environ.get(env, ""))
+        if v:
+            return v
+    try:
+        return _valid(Path("/etc/timezone").read_text())
+    except Exception:  # noqa: BLE001  not Debian-style (macOS/Windows)
+        return None
+def zone() -> Optional[ZoneInfo]:
+    name = configured_timezone()
+    return ZoneInfo(name) if name else None
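The resolution order in tzconfig.py (CAL_TIMEZONE, then TZ, each validated as an IANA key) can be tried against a plain mapping instead of the process environment. The sketch below does that. `valid_zone` and `resolve` are hypothetical names, the /etc/timezone step is left out, and it assumes IANA time zone data is available to `zoneinfo` on the machine.

```python
from typing import Mapping, Optional
from zoneinfo import ZoneInfo


def valid_zone(name: str) -> Optional[str]:
    """Return the cleaned name if zoneinfo can load it, else None."""
    name = (name or "").strip().lstrip(":")
    if not name:
        return None
    try:
        ZoneInfo(name)
    except Exception:  # unknown key or malformed name
        return None
    return name


def resolve(env: Mapping[str, str]) -> Optional[str]:
    """First valid zone among CAL_TIMEZONE and TZ; None means process-local."""
    for key in ("CAL_TIMEZONE", "TZ"):
        found = valid_zone(env.get(key, ""))
        if found:
            return found
    return None  # the real module also tries /etc/timezone before giving up


# An invalid CAL_TIMEZONE falls through to TZ, whose leading ':' is stripped.
picked = resolve({"CAL_TIMEZONE": "Not/AZone", "TZ": ":UTC"})
unset = resolve({})
```

Passing the environment in as a mapping keeps the lookup order testable without mutating `os.environ`.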

collector/.env.example ADDED Viewed

	@@ -0,0 +1,13 @@

+# Copy to .env and fill in. NEVER commit the real .env.
+SPACE_URL=https://your-space.hf.space
+INGEST_TOKEN=change-me-to-match-the-space
+POLL_SECONDS=20
+# Optional: comma-separated chat names/handles to watch (blank = all)
+WATCH_CHATS=
+# Path to the iMessage DB (default is correct on macOS)
+CHAT_DB=~/Library/Messages/chat.db
+# AGENT_MODE=1 posts to /agent (client-side autonomous push) instead of /ingest.
+# Prefer the server-side switch (run the backend with AUTONOMOUS=1) so logic lives
+# in one place; the collector now also reports is_from_me so the backend can fire
+# only on YOUR sent/accepted messages (backend env TRIGGER_ON=outgoing, the default).
+AGENT_MODE=0

collector/collector.py ADDED Viewed

	@@ -0,0 +1,175 @@

+"""Mac-side iMessage collector.
+Polls ~/Library/Messages/chat.db for new messages and POSTs them to the Space
+/ingest endpoint. Requires Full Disk Access for the running process.
+This reads the `text` column directly for simplicity. Many modern messages store
+their body in `attributedBody` (an NSAttributedString blob) instead — for robust
+extraction, prefer the `imessage-exporter` CLI (ReagentX) or the `imessage_reader`
+package rather than expanding the SQL here.
+"""
+from __future__ import annotations
+import os
+import sqlite3
+import sys
+import time
+from pathlib import Path
+import requests
+from dotenv import load_dotenv
+# Allow importing the shared image helper from the repo root.
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from server.imageutil import to_data_uri  # noqa: E402
+load_dotenv()
+SPACE_URL = os.environ["SPACE_URL"].rstrip("/")
+INGEST_TOKEN = os.environ["INGEST_TOKEN"]
+POLL_SECONDS = int(os.environ.get("POLL_SECONDS", "20"))
+# AGENT_MODE=1: call /agent (run the agent + push to calendar) instead of /ingest
+# (which only stores for review). The autonomous, hands-off path — see docs/automations.md.
+AGENT_MODE = os.environ.get("AGENT_MODE") == "1"
+CHAT_DB = Path(os.path.expanduser(os.environ.get("CHAT_DB", "~/Library/Messages/chat.db")))
+WATCH = [c.strip() for c in os.environ.get("WATCH_CHATS", "").split(",") if c.strip()]
+STATE = Path(__file__).with_name(".last_rowid")
+# Apple epoch = 2001-01-01; m.date is nanoseconds since then, hence the
+# /1000000000 + 978307200 conversion to Unix seconds in the query below.
+# Chat key: display_name is EMPTY for 1:1 chats, and your own outgoing rows
+# have a NULL handle_id. Falling back to the sender therefore filed incoming
+# messages under the phone number and your replies under "unknown", so
+# rolling_thread never assembled the conversation. COALESCE to chat_identifier
+# gives both directions of a direct chat one stable key.
+QUERY = """
+SELECT m.ROWID, m.text, m.attributedBody, h.id AS sender,
+       COALESCE(NULLIF(c.display_name, ''), c.chat_identifier) AS chat,
+       m.is_from_me,
+       datetime(m.date/1000000000 + 978307200, 'unixepoch', 'localtime') AS ts
+FROM message m
+LEFT JOIN handle h ON m.handle_id = h.ROWID
+LEFT JOIN chat_message_join cmj ON cmj.message_id = m.ROWID
+LEFT JOIN chat c ON c.ROWID = cmj.chat_id
+WHERE m.ROWID > ?
+ORDER BY m.ROWID ASC
+"""
+# Image attachments for a given message ROWID (filenames live under Attachments/).
+ATTACH_QUERY = """
+SELECT a.filename
+FROM attachment a
+JOIN message_attachment_join maj ON maj.attachment_id = a.ROWID
+WHERE maj.message_id = ?
+"""
+def _attachments_for(conn: sqlite3.Connection, message_rowid: int) -> list[str]:
+    """Return base64 data URIs for image attachments of a message."""
+    uris: list[str] = []
+    for (filename,) in conn.execute(ATTACH_QUERY, (message_rowid,)).fetchall():
+        if not filename:
+            continue
+        path = os.path.expanduser(filename)
+        uri = to_data_uri(path)  # None for non-images / too large
+        if uri:
+            uris.append(uri)
+    return uris
+def _last_rowid() -> int:
+    try:
+        return int(STATE.read_text().strip())
+    except Exception:  # noqa: BLE001
+        return 0
+def _save_rowid(rowid: int) -> None:
+    STATE.write_text(str(rowid))
+def poll_once(conn: sqlite3.Connection) -> int:
+    last = _last_rowid()
+    rows = conn.execute(QUERY, (last,)).fetchall()
+    batch = []
+    max_rowid = last
+    for rowid, text, _attr, sender, chat, is_from_me, ts in rows:
+        max_rowid = max(max_rowid, rowid)
+        if WATCH and (chat or "") not in WATCH:
+            continue
+        images = _attachments_for(conn, rowid)
+        if not text and not images:
+            continue  # nothing usable (see docstring re: attributedBody-only msgs)
+        batch.append(
+            {
+                "chat": chat or (sender or "unknown"),
+                "sender": "me" if is_from_me else (sender or "unknown"),
+                "text": text or "",
+                "timestamp": ts,
+                "images": images,
+                "is_from_me": bool(is_from_me),  # you sending/accepting = the trigger
+            }
+        )
+    if batch:
+        headers = {"Authorization": f"Bearer {INGEST_TOKEN}"}
+        if AGENT_MODE:
+            # One /agent call PER CHAT — a raw batch can span conversations,
+            # and format_thread would interleave them into one bogus thread.
+            # (/ingest doesn't need this: the server groups by chat itself.)
+            # Per-chat failures are caught, not raised: the /agent path pushes
+            # to the calendar with no dedup, so aborting mid-loop and replaying
+            # the whole batch next poll would re-push the chats that already
+            # succeeded. At-most-once: a failed chat's batch is logged and
+            # dropped; its next message re-triggers the rolling window anyway.
+            by_chat: dict[str, list[dict]] = {}
+            for m in batch:
+                by_chat.setdefault(m["chat"], []).append(m)
+            for chat, msgs in by_chat.items():
+                try:
+                    resp = requests.post(
+                        f"{SPACE_URL}/agent",
+                        json={"messages": msgs, "push_gcal": True},
+                        headers=headers,
+                        timeout=120,
+                    )
+                    resp.raise_for_status()
+                    plan = resp.json().get("plan", {})
+                    print(f"[{chat}] sent {len(msgs)} msg(s) -> "
+                          f"{len(plan.get('events', []))} event(s)")
+                except Exception as e:  # noqa: BLE001
+                    print(f"[{chat}] agent call failed ({e}) — skipping this "
+                          "batch for the chat; next message re-triggers it")
+        else:
+            resp = requests.post(
+                f"{SPACE_URL}/ingest",
+                json={"messages": batch},
+                headers=headers,
+                timeout=30,
+            )
+            resp.raise_for_status()
+            print(f"sent {len(batch)} message(s) -> {resp.json()}")
+    if max_rowid > last:
+        _save_rowid(max_rowid)
+    return len(batch)
+def main():
+    if not CHAT_DB.exists():
+        raise SystemExit(f"chat.db not found at {CHAT_DB} (grant Full Disk Access?)")
+    # Read-only connection so we never mutate the Messages DB.
+    conn = sqlite3.connect(f"file:{CHAT_DB}?mode=ro", uri=True)
+    print(f"polling {CHAT_DB} every {POLL_SECONDS}s -> {SPACE_URL}/{'agent' if AGENT_MODE else 'ingest'}")
+    try:
+        while True:
+            try:
+                poll_once(conn)
+            except Exception as e:  # noqa: BLE001 - keep the loop alive
+                print(f"poll error: {e}")
+            time.sleep(POLL_SECONDS)
+    finally:
+        conn.close()
+if __name__ == "__main__":
+    main()
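The collector's SQL converts `message.date` from nanoseconds since the Apple epoch (2001-01-01) into Unix seconds by dividing by 1e9 and adding 978307200. The same arithmetic in Python, as a minimal sketch: the function names are hypothetical, and it assumes the nanosecond timestamps described in the collector's comments.

```python
from datetime import datetime, timedelta, timezone

APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
APPLE_EPOCH_UNIX = 978307200  # seconds from 1970-01-01 to 2001-01-01 (UTC)


def apple_ns_to_unix(ns: int) -> int:
    """Nanoseconds since the Apple epoch -> whole Unix seconds."""
    return ns // 1_000_000_000 + APPLE_EPOCH_UNIX


def apple_ns_to_utc(ns: int) -> datetime:
    """Nanoseconds since the Apple epoch -> aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=ns // 1_000_000_000)


one_day_ns = 86_400 * 1_000_000_000
next_day = apple_ns_to_utc(one_day_ns)  # 2001-01-02 00:00 UTC
```

The query additionally passes `'localtime'` to SQLite's `datetime()`, so the collector reports local wall-clock strings, whereas this sketch stays in UTC.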

deploy/launchd/com.offgrid.backend.plist ADDED Viewed

	@@ -0,0 +1,33 @@

+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Backend: serves the Gradio UI + /agent + /ingest, runs the agent autonomously,
+     and uses Hermes via INFERENCE_BASE_URL. Template — see scripts/setup_mac.sh. -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key><string>com.offgrid.backend</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>__PYTHON__</string>
+    <string>__REPO__/app.py</string>
+  </array>
+  <key>WorkingDirectory</key><string>__REPO__</string>
+  <key>EnvironmentVariables</key>
+  <dict>
+    <key>AUTONOMOUS</key><string>1</string>
+    <key>TRIGGER_ON</key><string>outgoing</string>
+    <key>USE_STUB_EXTRACTOR</key><string>0</string>
+    <key>INFERENCE_BASE_URL</key><string>http://127.0.0.1:8080/v1</string>
+    <key>INFERENCE_MODEL</key><string>hermes</string>
+    <key>INGEST_TOKEN</key><string>__INGEST_TOKEN__</string>
+    <key>MEMORY_PATH</key><string>__HOME__/.offgrid/agent_memory.json</string>
+    <key>FEED_PATH</key><string>__HOME__/.offgrid/ingest_feed.json</string>
+    <key>DEDUP_PATH</key><string>__HOME__/.offgrid/agent_seen.json</string>
+    <key>IMPACT_PATH</key><string>__HOME__/.offgrid/impact_weeks.json</string>
+    <key>PORT</key><string>7860</string>
+  </dict>
+  <key>RunAtLoad</key><true/>
+  <key>KeepAlive</key><true/>
+  <key>StandardOutPath</key><string>__HOME__/Library/Logs/offgrid-backend.log</string>
+  <key>StandardErrorPath</key><string>__HOME__/Library/Logs/offgrid-backend.log</string>
+</dict>
+</plist>

deploy/launchd/com.offgrid.collector.plist ADDED

	@@ -0,0 +1,28 @@

+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Collector: polls ~/Library/Messages/chat.db and POSTs new messages to the
+     local backend's /ingest. NEEDS Full Disk Access for the program below
+     (System Settings > Privacy & Security > Full Disk Access -> add the python
+     binary __PYTHON__). Template — see scripts/setup_mac.sh. -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key><string>com.offgrid.collector</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>__PYTHON__</string>
+    <string>__REPO__/collector/collector.py</string>
+  </array>
+  <key>WorkingDirectory</key><string>__REPO__/collector</string>
+  <key>EnvironmentVariables</key>
+  <dict>
+    <key>SPACE_URL</key><string>http://127.0.0.1:7860</string>
+    <key>INGEST_TOKEN</key><string>__INGEST_TOKEN__</string>
+    <key>POLL_SECONDS</key><string>20</string>
+    <key>CHAT_DB</key><string>__HOME__/Library/Messages/chat.db</string>
+  </dict>
+  <key>RunAtLoad</key><true/>
+  <key>KeepAlive</key><true/>
+  <key>StandardOutPath</key><string>__HOME__/Library/Logs/offgrid-collector.log</string>
+  <key>StandardErrorPath</key><string>__HOME__/Library/Logs/offgrid-collector.log</string>
+</dict>
+</plist>

deploy/launchd/com.offgrid.hermes.plist ADDED

	@@ -0,0 +1,23 @@

+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Hermes brain: an OpenAI-compatible llama.cpp server the backend points at via
+     INFERENCE_BASE_URL. Template — scripts/setup_mac.sh fills the __PLACEHOLDERS__
+     and installs into ~/Library/LaunchAgents. -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key><string>com.offgrid.hermes</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>__LLAMA_SERVER__</string>
+    <string>-m</string><string>__MODEL_GGUF__</string>
+    <string>--host</string><string>127.0.0.1</string>
+    <string>--port</string><string>8080</string>
+    <string>--ctx-size</string><string>8192</string>
+    <string>--jinja</string>          <!-- enable the tool-calling chat template -->
+  </array>
+  <key>RunAtLoad</key><true/>
+  <key>KeepAlive</key><true/>
+  <key>StandardOutPath</key><string>__HOME__/Library/Logs/offgrid-hermes.log</string>
+  <key>StandardErrorPath</key><string>__HOME__/Library/Logs/offgrid-hermes.log</string>
+</dict>
+</plist>
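
The three launchd files above are templates whose `__PLACEHOLDERS__` are filled by `scripts/setup_mac.sh`. That script is shell and is not shown here; the following is only a Python sketch of the same substitution idea, with a shortened template and made-up paths. `fill_template`, the regular expression, and every value in `values` are assumptions for illustration. It XML-escapes each value and parses the result with `plistlib` to confirm the filled file is still a valid property list.

```python
import plistlib
import re
from xml.sax.saxutils import escape

# Shortened stand-in for deploy/launchd/com.offgrid.hermes.plist.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Label</key><string>com.offgrid.hermes</string>
  <key>ProgramArguments</key>
  <array>
    <string>__LLAMA_SERVER__</string>
    <string>-m</string><string>__MODEL_GGUF__</string>
  </array>
  <key>StandardOutPath</key><string>__HOME__/Library/Logs/offgrid-hermes.log</string>
</dict>
</plist>
"""

# Matches __NAME__ tokens such as __HOME__ or __LLAMA_SERVER__.
PLACEHOLDER = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def fill_template(template: str, values: dict) -> str:
    """Replace every __NAME__ token, failing loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder __{name}__")
        # Escape &, < and > so the value cannot break the XML.
        return escape(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical paths; a real install would use the machine's own.
values = {
    "LLAMA_SERVER": "/opt/homebrew/bin/llama-server",
    "MODEL_GGUF": "/Users/me/models/hermes & co.gguf",
    "HOME": "/Users/me",
}
filled = fill_template(TEMPLATE, values)
parsed = plistlib.loads(filled.encode("utf-8"))
```

Parsing the filled text before installing it catches an unescaped `&` or a missed placeholder early, rather than leaving `launchctl` to reject the file later.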

docs/android-tasker.md ADDED

	@@ -0,0 +1,37 @@

+# Android background capture (Scenario 2)
+Unlike iOS, Android **allows background message capture**, so you get real on-phone autonomy — just
+not for iMessage. A no-build recipe (Tasker or MacroDroid) calls the same shared **`POST /agent`**
+backend the Mac collector and iOS Shortcut use.
+## What you need
+- The backend reachable from the phone: the HF Space's dedicated-GPU path, a Mac/cloud box, or even
+  the phone itself (Termux). The free **ZeroGPU Space does not serve `/agent`** (Gradio-SDK only) — use
+  one of the others.
+- The same `INGEST_TOKEN` the backend uses.
+## Tasker recipe (Notification Access — works for RCS/WhatsApp/SMS notifications)
+1. **Profile → Event → UI → Notification** (or **Phone → Received Text** for SMS). Restrict it to
+   your messaging app(s).
+2. **Task → Net → HTTP Request:**
+   - Method: `POST`
+   - URL: `https://<your-backend>/agent`
+   - Headers: `Authorization: Bearer <INGEST_TOKEN>` and `Content-Type: application/json`
+   - Body:
+     ```json
+     { "thread": "%evtprm()", "now": "%TIMES", "push_gcal": true }
+     ```
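+     (Use the notification text variable your trigger provides for `thread`. `%TIMES` is Tasker's
+     current time in seconds since the Unix epoch; if your backend expects an ISO 8601 `now`, as in
+     the `/agent` contract example in [automations.md](./automations.md), format it first.)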
+3. **Parse the response** (`Variable → JSON Read` on `plan.events`) if you want a confirmation
+   toast/notification; otherwise `push_gcal:true` already created the events in Google Calendar.
+MacroDroid is equivalent: **Trigger:** Notification Received / SMS Received → **Action:** HTTP POST
+with the same URL/headers/body.
+## Notes
+- This is genuinely hands-off: the OS delivers the trigger in the background.
+- For a fully on-device variant, run the backend + a small model in **Termux** and point Tasker at
+  `http://127.0.0.1:7860/agent`, with `INFERENCE_BASE_URL` → a local `llama-server` (Gemma E4B / a
+  small Hermes). See [on-device.md](./on-device.md) and [hermes.md](./hermes.md).
+- A native Kotlin `NotificationListenerService` app could replace Tasker for a polished install — a
+  separate effort; the Tasker recipe is the MVP.

docs/architecture.md ADDED

	@@ -0,0 +1,121 @@

+# Architecture — workflows and the LLMs behind them
+An AI-solution-architect view of the agentic system: every workflow through the
+platform, and exactly which model (if any) each one calls. The architectural
+signature: the extraction core is **one grammar-constrained LLM call**, the
+**MiniCPM planner** adds a visible multi-step loop over the platform's own
+public MCP tool contract, everything verifiable — conflict math, dedup, time
+proposals, eval gates — stays deterministic, and there are **zero cloud-AI API
+calls anywhere**, training included.
+## System workflow
+```mermaid
+flowchart TB
+    subgraph ENTRY["1 · Entry points — four front-ends, one contract"]
+        direction LR
+        UIIN["🖥️ Gradio UI<br/>Schedule flow + Agent tab<br/>(paste thread, screenshots, .ics)"]
+        SHORT["📱 iOS Shortcut /<br/>Android Tasker"]
+        MAC["🍎 Mac collector<br/>polls iMessage chat.db<br/>(collector/collector.py)"]
+        MCPC["🤖 MCP clients<br/>Claude Desktop, Cursor"]
+    end
+    subgraph API["2 · API & orchestration — app.py (FastAPI + Gradio, one port)"]
+        AGENTEP["POST /agent<br/>bearer-token, stateless"]
+        INGEST["POST /ingest → feed store<br/>AUTONOMOUS=1 triggers on<br/>your outgoing message (is_from_me)"]
+        ROLL["threads.rolling_thread<br/>per-chat window (20 msgs / 12 h)"]
+        MCPT["MCP tools — server/mcp_tools.py<br/>extract_events · make_ics · check_conflicts"]
+    end
+    subgraph ORCH["2a · Agentic orchestration — server/orchestrator.py"]
+        SMOL["smolagents ToolCallingAgent<br/>planned by MiniCPM, ≤6 steps<br/>playbook: extract → check → render<br/>final ActionPlan re-derived deterministically"]
+        SCRIPT["ScriptedPlanner — no LLM<br/>identical tool sequence + step events<br/>(stub mode, CI, planner failure)"]
+    end
+    subgraph CORE["3 · Agent core — server/pipeline.py → server/agent.py"]
+        PROMPT["Prompt assembly:<br/>SYSTEM + memory recall block<br/>+ existing calendar + thread + images"]
+        GEN["Grammar-constrained generation<br/>→ ActionPlan JSON (always parses)"]
+        PROMPT --> GEN
+    end
+    subgraph LLMT["4 · LLM tier — ALL inference is local llama.cpp, zero cloud AI APIs"]
+        GEMMA["⭐ gemma-cal E4B — fine-tuned Gemma 4<br/>ParetoOptimal/gemma-4-cal-gguf<br/>gemma-cal-e4b-Q4_K_M.gguf (~5 GB)<br/>+ mmproj-F16.gguf vision projector"]
+        MODES["served either:<br/>· in-process llama-cpp-python (ZeroGPU lease)<br/>· remote llama-server via INFERENCE_BASE_URL<br/>(Space sidecar / Mac launchd / phone)"]
+        MINICPM["🧭 MiniCPM planner — OpenBMB (sponsor)<br/>openbmb/MiniCPM4.1-8B-GGUF Q4 (~5 GB)<br/>≤4B option: openbmb/MiniCPM5-1B-GGUF (config switch)<br/>2nd llama-server :8081 — enabled via<br/>PLANNER_HF_REPO / PLANNER_FILE"]
+        HERMES["(optional) Hermes-3-Llama-3.1-8B Q4_K_M<br/>HERMES_TOOLS=1 — tool-calling loop:<br/>calls remember() to write memory mid-run"]
+        STUB["(no LLM) regex stub extractor<br/>USE_STUB_EXTRACTOR=1 — CI & free tier"]
+        GEMMA --- MODES
+    end
+    subgraph DET["5 · Deterministic post-processing — no LLM"]
+        CONF["freebusy.annotate_conflicts<br/>overlap / adjacent / tight<br/>+ propose_times free slots"]
+        DEDUP["dedup.filter_new<br/>idempotency for autonomous runs"]
+        MEMW["memory.observe_plan<br/>learns recurring contacts"]
+    end
+    subgraph OUT["6 · Outputs"]
+        CARDS["Event cards + reply draft<br/>+ clarification question"]
+        ICS["📥 .ics download<br/>(off-grid default)"]
+        GCAL["📆 Google Calendar push<br/>(per-user OAuth web flow, opt-in)"]
+        TRACE["Redacted trace export<br/>→ public HF dataset"]
+    end
+    UIIN -->|"run_orchestrator (step trace streams into the UI)"| SMOL
+    SHORT --> AGENTEP
+    MAC -->|"store-only"| INGEST
+    MAC -->|"AGENT_MODE=1"| AGENTEP
+    MCPC --> MCPT
+    AGENTEP --> CORE
+    INGEST --> ROLL --> CORE
+    SMOL ==>|"planning loop, ≤6 steps"| MINICPM
+    SMOL -->|"tool calls — the Space's OWN MCP<br/>endpoint (localhost SSE)"| MCPT
+    SMOL -.->|"planner down / stub mode"| SCRIPT
+    SCRIPT -->|"same tool sequence,<br/>deterministic"| MCPT
+    MCPT -->|"extract_events → 1 LLM call"| CORE
+    MCPT -.->|"make_ics / check_conflicts → 0 LLM calls"| DET
+    GEN ==>|"default"| GEMMA
+    GEN -.->|"opt-in autonomous brain"| HERMES
+    GEN -.->|"tests / free demo"| STUB
+    HERMES -->|"remember()"| MEMW
+    LLMT --> DET --> OUT
+```
+## Offline loop — eval-gated fine-tuning (produces the serving LLM)
+```mermaid
+flowchart LR
+    SEEDS["Seed data — NO LLM<br/>139 hand-authored template examples<br/>(gen_new_seeds.py / make_dataset.py)"]
+    SMC["SMCalFlow import — NO LLM<br/>deterministic LISP-program parse, ~2000 rows"]
+    TRAIN["QLoRA fine-tune — Unsloth on Modal A100-80GB<br/>base: google/gemma-4-31B-it or gemma-4-E4B-it<br/>r=16, lr 5e-5, 2 epochs, responses-only loss"]
+    GGUF["convert_hf_to_gguf + llama-quantize<br/>→ staging Q4_K_M GGUF"]
+    EVAL["Eval — NO LLM judge, deterministic metrics<br/>60-example held-out set:<br/>schema validity · event F1 · start-exact recall"]
+    GATE{"Gate<br/>validity ≥ 0.95<br/>F1 ≥ 0.81<br/>recall ≥ 0.773"}
+    PROD["Promote → ParetoOptimal/gemma-4-cal-gguf<br/>(the model the Space serves)"]
+    TRASH["Discard staging —<br/>production untouched"]
+    SEEDS --> TRAIN
+    SMC --> TRAIN
+    TRAIN --> GGUF --> EVAL --> GATE
+    GATE -->|pass| PROD
+    GATE -->|fail| TRASH
+```
+See [eval-roadmap.md](./eval-roadmap.md) and the
+[eval-gated fine-tuning post-mortem](./blog-eval-gated-finetuning.md) for the
+gate's history and rationale; [hermes.md](./hermes.md) for the optional
+tool-calling backend; [build-small-submission.md](./build-small-submission.md)
+for how the MiniCPM planner maps to the `sponsor:openbmb` track.
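
The gate node in the offline-loop diagram can be sketched as a pure function over the three metrics. The thresholds are the ones printed in the diagram; the class and function names (`EvalScores`, `failed_checks`, `decide`) are hypothetical and do not describe how `training/gated_retrain.py` is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    schema_validity: float
    event_f1: float
    start_exact_recall: float


# Thresholds as shown in the diagram's gate node.
GATE = EvalScores(schema_validity=0.95, event_f1=0.81, start_exact_recall=0.773)


def failed_checks(scores: EvalScores, gate: EvalScores = GATE) -> list:
    """Return the names of the metrics that fall below the gate."""
    failed = []
    for name in ("schema_validity", "event_f1", "start_exact_recall"):
        if getattr(scores, name) < getattr(gate, name):
            failed.append(name)
    return failed


def decide(scores: EvalScores) -> str:
    """'promote' only when every metric clears the gate, else 'discard'."""
    return "discard" if failed_checks(scores) else "promote"


# Illustrative scores, not measured results.
good = EvalScores(schema_validity=1.0, event_f1=0.97, start_exact_recall=0.96)
bad = EvalScores(schema_validity=0.90, event_f1=0.85, start_exact_recall=0.80)
```

With these inputs `decide(good)` returns `"promote"` and `decide(bad)` returns `"discard"`, with `failed_checks(bad)` naming `schema_validity` as the reason. Returning the failed metric names rather than a bare boolean makes each rejection a starting point for diagnosis.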
+## Which LLM each workflow calls
+| # | Workflow | Trigger | LLM call(s) | Where it runs |
+|---|----------|---------|-------------|----------------|
+| 1 | Agentic orchestration (Schedule flow + Agent tab) | User pastes thread / uploads screenshots, clicks Find the events / Run the agents | **1× MiniCPM planning loop** (`MiniCPM4.1-8B`, or `MiniCPM5-1B` ≤4B variant; ≤6 steps) driving the Space's own MCP tools, **+ 1× gemma-cal E4B** per `extract_events` tool call (vision via mmproj); `check_conflicts`/`make_ics` are zero-LLM. Planner unconfigured or down → ScriptedPlanner runs the identical sequence, **gemma-cal only** | Two local llama-servers — gemma-cal on :8080, MiniCPM on :8081 |
+| 2 | API extraction (`POST /agent`) | iOS Shortcut, Android Tasker, or Mac collector in `AGENT_MODE=1` | **1× gemma-cal E4B** (same pipeline, same prompt) | Same |
+| 3 | Autonomous ingest | Mac collector → `/ingest`; your outgoing message triggers a run over the chat's rolling thread | **1× gemma-cal E4B per affected chat**, then deterministic dedup + calendar delivery | Same |
+| 4 | Memory-writing agent (optional) | `HERMES_TOOLS=1` on the remote path | **Hermes-3-Llama-3.1-8B** in a tool loop (≤3 rounds): may call `remember()` then returns the ActionPlan | Remote llama-server (e.g. Mac launchd) |
+| 5 | MCP tools for external agents | MCP client calls the Space | `extract_events` → **1× gemma-cal E4B**; `make_ics` and `check_conflicts` → **zero LLM calls** | Same as #1 |
+| 6 | CI / free-tier demo | `USE_STUB_EXTRACTOR=1` | **No LLM** — regex heuristic | CPU anywhere |
+| 7 | Training & eval (offline) | `training/gated_retrain.py` | **No LLM at the inference-API level**: data gen is template-based, eval is metric-based (no judge). The LLM here is the *training target*: QLoRA on `google/gemma-4-31B-it` / `gemma-4-E4B-it` | Modal A100/H100 |
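
The diagram describes `threads.rolling_thread` as a per-chat window of "20 msgs / 12 h". One plausible reading, sketched below, applies both limits together: drop anything older than 12 hours, then keep the 20 most recent messages. The message shape (a dict with `sent_at` and `text`) and the function name `rolling_window` are assumptions; `server/threads.py` may combine the limits differently.

```python
from datetime import datetime, timedelta


def rolling_window(messages, now, max_messages=20, max_age=timedelta(hours=12)):
    """Keep messages no older than max_age, then the newest max_messages."""
    cutoff = now - max_age
    recent = [m for m in messages if m["sent_at"] >= cutoff]
    recent.sort(key=lambda m: m["sent_at"])  # oldest first
    return recent[-max_messages:]


# Illustrative data: one message per hour, then one per minute.
demo_now = datetime(2026, 6, 5, 10, 0, 0)
hourly = [{"sent_at": demo_now - timedelta(hours=h), "text": f"h{h}"} for h in range(30)]
minutely = [{"sent_at": demo_now - timedelta(minutes=m), "text": f"m{m}"} for m in range(30)]

hourly_window = rolling_window(hourly, demo_now)
minutely_window = rolling_window(minutely, demo_now)
```

In the hourly case the age limit binds (only messages from the last 12 hours survive); in the minutely case the count limit binds (the 20 newest survive). Bounding the thread both ways keeps the prompt small regardless of how busy the chat is.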

docs/automations.md ADDED

	@@ -0,0 +1,83 @@

+# Automations — make it autonomous without a custom app
+Everything below drives one endpoint. **iOS cannot read iMessage in the background** (no API), so the
+autonomy ceiling differs by platform:
+| Front-end | Autonomy | Source | Notes |
+|---|---|---|---|
+| Mac collector (`AGENT_MODE`/`AUTONOMOUS`) | Fully hands-off | iMessage | Needs an always-on Mac |
+| iOS Shortcut | One gesture (you trigger it) | anything you share | No background reading possible |
+| Android Tasker/MacroDroid | Hands-off | SMS/RCS/notifications | Not iMessage |
+## The `/agent` contract (what they all call)
+`POST {SPACE_URL}/agent` with `Authorization: Bearer <INGEST_TOKEN>`:
+```jsonc
+// request — `thread` OR `messages` required; rest optional
+{
+  "thread": "Room parent: picture day Thursday 9am\nMe: thanks",
+  "messages": [{"sender": "Room parent", "text": "picture day Thursday 9am"}],
+  "images": ["data:image/png;base64,..."],   // a screenshot
+  "existing_ics": "<base64 .ics>",            // optional, enables conflict checks
+  "now": "2026-06-05T10:00:00",
+  "push_gcal": false,
+  "return_ics": true
+}
+```
+```jsonc
+// response
+{
+  "plan": { "events": [{"title":"Picture day","start":"2026-06-11T09:00:00", ...}],
+            "conflicts": [], "proposed_times": [], "reply_draft": "...", "needs_clarification": null },
+  "ics_base64": "<...>",
+  "gcal_links": []
+}
+```
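
For callers that are neither Shortcuts nor Tasker, the contract above can be exercised from a few lines of standard-library Python. This is a sketch under stated assumptions: `build_agent_request` and `event_summaries` are hypothetical helpers, the URL and token are placeholders, and only fields shown in the request and response examples are used. Sending the request (for example with `urllib.request.urlopen`) is left out so the block runs without a live backend.

```python
import json
import urllib.request


def build_agent_request(space_url, token, thread, now, push_gcal=False):
    """Build (but do not send) a POST /agent request per the contract."""
    payload = {
        "thread": thread,
        "now": now,
        "push_gcal": push_gcal,
        "return_ics": True,
    }
    return urllib.request.Request(
        url=f"{space_url.rstrip('/')}/agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def event_summaries(response_body):
    """Turn a decoded /agent response into 'start title' strings."""
    return [f"{e['start']} {e['title']}" for e in response_body["plan"]["events"]]


# Placeholder URL and token; the thread text is the doc's own example.
request = build_agent_request(
    "https://example.invalid/",
    "my-ingest-token",
    "Room parent: picture day Thursday 9am\nMe: thanks",
    "2026-06-05T10:00:00",
)

# A response shaped like the example above, trimmed to the fields used here.
sample_response = {
    "plan": {"events": [{"title": "Picture day", "start": "2026-06-11T09:00:00"}]},
    "gcal_links": [],
}
summaries = event_summaries(sample_response)
```

Keeping request construction separate from sending makes the payload easy to check in tests before any token leaves the machine.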
+## (A) Mac collector — fully autonomous (iMessage)
+Two equivalent ways; prefer the server-side switch so logic lives in one place:
+- **Server-side:** run the Space with `AUTONOMOUS=1`. `/ingest` then assembles a per-chat rolling
+  thread, runs the agent, dedupes, and (if Google is configured) pushes events automatically.
+- **Collector-side:** run the collector with `AGENT_MODE=1` — it POSTs `/agent` (with `push_gcal`)
+  instead of `/ingest`. See `collector/collector.py`.
+```bash
+# collector-side
+cd collector && AGENT_MODE=1 python collector.py
+```
+## (B) iOS Shortcut — one tap, no `.ics` import
+1. New Shortcut → accept **Share Sheet** input (Text and Images).
+2. **Text** → set variable `Thread`.
+3. **Get Contents of URL** → `https://<your-space>/agent`, Method **POST**, Header
+   `Authorization: Bearer <INGEST_TOKEN>`, Request Body **JSON**:
+   `{ "thread": Thread, "now": <Current Date, ISO 8601> }`
+   (To send a screenshot instead: Base64-encode the shared image into `images`.)
+4. **Get Dictionary Value** `plan.events` from the response.
+5. **Repeat with Each** → **Add New Event** (Calendar): Title = `title`, Start = `start`,
+   End = `end`, Location = `location`, Notes = `notes`.
+Now sharing a thread/screenshot to the Shortcut adds the events to Apple Calendar in one tap — no
+file download, no import. (Optional: read back `plan.conflicts` and show an alert.)
+## (C) Android — Tasker / MacroDroid (SMS/RCS)
+1. **Trigger:** Event → *Received Text* (SMS), or a Notification trigger for your messaging app.
+2. **Action:** HTTP Request → POST `https://<your-space>/agent`, header
+   `Authorization: Bearer <INGEST_TOKEN>`, body `{ "thread": "%astext", "now": "%DATE..." }`.
+3. Parse `plan.events` (JSON Read) → for each, **Insert Calendar Event** (Tasker writes via
+   `CalendarContract`).
+Because Android can read SMS/RCS and run in the background, this path is genuinely autonomous.
+## Roadmap — a native app
+- **Android:** a real app using a Notification Listener / `READ_SMS`, on-device **Gemma E4B** via
+  llama.cpp/MLC (see [on-device.md](./on-device.md)), writing events through the Calendar provider —
+  the same `/agent` contract or fully local. Feasible and fully autonomous.
+- **iOS:** no background message or LLM-server access — the Shortcut above is the ceiling. An
+  autonomous iOS iMessage app is **not possible**; we won't promise one.

docs/blog-eval-gated-finetuning.md ADDED

	@@ -0,0 +1,187 @@

+# What Six Failed Fine-Tunes Taught Us About Evals, Templates, and Knowing When to Stop
+*A post-mortem on fine-tuning Gemma-4 for structured calendar extraction — fifteen GPU runs,
+one destroyed model, one exonerated quantizer, a chat-template landmine, and the eval harness
+that caught every bad model before it shipped.*
+---
+## The setup
+[OffGridSchedula](https://huggingface.co/spaces/ParetoOptimal/OffGridSchedula) is a local-first
+scheduling agent: paste a group chat (or a flyer screenshot) and get back a constrained
+**ActionPlan** JSON — events with exact ISO datetimes, a conflict check, a drafted reply, and a
+`needs_clarification` question when the thread is too vague to schedule. Inference is
+llama.cpp serving Gemma-4 GGUFs; no cloud AI APIs.
+The project carried a hard requirement: ship a **fine-tuned model** that outperforms its base.
+This is the story of trying to satisfy that requirement honestly — and what "honestly" ended up
+costing and teaching. Everything below ran on Modal serverless A100s; total GPU spend for the
+entire investigation was well under $100.
+## Act I: The fine-tune that lost to its own base
+The first QLoRA fine-tune of `google/gemma-4-31B-it` (Unsloth, r=16, 69 synthetic examples,
+2 epochs) looked fine in a smoke test. So we built a real eval before trusting it: 28 held-out
+examples scored on **start-exact recall** (did you produce the exact ISO start datetime),
+event F1 with greedy datetime matching, schema validity, no-event accuracy (does chitchat
+hallucinate events), and clarification recall (do you *ask* instead of inventing when a plan is
+"TBD"). Temperature 0, the same `response_format: json_schema` call the production server uses.
+First scores: fine-tune **F1 0.81**, base **0.977**. The fine-tune *lost to its own base*. The
+mismatch dump showed why — three of its five misses were the same corruption: `"206-10-06"`
+instead of `"2026-10-06"`. A dropped year digit.
+Two suspects: quantization (classic low-bit digit corruption) or the training itself.
+## Act II: Scaling data made it worse. Much worse.
+The intuitive fix — more data — backfired in the most instructive way possible:
+| training examples | schema validity | event F1 |
+|---|---|---|
+| 69 | 1.00 | 0.81 |
+| 87 | 0.75 | 0.465 |
+| 122 | 0.46 | 0.214 |
+| 2,122 (incl. real SMCalFlow data) | 0.107 | 0.000 |
+**Monotonic decay with training steps.** By the 2,122-example run the model emitted unparseable
+output on ~90% of inputs. A raw-output probe (serve the staging GGUF, generate *without* the
+JSON grammar) settled what "broken" meant: the model free-generated `Huddle — — — — — —…` to the
+token limit. Not a formatting problem. Destroyed weights.
+Two cheap experiments isolated the cause:
+**Quantization was exonerated** by sweeping the *same* merged weights through f16 / Q8_0 /
+Q4_K_M (one A100 lease, the fp16 was already on a Modal volume). At full fp16 the fine-tune
+still scored validity 0.64 / F1 0.57 — nowhere near base. Precision bought ~+0.1 F1. The damage
+preceded the quantizer.
+**The chat template was half the story.** Gemma-4 ships a brand-new template —
+`<|turn>user\n…<turn|>`, with a dedicated `<|turn>system` block. There is no
+`<start_of_turn>` anywhere in it. Our training code used Unsloth's legacy `"gemma"` template,
+which is built entirely on `<start_of_turn>`. Every gradient step optimized a turn syntax that
+`llama-server --jinja` (which reads the template *embedded in the GGUF*) never renders. We
+verified the fix end-to-end by reading `tokenizer.chat_template` out of our exported GGUF's
+metadata with `gguf.GGUFReader` — trust the artifact, not the code — and added a hard
+`assert "<|turn>" in rendered` to the training script so the mismatch can never silently
+recur.
+And yet: with templates verifiably aligned, response-only loss masking, and LR dropped to 5e-5,
+the 31B *still* collapsed to validity 0.0. With dataset, template, LR, and masking all varied,
+the one remaining common factor was the training stack itself: Unsloth's QLoRA path for the
+brand-new Gemma-4-31B architecture (its own logs warn it can't handle `Gemma4AudioModel`
+internals). The same recipe on Gemma-4 **E4B** trained cleanly every single time. New
+architectures make the training framework a first-class suspect.
+## Act III: The benchmark nobody wants — prompt engineering hits 1.0
+While the fine-tune investigation ran, error analysis kept improving the *system*:
+- **State the weekday in the prompt.** `Current datetime: Monday, 2026-09-14T09:00:00` turns
+  day-of-week resolution from memorized calendar knowledge into deterministic arithmetic — for
+  every model.
+- **Two surgical system-prompt lines** targeting the base's only two eval misses (multi-event
+  splitting; asking on "TBD") took **stock Gemma-4-31B to 1.0 on every metric**.
+That's the uncomfortable benchmark for any SFT project: against a near-ceiling base, prompt
+engineering had ~100× better ROI than fine-tuning. The requirement, however, was a fine-tune
+that beats *its* base — so we re-aimed at the tier where headroom actually existed.
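
The weekday-in-prompt line from the first bullet above is cheap to produce deterministically. A minimal sketch, assuming the prompt line has exactly the form quoted in the post; `current_datetime_line` is a hypothetical helper name, and the English weekday names are hard-coded so the output does not depend on the process locale.

```python
from datetime import datetime

# Index matches datetime.weekday(), where Monday is 0.
WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def current_datetime_line(now: datetime) -> str:
    """Render the reference time with its weekday spelled out."""
    stamp = now.isoformat(timespec="seconds")
    return f"Current datetime: {WEEKDAYS[now.weekday()]}, {stamp}"


line = current_datetime_line(datetime(2026, 9, 14, 9, 0, 0))
```

For the post's own example timestamp this yields `Current datetime: Monday, 2026-09-14T09:00:00`, so the model only has to count forward from a stated weekday instead of recalling what day a date falls on.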
+## Act IV: The E4B campaign — six gated runs to a tie
+A ~5 GB Gemma-4 E4B that runs on modest hardware is the model this local-first project actually
+wants at the edge, and stock E4B had real room: F1 0.93. Every retrain ran through an
+**eval-gate**: train → upload to a *staging* filename → eval → promote to production **only if
+it beats the bar**, else delete staging. The gate rejected eight models across this project
+without production ever serving one of them.
+Each iteration fixed a diagnosed failure, not a hunch:
+| run | change | F1 (eval) |
+|---|---|---|
+| 1 | fixed recipe, 2,122 examples | 0.884 (n=28) |
+| 2 | weekday-in-prompt, data regenerated to match | 0.955 |
+| 3 | dropped 74 SMCalFlow rows teaching a conflicting "next DOW" convention; 4× hand-data upsample | **1.000** |
+| 4 | + TBD-clarify seeds, 8× upsample | 0.93 (clarify → 1.0) |
+| 5 | clarify seeds at 4× | 0.93 |
+| — | **eval expanded 28 → 60 examples** | — |
+| 6 | + targeted seeds for the two shapes stock fails | 0.97 |
+Three findings here deserve their own bullets:
+- **Label conventions are silent killers.** SMCalFlow annotates "next Tuesday" (said on a
+  Monday) as *tomorrow*; our app's convention is *Tuesday of next week*. 74 imported rows
+  trained the bug in. Filtering them fixed it — until other data shifts brought it back.
+  When you convert someone else's dataset, you inherit someone else's semantics.
+- **Small evals lie.** At n=28 (22 gold events), one event = 4.5 recall points, and we watched
+  **4 added training rows flip 3 eval cases**. Run-to-run SFT jitter swamped the signal — runs
+  3–5 were a seesaw, not progress. Expanding to 60 examples / 50 events made the gate mean
+  something again.
+- **Some priors resist data.** The "next Tuesday = tomorrow" prior survived *seven* explicit
+  counter-examples. Stock makes the same error. Genuinely ambiguous English stays ambiguous.
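
The two "next Tuesday" conventions from the first bullet can be written down as two small resolvers. This is a sketch: the function names are hypothetical, the first function generalises the single SMCalFlow case the post describes (nearest upcoming occurrence), and the second assumes weeks start on Monday when computing "Tuesday of next week".

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday is 0, Sunday is 6


def next_dow_nearest(today: date, target: int) -> date:
    """Nearest upcoming occurrence, never today itself."""
    delta = (target - today.weekday()) % 7
    return today + timedelta(days=delta or 7)


def next_dow_following_week(today: date, target: int) -> date:
    """The target weekday inside the week after the current one."""
    next_monday = today + timedelta(days=7 - today.weekday())
    return next_monday + timedelta(days=target)


monday = date(2026, 9, 14)  # a Monday
nearest = next_dow_nearest(monday, TUESDAY)
following = next_dow_following_week(monday, TUESDAY)
```

Said on Monday 2026-09-14, the first convention resolves to 2026-09-15 and the second to 2026-09-22. A one-week disagreement like this on every affected row is the kind of label conflict that an imported dataset can carry in unnoticed.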
+Run 6 vs stock E4B (with the same engineered prompt): **identical confusion counts** for both
+models: 48 of 50 gold events matched, tp/fp/fn 48/1/2, F1 0.97 each. A dead statistical tie.
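
The reported F1 follows directly from the confusion counts. A short sketch using the standard precision, recall and F1 definitions; the function name is illustrative and this is not the code in `training/eval.py`.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for both run 6 and stock E4B.
precision, recall, f1 = precision_recall_f1(tp=48, fp=1, fn=2)
```

With tp/fp/fn of 48/1/2, precision is 48/49, recall is 48/50 = 0.96, and F1 is 96/99, which rounds to the 0.97 quoted for both models.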
+## Act V: The bare-prompt tiebreaker
+The classic argument for SFT at parity is internalization: the fine-tune shouldn't *need* the
+prompt. So we measured it — same 60 examples, system prompt deleted for both models, identical
+minimal user content, same JSON-schema constraint:
+| bare, n=60 | stock E4B | fine-tuned E4B |
+|---|---|---|
+| schema validity | 0.967 | **1.0** |
+| no-event accuracy | 0.70 | **0.80** |
+| clarification recall | 0.50 | **0.625** |
+| event F1 | **0.682** | 0.644 |
+The fine-tune is more *disciplined* bare (never breaks schema, hallucinates less, asks more);
+stock edges bare extraction. No decisive gap. **Final verdict: at this data scale (139
+hand-authored + 2,000 converted examples, QLoRA, 1 epoch), the fine-tune reaches parity with
+its base — not superiority.** It shipped as the project's edge model with exactly that claim on
+the model card, by explicit owner decision; the strict-dominance auto-gate, correctly, never
+promoted it.
+## What we'd tell you to do differently
+1. **Build the eval before the fine-tune, and gate every publish on it.** Ours rejected eight
+   bad models, caught a regression that had already overwritten a good artifact (server-side
+   `CommitOperationCopy` restored it for free), and converted every failure into a diagnosis.
+   The eval harness was the single highest-value artifact of the project.
+2. **Train format must equal serve format — and verify it in the artifact.** Read the chat
+   template out of the exported GGUF's metadata. Assert it in the training script. A template
+   mismatch doesn't error; it just quietly ruins everything at a rate proportional to your
+   training steps.
+3. **Suspect the training stack on new architectures.** The same recipe destroyed Gemma-4-31B
+   and trained Gemma-4-E4B flawlessly, six times in a row. Framework warnings about unhandled
+   submodules (`Gemma4AudioModel`) are not noise.
+4. **Exonerate quantization cheaply before blaming it.** Sweep the same weights across
+   f16/Q8/Q4 in one GPU lease. Ours cost a few dollars and killed the most plausible-sounding
+   hypothesis of the whole project.
+5. **Put deterministic facts in the prompt instead of hoping the model memorized them.**
+   Weekday-in-the-prompt improved every model, including the ones we didn't train.
+6. **Match your eval's resolution to your iteration size.** If one flipped case moves a gated
+   metric by 4+ points, your gate is a coin flip.
+7. **Diff label conventions when importing datasets.** Resolution semantics ("next DOW"),
+   reference-time handling, and reply style all transfer — whether you want them to or not.
+8. **Respect the parity outcome.** Against a strong instruction-tuned base on a narrow,
+   well-prompted task, SFT parity is a common honest result. The defensible claims left are
+   discipline-under-no-prompt, token savings, and convention control — claim those, not wins
+   you didn't measure.
+### What might still beat the base
+10–100× more *real* (non-template) data; full fine-tuning rather than QLoRA once the stack
+supports the architecture; preference optimization (DPO) specifically on the
+clarify-vs-extract boundary; and a harder eval where the ceiling isn't 0.97. The gate is
+already in place to referee all of it.
+---
+*Artifacts: model repo
+[`ParetoOptimal/gemma-4-cal-gguf`](https://huggingface.co/ParetoOptimal/gemma-4-cal-gguf)
+(31B v1 + E4B edge + mmproj), eval harness `training/eval.py` + `training/data/eval.jsonl`,
+gate `training/gated_retrain.py`, importer `training/import_smcalflow.py` (SMCalFlow,
+CC BY-SA 4.0 — Semantic Machines et al., TACL 2020), full run-by-run log in
+`docs/eval-roadmap.md`.*

docs/build-small-submission.md ADDED

	@@ -0,0 +1,68 @@

+# Build Small — submission mapping
+How OffGridSchedula lines up with every requirement, track, sponsor prize, and badge of the
+[Build Small hackathon](https://huggingface.co/build-small-hackathon)
+([field guide](https://huggingface.co/spaces/build-small-hackathon/field-guide)).
+Tags claimed in the README frontmatter use the field guide's namespaced taxonomy
+(`track:*`, `sponsor:*`, `achievement:*`).
+## Hard rules
+| # | Rule | Status | Evidence |
+|---|------|--------|----------|
+| 1 | Every model under 32B parameters | ✅ | Two local models, both far under the cap: extraction is [`gemma-cal` E4B](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf) (~4B effective params, ~5 GB GGUF at Q4) and planning is [`openbmb/MiniCPM5-1B`](https://huggingface.co/openbmb/MiniCPM5-1B-GGUF) (1B). |
+| 2 | Gradio app, hosted as an HF Space (Docker OK) | ✅ | [`app.py`](../app.py) is a Gradio Blocks app served from a Docker SDK Space running llama.cpp, now hosted in the hackathon org: [`build-small-hackathon/OffGridSchedula`](https://huggingface.co/spaces/build-small-hackathon/OffGridSchedula). |
+| 3 | Demo video | ✅ | Recorded and linked from the README: [youtu.be/m-o0u9X3tI4](https://youtu.be/m-o0u9X3tI4) (storyboard in [`docs/demo-script.md`](./demo-script.md)). |
+| 4 | Social media post, linked from the README | ✅ | Published and linked from the README: [X (1)](https://x.com/nate_mauer/status/2064920352845709419), [X (2)](https://x.com/nate_mauer/status/2065661878441750916), and [LinkedIn](https://www.linkedin.com/feed/update/urn:li:ugcPost:7471440639969132545) (drafts in [`docs/social-post.md`](./social-post.md)). |
+| 5 | ≤ 10 ZeroGPU apps per user | ✅ n/a | Runs on cpu-basic (stub preview) or a dedicated T4 — no ZeroGPU dependency. |
+| 6 | README frontmatter tags + short write-up of idea & tech | ✅ | Namespaced tags + the idea-and-tech write-up are in [`README.md`](../README.md). |
+## Track — `track:backyard` (Backyard AI)
+A specific real person: a busy parent whose kid's school and activity events are buried in a
+noisy class group chat. They paste the chat (or a flyer screenshot) from their phone's browser
+and get back events, a conflict check against their own calendar, and a ready-to-send reply —
+reviewed before anything is saved, exported as a local `.ics` (Apple/Google Calendar one tap
+away). Short pasted chats and screenshots are exactly the workload a small local model handles
+well — an honest fit, not a stretch.
+## Sponsor prize — `sponsor:modal` (Best Use of Modal)
+Modal powered the **development** of the platform's model end-to-end:
+- [`training/modal_train.py`](../training/modal_train.py) — full QLoRA fine-tune on serverless A100/H100s (dataset → train → GGUF export → HF publish), with persistent Volumes caching base weights and outputs across runs.
+- [`training/modal_eval.py`](../training/modal_eval.py) / [`modal_quant_eval.py`](../training/modal_quant_eval.py) — the 60-example task eval served on llama.cpp inside Modal, including an on-volume quantization study (f16 / Q8_0 / Q4_K_M).
+- [`training/gated_retrain.py`](../training/gated_retrain.py) — the eval-gated pipeline: train → staging upload → eval → promote **only if it beats the gate**. It rejected eight regressed models before the published one; every one of those runs was a Modal job.
+## Sponsor prize — `sponsor:openbmb` (Best MiniCPM Build)
+Clicking **Run the agents** invokes **OpenBMB MiniCPM** as the planner (`openbmb/MiniCPM5-1B-GGUF`;
+the larger `MiniCPM4.1-8B` variant is a config switch) on a second local llama.cpp instance. It
+drives this Space's own MCP tools (`extract_events` → `check_conflicts` → `make_ics`) as a visible
+multi-step agent ([`server/orchestrator.py`](../server/orchestrator.py)) — MiniCPM is core to the
+agent experience, not a garnish (a deterministic scripted plan is the fallback when the planner
+isn't configured). Also the natural evidence for the judged **Best Agent** award.
+## Achievement badges (self-declared, all claimed)
+| Tag | Badge | Evidence |
+|-----|-------|----------|
+| `achievement:offgrid` | Off the Grid | All inference runs inside the Space via llama.cpp — no cloud AI APIs. The only optional outbound call is the user's own Google Calendar push. |
+| `achievement:welltuned` | Well-Tuned | [`build-small-hackathon/gemma-4-cal-gguf`](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf) — our published QLoRA fine-tune of Gemma-4 E4B **is the model production serves**, shipped through the eval gate with the [honest scorecard public](./eval-roadmap.md). |
+| `achievement:offbrand` | Off-Brand | Custom landing page, grouped nav, dark hero + carousel, elevated tool card, bespoke CSS/JS ([`ui/blocks.py`](../ui/blocks.py), [`static/app.css`](../static/app.css)) — far past the stock Gradio look. |
+| `achievement:llama` | Llama Champion | The official `ghcr.io/ggml-org/llama.cpp` server image runs the GGUF + vision mmproj ([`Dockerfile`](../Dockerfile), [`scripts/start_space.sh`](../scripts/start_space.sh)). |
+| `achievement:sharing` | Sharing is Caring | Redacted agent traces published to the public dataset [`ParetoOptimal/offgridschedula-traces`](https://huggingface.co/datasets/ParetoOptimal/offgridschedula-traces) — one-click from the Activity tab, or [`training/share_trace.py`](../training/share_trace.py). |
+| `achievement:fieldnotes` | Field Notes | [`FIELD_NOTES.md`](../FIELD_NOTES.md) (build retrospective) + [`docs/blog-eval-gated-finetuning.md`](./blog-eval-gated-finetuning.md) (fine-tuning post-mortem) + the [published project blog](https://huggingface.co/blog/build-small-hackathon/offgridschedula) ([source](./blog-offgridschedula.md)). |
+Sponsor prizes **not** claimed: OpenAI Codex (no Codex-attributed commits) and NVIDIA Nemotron
+(different model family). The cash bonus badges (Off Brand, Tiny Titan, Best Demo, Best Agent,
+Bonus Quest Champion, Judges' Wildcard) are judged across all submissions and take no tags.
+## Status
+All six hard rules are met — nothing outstanding:
+- The Space is live in the hackathon org: [`build-small-hackathon/OffGridSchedula`](https://huggingface.co/spaces/build-small-hackathon/OffGridSchedula).
+- The model is published at [`build-small-hackathon/gemma-4-cal-gguf`](https://huggingface.co/build-small-hackathon/gemma-4-cal-gguf) (the planner is [`openbmb/MiniCPM5-1B-GGUF`](https://huggingface.co/openbmb/MiniCPM5-1B-GGUF)).
+- The [demo video](https://youtu.be/m-o0u9X3tI4) and the social posts (X + LinkedIn) are published and linked from the README.
+- The write-up is live as a Hugging Face blog post: [build-small-hackathon/offgridschedula](https://huggingface.co/blog/build-small-hackathon/offgridschedula).

docs/eval-roadmap.md ADDED

	@@ -0,0 +1,337 @@

+# Eval roadmap — improving the scheduling fine-tune
+How we measure and improve `ParetoOptimal/gemma-4-cal-gguf` (the fine-tuned
+Gemma-4-31B that turns chat/images into a calendar `ActionPlan`). The eval is
+**task-specific** — generic LLM benchmarks (MMLU etc.) don't apply.
+Harness: `training/eval.py` (scores), `training/gen_eval.py` + `training/data/eval.jsonl`
+(28 held-out examples, disjoint from `dataset.jsonl`), `training/modal_eval.py`
+(serves the GGUF on the same `llama-server` the Space uses, then scores).
+## Initial fine-tune scores (Q4_K_M, n=28, 2026-06-09)
+| Metric | Score |
+| --- | --- |
+| schema validity | 1.00 |
+| no-event accuracy | 1.00 |
+| clarification recall | 1.00 |
+| end-time exact | 1.00 |
+| event precision | 0.85 |
+| **event recall (start-exact)** | **0.77** |
+| event F1 | 0.81 |
+| title similarity | 0.87 |
+Discipline (never invents events, always asks when ambiguous) is perfect; all 9
+relative-date cases passed. The gap is **exact start datetime** on a few
+explicit far-future dates (misses: `e02`, `e05`, `e06`, `e15`, one leg of `m02`).
+## The 3 steps
+### 1. Diagnose the 5 misses (cheap)
+Enhance `eval.py` to dump the model's actual `start`/`title` for mismatched events,
+then one re-run shows whether they're date-shift, time/AM-PM, or wrong-year errors —
+which tells us exactly what training data to add. (~one A100 eval run; the GGUF is
+cached in the Modal Volume, so it's fast.)
+### 2. Baseline comparison (the "Well-Tuned" proof)
+Run `modal run training/modal_eval.py --model-hf-repo unsloth/gemma-4-31B-it-GGUF`
+to score **stock** Gemma-4-31B on the same set. If the fine-tune's discipline
+(no-event 1.0, clarification 1.0) and datetime recall beat stock, that's concrete
+evidence the fine-tune helps. (Separate ~18 GB model download + A100 time.)
+### 3. Close the gap
+Add ~15–20 explicit-date examples (especially next-month dates and times) to
+`training/data/dataset.jsonl`, re-train on Modal (`training/modal_train.py`),
+re-eval — and watch start-exact recall move.
+## Results log
+### Step 1 — diagnosis (2026-06-09)
+The mismatch dump showed the misses are **not** a reasoning failure. 3 of 5 are the
+same bug — a dropped year digit, **"206" instead of "2026"** — on next-month dates
+(month/day/time all correct):
+```
+[e02] gold 2026-10-06T15:30  pred 206-10-06T15:30
+[e05] gold 2026-10-01T08:15  pred 206-10-01T08:15
+[e15] gold 2026-10-08T19:00  pred 206-10-08T19:00
+[e06] gold 2026-09-28T09:00  pred []                 (abstained)
+[m02] Standup + Sprint demo  pred Standup only       (dropped 2nd leg)
+```
+Fix indicated: more far-future explicit-date examples reinforcing 4-digit years
+(+ multi-event 2nd legs). → Step 3.
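A minimal sketch of the triage the mismatch dump makes possible, assuming each miss is available as a gold/predicted start-string pair. The function names and the category labels are illustrative and are not taken from `training/eval.py`.

```python
from typing import Optional


def is_subsequence(short: str, full: str) -> bool:
    """True if `short` can be produced from `full` by deleting characters."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def classify_miss(gold: str, pred: Optional[str]) -> str:
    """Bucket one start-datetime miss (labels are hypothetical)."""
    if not pred:
        return "abstained"
    if gold == pred:
        return "match"
    gold_year, _, gold_rest = gold.partition("-")
    pred_year, _, pred_rest = pred.partition("-")
    if gold_rest == pred_rest:
        # Month, day and time agree, so only the year differs.
        if len(pred_year) < len(gold_year) and is_subsequence(pred_year, gold_year):
            return "dropped_year_digit"
        return "wrong_year"
    if gold_year == pred_year and gold_rest[:5] == pred_rest[:5]:
        # Same year and same MM-DD, so the time part is off.
        return "time_error"
    return "date_shift"


print(classify_miss("2026-10-06T15:30", "206-10-06T15:30"))
print(classify_miss("2026-09-28T09:00", None))
```

Run over the dump above, `e02`, `e05` and `e15` would land in the dropped-digit bucket and `e06` in the abstained bucket; the dropped second leg of `m02` is a missing event rather than a datetime mismatch, so it needs a separate count.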
+### Step 2 — baseline vs fine-tune (2026-06-09, n=28, Q4_K_M)
+| Metric | Stock `gemma-4-31B-it-GGUF` | Fine-tune `gemma-4-cal-gguf` |
+| --- | --- | --- |
+| schema validity | 1.00 | 1.00 |
+| event precision | **1.00** | 0.85 |
+| start-exact recall | **0.955** | 0.773 |
+| event F1 | **0.977** | 0.81 |
+| end-exact | 1.00 | 1.00 |
+| no-event accuracy | 1.00 | 1.00 |
+| clarification recall | 0.75 | **1.00** |
+**Honest read:** stock Gemma-4-31B is already strong at this extraction and *beats*
+the current fine-tune on datetime recall — the "206" bug is a fine-tune regression.
+The fine-tune's only clear win is **clarification discipline** (asks when a thread is
+"date TBD"; stock missed `q04`). As-is, the fine-tune is **not** justified on
+extraction. Step 3 must fix the year regression and clear baseline's 0.955 recall
+while keeping clarification at 1.00 — otherwise the better play is stock + the
+fine-tune's clarification behavior via prompting.
+### Step 3 — after gap-closing retrain (2026-06-09) — REGRESSED
+Dataset grown 69 → 87 (+18 Oct–Dec 2026 explicit-date examples, disjoint from eval),
+same 2-epoch recipe, re-quantized to Q4_K_M and republished. Re-eval (n=28):
+| Metric | Stock 31B | Fine-tune v1 (69) | **Fine-tune v2 (87, retrained)** |
+| --- | --- | --- | --- |
+| schema validity | 1.00 | 1.00 | **0.75** |
+| event precision | 1.00 | 0.85 | **0.476** |
+| start-exact recall | 0.955 | 0.773 | **0.455** |
+| event F1 | 0.977 | 0.81 | **0.465** |
+| end-exact | 1.00 | 1.00 | 1.00 |
+| no-event accuracy | 1.00 | 1.00 | 1.00 |
+| clarification recall | 0.75 | 1.00 | **0.75** |
+**The naive retrain made it worse, not better.** New failure modes: unparseable/empty
+JSON (validity 1.0→0.75), duplicate events, hallucinated "Drive to …" events,
+transposed/garbage years (`2062`, `2062-15:00:00`), and previously-passing relative
+dates now empty. Cause: overfitting — 18 of 87 examples were near-identical far-future
+templates, biasing a tiny dataset and degrading general formatting/extraction.
+## Conclusions & recommendation
+1. **Stock Gemma-4-31B is already strong** at this extraction (F1 0.98). The only
+   thing fine-tuning reliably *added* was clarification discipline (v1: 1.00 vs stock
+   0.75) — and even that was lost in v2.
+2. **Tiny-dataset SFT is fragile here.** v1 (69 ex) underperformed stock on dates;
+   v2 (87 ex) regressed hard. More data of the *same shape* hurt.
+3. **Recommended path** (pick one):
+   - **Ship stock + prompt for clarification** — simplest; recover the one real win
+     without the regressions. (Lowest risk.)
+   - **If keeping a fine-tune:** rebuild the dataset much larger and *diverse* (not
+     template-heavy), drop to ~1 epoch with regularization, and **gate every retrain
+     on this eval** (only publish if it beats the current best). Consider a higher
+     quant (Q5/Q6) to rule out the `"206"`/`2062` digit corruption being quant-driven.
+4. **Action — revert the live model.** v2 (worse) overwrote v1 in
+   `ParetoOptimal/gemma-4-cal-gguf`. Restore v1 (the better fine-tune) or point the
+   Space back at stock `unsloth/gemma-4-31B-it-GGUF` until a fine-tune *beats* the
+   eval baseline.
+**Bottom line: the eval did its job — it caught a regression before it reached users,
+and it says the current fine-tune is not yet worth shipping over stock.**
+## Follow-up (2026-06-09)
+### Live model restored to v1
+v2 (regressed) was rolled back: `gemma-cal-Q4_K_M.gguf` in the repo was restored to the
+v1 LFS object via a server-side `CommitOperationCopy` (no transfer, no GPU). Production
+serves the better v1 again.
+### Dataset rebuilt larger + more diverse (69 → 122)
+Added a diversity batch (`gen_new_seeds.MORE_SEEDS3`): varied date/time formats
+(`10/15`, "the 3rd", "half past 7", "0900", "noon", "midnight"), reschedules,
+cancellations, recurring, all-day, deadlines (EOD/midnight), past & hypothetical
+(must NOT schedule), richer no-event & clarify, and varied image sources (ticket,
+invite screenshot, notice). Goal: counter the template-heavy skew that overfit v2.
+Verified valid + disjoint from `eval.jsonl`.
+### Eval-gating is now the publishing process
+**No retrain publishes unless it beats the eval.** `training/gated_retrain.py`:
+1. retrain on Modal → upload to a **staging** filename (`gemma-cal-staging-Q4_K_M.gguf`)
+   in the repo (production file untouched; mmproj skipped — `--skip-mmproj`);
+2. eval the staging file (`modal_eval.py --model-file …`);
+3. gate: `schema_validity ≥ 0.95`, `event_f1 ≥ 0.81`, `start-exact recall ≥ 0.773`
+   (defaults = the current best, v1) — tune via `--gate-f1/--gate-recall`;
+4. **PASS** → promote staging → production via server-side `CommitOperationCopy` (free);
+   **FAIL** → delete staging, production unchanged.
+Run: `python training/gated_retrain.py [--epochs 1 --gate-f1 … --gate-recall …]`.
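The gate in step 3 reduces to a threshold check. The sketch below assumes the eval returns a dict of scores; `schema_validity` and `event_f1` are the names used above, while `start_exact_recall`, `Gate` and `gate_decision` are hypothetical names chosen for illustration, not the contents of `gated_retrain.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    # Defaults mirror the thresholds listed above (the v1 scores).
    min_validity: float = 0.95
    min_f1: float = 0.81
    min_recall: float = 0.773


def gate_decision(scores: dict, gate: Gate = Gate()) -> str:
    """Return "PASS" only if every gated metric meets its threshold."""
    ok = (
        scores["schema_validity"] >= gate.min_validity
        and scores["event_f1"] >= gate.min_f1
        and scores["start_exact_recall"] >= gate.min_recall
    )
    return "PASS" if ok else "FAIL"


# Scores from the results log: v1 (current best) and the regressed v2.
v1 = {"schema_validity": 1.00, "event_f1": 0.81, "start_exact_recall": 0.773}
v2 = {"schema_validity": 0.75, "event_f1": 0.465, "start_exact_recall": 0.455}
print(gate_decision(v1), gate_decision(v2))
```

With the default thresholds a model that only ties v1 still passes, so raising the bar (as `--gate-f1/--gate-recall` allow) is what turns this into a must-improve gate.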
+### Step 4 — first eval-gated retrain (122 ex, 1 epoch) — GATE FAILED ✅ (protected prod)
+The retrain scored **worse** than every prior version and the gate refused to publish:
+| Metric | Stock | v1 (live) | v3 staging (122, 1ep) |
+| --- | --- | --- | --- |
+| schema validity | 1.00 | 1.00 | **0.46** |
+| event F1 | 0.977 | 0.81 | **0.214** |
+| start-exact recall | 0.955 | 0.773 | **0.136** |
+| no-event accuracy | 1.00 | 1.00 | 1.00 |
+| clarification recall | 0.75 | 1.00 | 1.00 |
+More than half of the outputs were unparseable (schema validity 0.46) and extraction collapsed. **Gate: FAIL → staging deleted,
+production unchanged (still v1).** The gate worked exactly as intended.
+## Verdict (after 3 fine-tune attempts)
+All three fine-tunes — v1 (69 ex / 2 ep), v2 (87 / 2 ep), v3 (122 / 1 ep) — **underperform
+stock Gemma-4-31B**, and the larger runs broke JSON validity. Only the safety behaviors
+(no-event, clarification) survive fine-tuning; extraction degrades. **QLoRA-on-31B-Q4 here
+is fragile and not worth shipping over stock.** Recommended: serve **stock
+`unsloth/gemma-4-31B-it-GGUF`** and recover the one fine-tune win (clarification) via the
+prompt. Keep v1 as the published fine-tune for the "Well-Tuned" artifact, but don't route
+production extraction through it. Revisit fine-tuning only with a substantially larger, more
+varied dataset and a recipe that holds schema validity at 1.0 — gated, as now, on this eval.
+## Step 5 — quantization-penalty test (2026-06-09): quant EXONERATED
+Hypothesis: maybe Q4 quantization (the `"206"`/`2062` digit bug) was tanking the fine-tune.
+Tested the SAME fine-tuned weights (`gemma-cal-f16.gguf`, v2/87-ex — best fp16 still on the
+volume) at three precisions on the 28-example eval (`training/modal_quant_eval.py`):
+| precision | schema validity | event F1 | start-exact recall |
+| --- | --- | --- | --- |
+| f16 (full) | 0.643 | 0.571 | 0.545 |
+| Q8_0 | 0.679 | 0.565 | 0.591 |
+| Q4_K_M | 0.75 | 0.465 | 0.455 |
+| base (stock) | 1.00 | 0.977 | 0.955 |
+**Quantization is not the cause.** At full fp16 the fine-tune still scores validity 0.64 / F1
+0.57 — nowhere near base; validity is actually *lower* at f16 than Q4, so quant isn't breaking
+the JSON. Precision buys only ~+0.1 F1/recall (Q4→Q8/f16), a fraction of the gap to base. The
+degradation is the **SFT itself**, not the GGUF conversion. The planned follow-up to this test (retrain at Q8 to beat base)
+is **not pursued** — the gate would fail. (Caveat: v1's fp16 was overwritten, so this used v2;
+a definitive v1 test needs a retrain, but the small quant lift makes a base-beating result
+improbable.)
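The "fraction of the gap" claim can be checked with a few lines of arithmetic on the F1 column of the table above. This is only a worked calculation; the function name is made up for the example.

```python
def quant_lift_fraction(q4: float, best_higher_precision: float, base: float) -> tuple:
    """How much of the Q4-to-base gap is recovered by raising precision."""
    lift = best_higher_precision - q4  # gain from Q4_K_M to the best of Q8_0 / f16
    gap = base - q4                    # distance from Q4_K_M to stock
    return lift, gap, lift / gap


# Event F1 values from the table: Q4_K_M 0.465, f16 0.571, stock 0.977.
lift, gap, fraction = quant_lift_fraction(q4=0.465, best_higher_precision=0.571, base=0.977)
print(f"lift={lift:.3f} gap={gap:.3f} fraction={fraction:.2f}")
# lift=0.106 gap=0.512 fraction=0.21
```

On these numbers, higher precision recovers roughly a fifth of the F1 gap to stock, which is why a higher quant alone cannot pass a beat-base gate.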
+### Final recommendation
+A higher quant won't make the fine-tune beat base, and an automation agent (e.g. `ml-intern`)
+doesn't change the binding constraints (near-ceiling base; small data; SFT degrades
+instruction-following). **Serve stock `unsloth/gemma-4-31B-it-GGUF`** and recover the
+clarification behavior via the system prompt; keep v1 as the "Well-Tuned" artifact. Only
+revisit fine-tuning with a substantially larger, real, diverse dataset + a validity-preserving
+recipe (low LR, few steps), always gated on this eval.
+## Real training data: SMCalFlow importer
+`training/import_smcalflow.py` converts **SMCalFlow** (Microsoft Semantic Machines, **CC BY-SA
+4.0**) calendar dialogues into our `ActionPlan` format. SMCalFlow encodes events as LISP
+"dataflow" programs; the importer parses `CreatePreflightEventWrapper` turns, extracts
+subject/start/location/attendees, and **resolves** date/time constructs (`Tomorrow`, `NextDOW`,
+`MD`, `NumberPM`, `HourMinuteMilitary`, …) against a per-example reference `now` spread across
+2026 — so relative dates become concrete, self-consistent targets (directly trains the failing
+date/time skill, with varied 4-digit years). Conservative: only emits a row when a title AND an
+explicit start time resolve (~7.5k usable turns from train+valid).
+- Run: `python training/import_smcalflow.py --limit 2000 --heldout 200` → writes
+  `training/data/smcalflow_train.jsonl` (+ `…_heldout.jsonl`). **Both are git-ignored** (CC BY-SA
+  share-alike vs this repo's Apache-2.0 → we don't commit/redistribute the derived data; the
+  importer code is ours) and **disjoint from `eval.jsonl`**.
+- `train_qlora.py` now trains on `dataset.jsonl` **+** `smcalflow_train.jsonl` (when present).
+  `gated_retrain.py` therefore trains on real data, and still **only publishes if it beats the
+  gate** — so a bigger-but-worse model can't reach production.
+- Attribution (required by CC BY-SA): *Semantic Machines et al., "Task-Oriented Dialogue as
+  Dataflow Synthesis," TACL 2020.*
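To make "resolves against a reference `now`" concrete, here is a small sketch for two of the constructs named above. It is not the importer's code: the function names are invented, and the choice that `NextDOW` means the first matching weekday strictly after `now` is an assumption about the convention.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def resolve_tomorrow(now: date) -> date:
    """`Tomorrow` is the day after the reference date."""
    return now + timedelta(days=1)


def resolve_next_dow(now: date, weekday: str) -> date:
    """First date strictly after `now` that falls on `weekday` (assumed convention)."""
    target = WEEKDAYS.index(weekday)  # Monday=0, matching date.weekday()
    delta = (target - now.weekday() - 1) % 7 + 1  # always in 1..7
    return now + timedelta(days=delta)


now = date(2026, 6, 9)  # a Tuesday
print(resolve_tomorrow(now))            # 2026-06-10
print(resolve_next_dow(now, "Friday"))  # 2026-06-12
```

Because every example carries its own `now`, the same utterance yields different concrete targets across the dataset, which is what gives the model varied 4-digit years to learn from.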
+## Step 6 — eval-gated retrain on REAL data (2026-06-09): FAILED gate (worst yet)
+Trained the 31B on **2,122 examples** (122 hand-authored + 2,000 real SMCalFlow), 1 epoch,
+through `gated_retrain.py` with a beat-base gate (F1≥0.95, recall≥0.90). Result on the 28-ex eval:
+| Metric | base | v1 (live) | real-data (2,122 ex) |
+| --- | --- | --- | --- |
+| schema validity | 1.00 | 1.00 | **0.107** |
+| event F1 | 0.977 | 0.81 | **0.000** |
+| start-exact recall | 0.955 | 0.773 | **0.000** |
+~90% unparseable output, zero events extracted. **Gate FAIL → not promoted; production stays v1.**
+### Verdict across 4 fine-tunes (now incl. real data)
+Scores **monotonically worsen with more training/data**: v1 (69 synth, F1 0.81) → v2 (87, 0.465)
+→ v3 (122, 0.214) → real (2,122, 0.0). This is no longer a *data* problem — **the SFT recipe
+itself degrades the model**, and more data makes it worse. Most likely root cause to investigate
+*if* fine-tuning is ever revisited: a **train/inference chat-template mismatch** — `train_qlora.py`
+formats with Unsloth's `get_chat_template("gemma")` while `llama-server` serves with the GGUF's
+own `--jinja` template; if these differ for Gemma-4, training optimizes a format the server never
+uses, and the divergence compounds with more steps (exactly the monotonic decay seen). Other
+suspects: LR too high (2e-4) / catastrophic forgetting on a near-ceiling base.
+**Final, evidence-backed recommendation: serve stock `unsloth/gemma-4-31B-it-GGUF`** (best by far)
+and recover clarification via the system prompt. Do NOT route production through any current
+fine-tune. The eval-gate has now correctly rejected 2 bad retrains — keep it as the publish gate.
+## Step 7 — recipe fix + raw-output probe (2026-06-09): training stack implicated, fine-tuning HALTED
+Fixed the suspected train/serve chat-template mismatch (PR #54): Gemma-4's native
+`chat_template.jinja` uses a NEW `<|turn>role … <turn|>` format (no `<start_of_turn>` at all),
+while training forced unsloth's legacy "gemma" template. `train_qlora.py` now formats with the
+tokenizer's native template (hard `<|turn>` assert), masks loss to the assistant turn, LR 5e-5.
+Retrained on the 2,122-example set through the gate: **validity 0.0 — gate FAIL** (production
+stays v1, third bad retrain rejected).
+Diagnostics that pinpointed the cause:
+- **GGUF template check (CPU, ~free):** our exported staging GGUF embeds the correct native
+  `<|turn>` template (16,934 chars, no `<start_of_turn>`) → train and serve formats are now
+  verifiably aligned. Template is exonerated as the remaining cause.
+- **Raw-output probe (`/outputs/gemma-cal-staging-Q4_K_M.gguf`):** free generation emits pure
+  degenerate looping — `'Huddle — — — — — …'` to the token limit; constrained generation emits
+  512 tokens of nothing. **The weights are destroyed, not misformatted.**
+With dataset (69→2,122), template (legacy/native), LR (2e-4/5e-5), and masking (on/off) all
+varied, degradation always tracks training steps and ends in token-loop collapse. The remaining
+common factor is **Unsloth's QLoRA path for Gemma-4-31B** (new architecture; training logs warn
+`get_input_embeddings not auto-handled for Gemma4AudioModel`). **Fine-tuning is halted** until
+that stack demonstrably works for this arch (or is replaced with plain transformers+PEFT).
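Both diagnostics can be approximated with cheap string checks. The sketch below is an assumption-laden illustration: the function names and thresholds are invented, and the sample prompt only imitates the `<|turn>role … <turn|>` shape described above rather than reproducing the real template.

```python
from collections import Counter


def check_native_template(rendered: str) -> None:
    """Fail fast if a rendered prompt uses the legacy markers."""
    if "<start_of_turn>" in rendered:
        raise ValueError("legacy gemma template detected")
    if "<|turn>" not in rendered:
        raise ValueError("native <|turn> marker missing")


def looks_degenerate(text: str, min_tokens: int = 20, max_top_share: float = 0.5) -> bool:
    """Flag empty output or output dominated by one repeated token."""
    tokens = text.split()
    if not tokens:
        return True  # constrained generation that emits nothing
    if len(tokens) < min_tokens:
        return False  # too short to judge
    top_count = Counter(tokens).most_common(1)[0][1]
    return top_count / len(tokens) > max_top_share


check_native_template("<|turn>user\nLunch Friday at noon?<turn|>")  # illustrative shape only
print(looks_degenerate("Huddle " + "\u2014 " * 50))  # the looping pattern from the probe
print(looks_degenerate(""))
```

A probe like this is a candidate pre-gate check: it runs on a handful of raw generations and would reject destroyed weights before spending a full eval on them.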
+## Step 8 — improve served evals via prompt (stock + targeted SYSTEM additions)
+Base's only eval misses are prompt-fixable: m03 dropped the 2nd event of a multi-event thread;
+q04 didn't ask clarification on a "TBD" plan. Added two surgical SYSTEM lines (list every
+distinct event separately; ask via needs_clarification when day/time is TBD).
+**Result: PERFECT SCORE — 1.0 on every metric (n=28, tp/fp/fn = 22/0/0).**
+| Metric | base (old prompt) | **base + new prompt** |
+| --- | --- | --- |
+| schema validity | 1.00 | **1.00** |
+| event precision | 1.00 | **1.00** |
+| start-exact recall | 0.955 | **1.00** |
+| event F1 | 0.977 | **1.00** |
+| no-event accuracy | 1.00 | **1.00** |
+| clarification recall | 0.75 | **1.00** |
+Both misses fixed, nothing regressed. **This is the production configuration: stock
+`unsloth/gemma-4-31B-it-GGUF` + the updated SYSTEM prompt.** (Set Space var
+`MODEL_HF_REPO=unsloth/gemma-4-31B-it-GGUF`; the prompt ships with the app.) The "Well-Tuned"
+artifact remains `ParetoOptimal/gemma-4-cal-gguf` (v1); any future fine-tune must beat THIS
+1.0 baseline through the gate — i.e., match it and win on a harder, expanded eval set.
+## Step 9 — the E4B edge-model campaign (2026-06-10)
+Re-aimed fine-tuning where it has headroom: a **Gemma-4 E4B (~8B)** edge model that runs without a
+paid A100, gated against **stock E4B**. Six gated runs, each fixing a diagnosed failure (the fixed
+recipe trained cleanly every time — validity 1.0 throughout, confirming the Step-7 breakage was
+specific to the 31B path):
+| run | change | F1 | recall | clarify | eval |
+| --- | --- | --- | --- | --- | --- |
+| #1 | fixed recipe, 2,122 ex | 0.884 | 0.864 | 1.0 | n=28 |
+| #2 | + weekday-in-prompt (+data regen) | 0.955 | 0.955 | 0.75 | n=28 |
+| #3 | + next-DOW conflict filter (74 rows), 4× hand | **1.0** | **1.0** | 0.75 | n=28 |
+| #4 | + TBD-clarify seeds, 8× hand | 0.93 | 0.909 | 1.0 | n=28 |
+| #5 | clarify seeds, 4× hand | 0.93 | 0.909 | 1.0 | n=28 |
+| — | **eval expanded 28→60** (50 events; jitter-resistant) | | | | |
+| #6 | + Batch-7 seeds (next-DOW, "opens") | 0.97 | 0.96 | 1.0 | n=60 |
+| stock E4B (weekday prompt) | | 0.97 | 0.96 | 1.0 | n=60 |
+Run #6 vs stock is an **exact tie** on this eval (identical tp/fp/fn 48/1/2; both miss `e09`
+"next Tuesday" — which resisted 7 explicit training seeds — and one "opens" case each).
+Campaign side effects that improved the PRODUCT for every model: weekday-in-prompt, the
+next-DOW convention cleanup, and the 60-example eval.
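The reported scores follow directly from the counts. A short sketch, assuming the metrics are computed over events from tp/fp/fn in the usual way (the helper name is hypothetical):

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 from event-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Run #6 and stock E4B both report tp/fp/fn = 48/1/2 on the 60-example eval.
precision, recall, f1 = prf(48, 1, 2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.98 recall=0.96 f1=0.97
```

With 50 gold events, a single extra hit or miss moves recall by 0.02, which is the resolution limit to keep in mind when comparing runs on this set.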
+## Step 10 — bare-prompt (internalization) test: no decisive gap
+Dropped the system prompt for both models (identical minimal user content, same JSON-schema
+constraint; `modal_eval.py --minimal-prompt`), measuring internalized task knowledge:
+| bare, n=60 | stock E4B | fine-tuned E4B |
+| --- | --- | --- |
+| schema validity | 0.967 | **1.0** |
+| event F1 | **0.682** | 0.644 |
+| start-exact recall | **0.60** | 0.56 |
+| no-event accuracy | 0.70 | **0.80** |
+| clarification recall | 0.50 | **0.625** |
+Small trade-offs both ways, within noise. **Verdict: at this data scale (139 hand + 2,000
+SMCalFlow) with QLoRA/1-epoch, the E4B fine-tune reaches PARITY with stock, not superiority** —
+non-degraded, perfect validity everywhere, better bare-prompt discipline, slightly weaker bare
+extraction. The strict-dominance gate therefore never auto-promoted it; the candidate GGUF
+remains on the Modal volume (`/outputs/gemma-cal-e4b-staging-Q4_K_M.gguf`). Publishing it as
+the project's edge model at parity is a **product decision** (zero quality cost; production
+would serve our own fine-tune, fulfilling "Well-Tuned") — deliberately left to the owner, not
+the gate.
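One plausible reading of "strict dominance" is the Pareto sense: no metric worse and at least one better. The sketch below encodes that reading; the function name and metric keys are assumptions, and the real gate may be defined differently.

```python
def strictly_dominates(candidate: dict, incumbent: dict, metrics: list) -> bool:
    """True if candidate is >= on every metric and > on at least one."""
    no_worse = all(candidate[m] >= incumbent[m] for m in metrics)
    some_better = any(candidate[m] > incumbent[m] for m in metrics)
    return no_worse and some_better


metrics = ["event_f1", "start_exact_recall", "clarification_recall"]

# Served-prompt eval (n=60): run #6 and stock E4B tie on every metric.
stock = {"event_f1": 0.97, "start_exact_recall": 0.96, "clarification_recall": 1.0}
run6 = {"event_f1": 0.97, "start_exact_recall": 0.96, "clarification_recall": 1.0}
print(strictly_dominates(run6, stock, metrics))  # a tie never promotes

# Bare-prompt eval: each model wins some metrics, so neither dominates.
bare_stock = {"event_f1": 0.682, "start_exact_recall": 0.60, "clarification_recall": 0.50}
bare_tuned = {"event_f1": 0.644, "start_exact_recall": 0.56, "clarification_recall": 0.625}
print(strictly_dominates(bare_tuned, bare_stock, metrics))
```

Under this definition a parity result can never auto-promote, which matches the outcome described above and leaves the publish decision to a person.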

docs/gcal-verify.md ADDED

	@@ -0,0 +1,73 @@

+# Verifying the Google Calendar connection end-to-end
+> **Private-Space gotcha (the OAuth popup 404):** while the Space is private,
+> `*.hf.space` URLs answer Hugging Face's 404 page for any request that lacks
+> the signed access cookie. The app viewed EMBEDDED on huggingface.co
+> authenticates its iframe with a short-lived signed URL, but the OAuth POPUP
+> is a separate top-level window — when the subdomain cookie is missing or
+> expired, **Connect opens a 404**. Fix: always open
+> `https://paretooptimal-offgridschedula.hf.space` directly in its own tab
+> (the redirect re-mints the cookie for the whole subdomain), then connect
+> from there. Making the Space public would remove this entirely, at the cost
+> of the deployed source tree becoming publicly browsable — deliberately NOT
+> done (2026-06-12).
+Two layers of verification exist:
+1. **In-app, automatic** — on every page load, `wireGcal()` round-trips the stored
+   token to `POST /oauth2/check`, which makes one real (scope-compatible) Google
+   API call. The Step 2a row and the export-bar badge upgrade from
+   "✓ connected" to **"✓ connected · verified"** when Google answers; a
+   *definitive* rejection (revoked/invalid token) clears the stored token and
+   flips everything to "not connected". Transient problems (OAuth env unset,
+   network down, `SERVE=gradio` mode where FastAPI routes aren't served) never
+   destroy the token — the UI just stays at the local shape-check state.
+2. **Scripted E2E** — `scripts/verify_gcal_e2e.py` proves the whole chain:
+   agent-extracted event → real push → API readback (title/location/start/
+   reminder) → cleanup.
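The key rule in the in-app check is that only a definitive rejection may clear the token. A sketch of that decision table follows; the real logic lives in the browser (`wireGcal()`) and behind `POST /oauth2/check`, so the enum, function name and return shape here are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class CheckOutcome(Enum):
    VERIFIED = "verified"    # Google answered the API call
    REJECTED = "rejected"    # revoked or invalid token (definitive)
    TRANSIENT = "transient"  # env unset, network down, route not served


def apply_check(token: Optional[str], outcome: CheckOutcome) -> Tuple[Optional[str], str]:
    """Return (token to keep, badge text) after one check round-trip."""
    if token is None:
        return None, "not connected"
    if outcome is CheckOutcome.VERIFIED:
        return token, "✓ connected · verified"
    if outcome is CheckOutcome.REJECTED:
        return None, "not connected"  # the only path that discards the token
    return token, "✓ connected"      # transient: stay at the local shape-check state


print(apply_check("tok", CheckOutcome.TRANSIENT))
```

Keeping the transient branch non-destructive is what lets a user reload offline without being logged out.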
+## Browser loop (manual)
+1. Run locally with OAuth configured:
+   ```
+   set GOOGLE_OAUTH_CLIENT_ID=...        # a Google Cloud OAuth "Web application" client
+   set GOOGLE_OAUTH_CLIENT_SECRET=...
+   set SERVE=uvicorn
+   python app.py
+   ```
+2. Open http://localhost:7860, switch to **☁️ Online**, open Step 2a
+   **"🔗 Connect your calendar"** → click **Connect** on the Google row →
+   consent in the popup. The row flips to "✓ connected", then upgrades to
+   **"✓ connected · verified"** within ~1s (the `/oauth2/check` round-trip).
+3. Paste the appointment text (the CANON sample in `tests/test_agent.py`) →
+   **Run the agents** → the export toolbar appears with the badge
+   **"Google: ✓ connected · verified"** next to the three buttons.
+4. Click **Add to Google Calendar** → the status line shows the created event
+   link; open it: *Mon Jun 22 2026, 10:15–11:00, 112A West 72nd Street,
+   New York, NY 10023*, 60-minute reminder.
+5. **Reload the page** → still verified, no re-prompt (the acceptance test for
+   "never asks again").
+### Negative paths
+- Revoke access at https://myaccount.google.com/permissions → reload → the
+  check is definitive: token is cleared, every surface shows "not connected".
+- Unset the OAuth env vars (or kill the network) → reload → stays at plain
+  "✓ connected" — transient failures never log the user out.
+- Click **disconnect** in Step 2a → flips everywhere instantly.
+## Scripted loop
+One-time: after connecting in the browser, copy the `gcal_token` value from
+DevTools → Application → Local Storage into a file (e.g. `tok.json`). Treat it like any other
+gitignored secret: don't commit it.
+```
+python scripts/verify_gcal_e2e.py --token-file tok.json --check-only   # liveness only
+python scripts/verify_gcal_e2e.py --token-file tok.json               # full E2E
+```
+The full run pushes the CANON event with a nonce in the title (`[e2e-xxxxxx]`),
+reads it back through the API, asserts summary/location/start-instant/reminder,
+and deletes it (use `--keep` to inspect it in the calendar first). Exit code 0
+= all checks passed.

docs/hermes.md ADDED Viewed

	@@ -0,0 +1,48 @@

+# The Hermes "grows-with-you" brain
+The agent's reasoning is pluggable through `INFERENCE_BASE_URL` (see `server/model.py`). Point it at a
+**NousResearch Hermes** model served OpenAI-compatible and the whole pipeline uses it — **no code
+change**. Hermes is a tool-calling Llama/Qwen fine-tune, a good fit for the autonomous daemon.
+## Serve Hermes locally (llama.cpp → "Llama Champion")
+```bash
+# Hermes 3 Llama 3.1 8B (Q4_K_M) runs comfortably on a Mac with Metal.
+llama-server -m ~/models/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf \
+  --host 127.0.0.1 --port 8080 --ctx-size 8192 --jinja   # --jinja = tool-calling template
+```
+Point the backend at it:
+```bash
+export INFERENCE_BASE_URL="http://127.0.0.1:8080/v1"
+export INFERENCE_MODEL="hermes"
+export USE_STUB_EXTRACTOR=0
+python app.py
+```
+`server/model.py` routes `complete_json` / `stream_complete_json` to the remote server when
+`INFERENCE_BASE_URL` is set (`_remote_*`), still grammar-constraining the output to the ActionPlan
+schema. (Ollama or vLLM also work — any OpenAI-compatible endpoint.)
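For orientation, this is roughly what a request to an OpenAI-compatible endpoint looks like. It is a sketch, not the code in `server/model.py`: `build_chat_request` is a hypothetical helper, and the grammar/schema constraint mentioned above is deliberately left out because its exact request field is server-specific.

```python
def build_chat_request(base_url: str, model: str, messages: list) -> dict:
    """Assemble the URL and JSON body for a chat-completions call."""
    base = base_url.rstrip("/")
    if not base:
        raise ValueError("base_url is required for the remote path")
    return {
        "url": f"{base}/chat/completions",
        "json": {
            "model": model,
            "messages": messages,
            "temperature": 0,  # deterministic extraction is usually preferable
        },
    }


req = build_chat_request(
    "http://127.0.0.1:8080/v1",
    "hermes",
    [
        {"role": "system", "content": "Extract calendar events as JSON."},
        {"role": "user", "content": "Dentist Tuesday at 3pm"},
    ],
)
print(req["url"])
```

Sending it is then a single call such as `requests.post(req["url"], json=req["json"], timeout=60)` against whichever server `INFERENCE_BASE_URL` points at.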
+## "Grows with you" — the memory (`server/memory.py`)
+Durable facts/preferences personalize every extraction; they're injected into the prompt via
+`recall()` (`server/agent.py::build_messages`) and shown/edited in the dashboard **Memory** tab.
+- **Learns automatically:** recurring event attendees become `contact` facts (`observe_plan`).
+- **You can teach it:** add facts in the Memory tab — `"Dana is the soccer coach"`,
+  `"you decline Mondays"`, `"default location is Lincoln Elementary"`.
+- **Hermes can update it itself:** set `HERMES_TOOLS=1` and the remote path advertises a `remember`
+  tool (`server/tools.py`); the model calls it mid-run to save durable facts, then returns the
+  ActionPlan. The tool-call loop is in `server/model.py::_remote_complete_json` (round-trip logic +
+  tests in `server/tools.py` / `tests/test_tools.py`). Requires a tool-calling server (`--jinja`).
+- Stored at `MEMORY_PATH` (set it to a real path like `~/.offgrid/agent_memory.json`, not `/tmp`).
+Over time the model resolves nicknames, applies your preferences to conflicts, and needs fewer
+clarifications — the "grows with you" behavior.
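A minimal sketch of how a durable fact store and prompt injection could fit together. `FactStore` and `with_memory` are hypothetical stand-ins, not the API of `server/memory.py`; the JSON-list file format is also an assumption.

```python
import json
from pathlib import Path


class FactStore:
    """Tiny JSON-backed list of durable facts."""

    def __init__(self, path):
        self.path = Path(path).expanduser()
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, fact: str) -> bool:
        """Store a fact once; return True if it was new."""
        fact = fact.strip()
        if not fact or fact in self.facts:
            return False
        self.facts.append(fact)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.facts, indent=2))
        return True

    def recall(self, limit: int = 20) -> list:
        """Most recent facts first, capped so the prompt stays small."""
        return self.facts[-limit:][::-1]


def with_memory(system_prompt: str, facts: list) -> str:
    """Append remembered facts to the system prompt, if there are any."""
    if not facts:
        return system_prompt
    lines = "\n".join(f"- {fact}" for fact in facts)
    return f"{system_prompt}\n\nKnown facts about the user:\n{lines}"
```

The same `remember` entry point could serve both the Memory tab and a model-invoked tool, which keeps manual and automatic learning on one code path.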
+## Where it runs
+The Hermes brain + memory live wherever the autonomous backend runs — the **Mac daemon**
+(`scripts/setup_mac.sh`), an Android phone via Termux (`INFERENCE_BASE_URL` → a local `llama-server`),
+or a cloud box (if you capture from email/Slack/Telegram instead of iMessage).

docs/on-device.md ADDED Viewed

	@@ -0,0 +1,54 @@

+# Running on a cell phone (on-device or thin-client)
+"Runs on a cell phone" can mean two things; the app supports both via one env switch.
+## The inference switch
+`server/model.py` reads `INFERENCE_BASE_URL`:
+- **Unset (default):** the GGUF is loaded in-process via `llama-cpp-python` (the Space / a laptop).
+- **Set:** generation is delegated to a remote **OpenAI-compatible / llama.cpp server** at that URL.
+  Same agent code, different inference location.
+```bash
+export INFERENCE_BASE_URL="http://127.0.0.1:8080/v1"   # a llama-server on the phone
+export INFERENCE_API_KEY="..."                          # optional
+export INFERENCE_MODEL="gemma-e4b"                       # optional label
+```
+So "on the phone" = run a `llama-server` **on the device** and point the agent at `127.0.0.1`.
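The switch itself amounts to one environment lookup. The sketch below shows the idea under stated assumptions: `inference_target` and its return shape are invented for illustration and do not mirror the actual code in `server/model.py`.

```python
import os
from typing import Mapping, Optional


def inference_target(env: Optional[Mapping[str, str]] = None) -> dict:
    """Decide where generation runs, based on INFERENCE_BASE_URL."""
    env = os.environ if env is None else env
    base_url = env.get("INFERENCE_BASE_URL", "").strip()
    if not base_url:
        # Unset: load the GGUF in-process (the Space / laptop path).
        return {"mode": "in-process"}
    return {
        "mode": "remote",
        "base_url": base_url.rstrip("/"),
        "api_key": env.get("INFERENCE_API_KEY") or None,  # optional
        "model": env.get("INFERENCE_MODEL", ""),           # optional label
    }


print(inference_target({}))
print(inference_target({"INFERENCE_BASE_URL": "http://127.0.0.1:8080/v1"}))
```

Passing the environment in as a mapping keeps the decision testable without touching the real process environment.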
+## On-device model profile (Gemma E4B edge)
+A 31B Q4 GGUF (~18–20 GB) needs a GPU and will not run on a phone. Use the lightweight **Gemma E4B**
+edge variant (see [PLAN.md](../PLAN.md) and the README *Accuracy upgrade* section), with a small
+context window:
+```bash
+export MODEL_REPO="<your-or-community gemma E4B GGUF repo>"
+export MODEL_FILE="<gemma-e4b-*-Q4_K_M.gguf>"
+export N_CTX=4096           # keep the KV cache small on a phone
+export N_GPU_LAYERS=0       # CPU; on a Mac use Metal layers instead
+```
+## Android (Termux) — genuinely on-device
+```bash
+pkg install python git cmake clang
+git clone <this repo> && cd imessage-calendar-agent
+pip install -r requirements-ci.txt llama-cpp-python   # CPU build
+# Option 1: run the whole app (UI + /agent) on the phone
+USE_STUB_EXTRACTOR=0 python app.py                    # http://127.0.0.1:7860
+# Option 2: run only a llama-server and point a client/app at it
+#   llama-server -m <gemma-e4b.gguf> --port 8080
+#   then set INFERENCE_BASE_URL=http://127.0.0.1:8080/v1
+```
+Expect multi-second latency per request on a phone CPU, so keep `N_CTX` small and message threads short.
+## iOS — the honest limit
+iOS does **not** allow background message access or a persistent background LLM server. You cannot
+run an autonomous on-device agent for iMessage on an iPhone. The supported iOS path is the
+foreground **Shortcut** in [automations.md](./automations.md), optionally pointing at a remote
+`INFERENCE_BASE_URL` for the model.

requirements-ci.txt ADDED Viewed

	@@ -0,0 +1,16 @@

+# Minimal deps for CI / local testing. The app never imports llama_cpp or the
+# Google libs at module load (both are lazy), and tests run in stub mode
+# (USE_STUB_EXTRACTOR=1), so we deliberately exclude:
+#   - llama-cpp-python  (slow source build on CI; real inference is tested on the Space)
+#   - google-api-python-client / google-auth-*  (only used by the optional GCal push)
+gradio>=6.0
+pandas>=2.0
+fastapi>=0.115
+uvicorn[standard]>=0.30
+pydantic>=2.7
+python-dotenv>=1.0
+requests>=2.32
+huggingface_hub>=0.24
+icalendar>=5.0
+python-dateutil>=2.9
+pytest>=8.0

requirements-docker.txt ADDED Viewed

	@@ -0,0 +1,24 @@

+# Runtime deps for the dedicated-GPU Docker Space.
+# Excludes: llama-cpp-python (compiled WITH CUDA in the Dockerfile), the cu124
+# prebuilt index + nvidia-*-cu12 libs (the CUDA devel base provides the toolkit),
+# and `spaces` (ZeroGPU only — its absence makes server/model.py's gpu decorator a
+# no-op so llama.cpp runs directly on the always-attached dedicated GPU).
+# [mcp] extra exposes named Gradio API endpoints as Model Context Protocol tools
+# — same MCP surface as the Gradio-SDK Space (server/mcp_tools.py, app.py).
+gradio[mcp]>=6.0
+pandas>=2.0
+fastapi>=0.115
+uvicorn[standard]>=0.30
+pydantic>=2.7
+python-dotenv>=1.0
+huggingface_hub>=0.24
+requests>=2.32
+icalendar>=5.0
+python-dateutil>=2.9
+pillow-heif>=0.16
+google-api-python-client>=2.130
+google-auth-oauthlib>=1.2
+google-auth-httplib2>=0.2
+# Agent tab: smolagents drives this Space's own MCP tools with a MiniCPM
+# planner (lazy-imported; the stub/scripted path never touches it).
+smolagents[mcp,openai]==1.26.0

requirements.txt ADDED Viewed

	@@ -0,0 +1,47 @@

+# --- Space runtime (Gradio SDK + ZeroGPU) ---
+# [mcp] extra exposes named Gradio API endpoints as Model Context Protocol tools
+# so any MCP-aware agent (Claude Desktop, Cursor, etc.) can call this Space's
+# extract_events / make_ics / check_conflicts — see server/mcp_tools.py + app.py.
+gradio[mcp]>=6.0
+pandas>=2.0          # used directly by the Activity dashboard chart
+fastapi>=0.115
+uvicorn[standard]>=0.30
+pydantic>=2.7
+python-dotenv>=1.0
+huggingface_hub>=0.24
+requests>=2.32
+spaces>=0.30         # ZeroGPU: @spaces.GPU dynamic GPU allocation
+# --- llama.cpp inference (Llama Champion), GPU build ---
+# CUDA prebuilt wheel so layers offload to the ZeroGPU GPU (n_gpu_layers=-1).
+# NOTE: the RTX Pro 6000 Blackwell is sm_120 — if the cu124 wheel lacks Blackwell
+# kernels, build from source against CUDA 12.8:
+#   CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=120" pip install llama-cpp-python
+--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
+# Pin to the newest version with a prebuilt cu124 cp310 wheel. With >=, pip grabs a
+# newer PyPI *sdist* and compiles from source (slow, and mismatches the base's CUDA).
+llama-cpp-python==0.3.19
+# CUDA userspace libs the prebuilt wheel dlopens (ZeroGPU env lacks libcudart.so.12).
+# server/model.py::_preload_cuda_libs loads these RTLD_GLOBAL before importing llama_cpp.
+nvidia-cuda-runtime-cu12; platform_system == "Linux"
+nvidia-cublas-cu12; platform_system == "Linux"
+nvidia-cuda-nvrtc-cu12; platform_system == "Linux"
+# --- calendar output ---
+icalendar>=5.0
+python-dateutil>=2.9
+# --- vision input: transcode iPhone HEIC attachments to JPEG ---
+pillow-heif>=0.16
+# --- optional Google Calendar bonus ---
+google-api-python-client>=2.130
+google-auth-oauthlib>=1.2
+google-auth-httplib2>=0.2
+# --- Agent tab: smolagents drives the Space's own MCP tools with a MiniCPM
+# planner (lazy-imported; stub/scripted path and CI never touch it). ---
+smolagents[mcp,openai]==1.26.0
+# NOTE: training deps (unsloth, trl, transformers, bitsandbytes) live in
+# training/requirements-train.txt — they are NOT installed in the Space.
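
The comments above mention that server/model.py::_preload_cuda_libs loads the CUDA userspace libraries with RTLD_GLOBAL before llama_cpp is imported. That function is not shown in this part of the diff, so what follows is a hedged sketch of the general technique, not the project's implementation. The directory layout (`nvidia/<component>/lib/` under site-packages), the file patterns, and the `root` parameter are all assumptions made for illustration.

```python
import ctypes
import glob
import os
import sysconfig
from typing import List, Optional

# Assumed wheel layout: <site-packages>/nvidia/<component>/lib/<lib>.so.<version>
_PATTERNS = ("libcudart.so*", "libcublasLt.so*", "libcublas.so*", "libnvrtc.so*")


def preload_cuda_libs(root: Optional[str] = None) -> List[str]:
    """Load CUDA userspace libs with RTLD_GLOBAL; return the paths that loaded."""
    root = root or sysconfig.get_paths()["purelib"]
    loaded = []
    for pattern in _PATTERNS:
        matches = glob.glob(os.path.join(root, "nvidia", "*", "lib", pattern))
        for path in sorted(matches):
            try:
                # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
                ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            except OSError:
                continue  # wrong platform or missing dependency: skip, don't crash
            loaded.append(path)
    return loaded
```

Calling it once before the first `import llama_cpp` is the intended order. On a machine without the NVIDIA wheels it simply returns an empty list.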

scripts/setup_mac.sh ADDED Viewed

	@@ -0,0 +1,60 @@

+#!/usr/bin/env bash
+# One-time setup for the Mac always-on daemon (Scenario 1):
+#   Hermes (llama-server) + backend (autonomous) + collector, as launchd jobs.
+#
+# Prereqs you provide:
+#   - llama.cpp built (llama-server on PATH, or pass LLAMA_SERVER=/path/to/llama-server)
+#   - a Hermes GGUF (e.g. Hermes-3-Llama-3.1-8B Q4_K_M) -> pass MODEL_GGUF=/path
+#   - Google OAuth: credentials.json (+ token.json after first auth) in the repo dir
+#   - Full Disk Access for the python binary (the script prints how)
+#
+# Usage:
+#   INGEST_TOKEN=... MODEL_GGUF=~/models/hermes-3-8b-q4.gguf ./scripts/setup_mac.sh
+set -euo pipefail
+REPO="$(cd "$(dirname "$0")/.." && pwd)"
+HOME_DIR="$HOME"
+PYTHON="${PYTHON:-$(command -v python3)}"
+LLAMA_SERVER="${LLAMA_SERVER:-$(command -v llama-server || true)}"
+MODEL_GGUF="${MODEL_GGUF:?set MODEL_GGUF=/path/to/hermes.gguf}"
+INGEST_TOKEN="${INGEST_TOKEN:?set INGEST_TOKEN=... (same value you use elsewhere)}"
+LA="$HOME_DIR/Library/LaunchAgents"
+[ -n "$LLAMA_SERVER" ] || { echo "llama-server not found; set LLAMA_SERVER=/path"; exit 1; }
+mkdir -p "$LA" "$HOME_DIR/.offgrid" "$HOME_DIR/Library/Logs"
+install_plist() {
+  local name="$1"
+  sed -e "s|__PYTHON__|$PYTHON|g" \
+      -e "s|__REPO__|$REPO|g" \
+      -e "s|__HOME__|$HOME_DIR|g" \
+      -e "s|__LLAMA_SERVER__|$LLAMA_SERVER|g" \
+      -e "s|__MODEL_GGUF__|$MODEL_GGUF|g" \
+      -e "s|__INGEST_TOKEN__|$INGEST_TOKEN|g" \
+      "$REPO/deploy/launchd/$name" > "$LA/$name"
+  launchctl unload "$LA/$name" 2>/dev/null || true
+  launchctl load "$LA/$name"
+  echo "loaded $name"
+}
+"$PYTHON" -m pip install -q -r "$REPO/requirements-ci.txt"  # runtime deps (no GPU model needed on Mac)
+install_plist com.offgrid.hermes.plist
+install_plist com.offgrid.backend.plist
+install_plist com.offgrid.collector.plist
+cat <<EOF
+Done. Three launchd jobs are running (and restart on reboot):
+  com.offgrid.hermes     -> llama-server (Hermes) on :8080
+  com.offgrid.backend    -> Gradio UI + /agent + /ingest on :7860 (AUTONOMOUS, Hermes brain)
+  com.offgrid.collector  -> reads chat.db -> /ingest
+ONE MANUAL STEP: grant Full Disk Access to the python binary so the collector can read chat.db:
+  System Settings > Privacy & Security > Full Disk Access > +  ->  $PYTHON
+Then: launchctl kickstart -k gui/\$(id -u)/com.offgrid.collector
+Dashboard: http://127.0.0.1:7860  (Activity = live runs, Memory = what it learned)
+Logs:      ~/Library/Logs/offgrid-*.log
+Triggers on YOUR sent/accepted iMessages (TRIGGER_ON=outgoing). Set TRIGGER_ON=any to widen.
+EOF

scripts/start_space.sh ADDED Viewed

	@@ -0,0 +1,85 @@

+#!/usr/bin/env bash
+# Launch the official llama.cpp server + the agent app (Docker GPU Space).
+# llama-server downloads the GGUF from HF on first run and serves it on :8080;
+# the app calls it via INFERENCE_BASE_URL=http://127.0.0.1:8080/v1.
+set -u
+# UI-only / preview mode: in stub mode there's no model, so skip llama-server
+# entirely (otherwise it would download the ~20GB GGUF and fail on a CPU box).
+# Lets the Space run the full UI for free on cpu-basic. See PLAN / docs.
+if [ "${USE_STUB_EXTRACTOR:-0}" = "1" ]; then
+  echo "[start] UI-only (USE_STUB_EXTRACTOR=1) — skipping llama-server"
+  exec python3 app.py
+fi
+LS="$(command -v llama-server || echo /app/llama-server)"
+# The official binary's sibling .so (libllama-server-impl.so) lives next to it in
+# /app; we run from /srv, so add its dir to the loader path.
+export LD_LIBRARY_PATH="$(dirname "$LS"):/app:${LD_LIBRARY_PATH:-}"
+echo "[start] using llama-server at: $LS (LD_LIBRARY_PATH=$LD_LIBRARY_PATH)"
+# Model selection: MODEL_FILE (explicit filename in MODEL_HF_REPO) is preferred —
+# the repo holds multiple Q4_K_M GGUFs (31B + E4B edge), so the `-hf repo:quant`
+# shorthand is ambiguous there. Falls back to -hf REPO:QUANT when MODEL_FILE unset.
+if [ -n "${MODEL_FILE:-}" ]; then
+  echo "[start] model: ${MODEL_HF_REPO}/${MODEL_FILE} (explicit file; downloads on first run)"
+  MODEL_PATH="$(python3 -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('${MODEL_HF_REPO}', '${MODEL_FILE}'))")"
+  MODEL_ARGS="-m $MODEL_PATH"
+else
+  echo "[start] model: ${MODEL_HF_REPO}:${MODEL_QUANT:-Q4_K_M} (downloads on first run)"
+  MODEL_ARGS="-hf ${MODEL_HF_REPO}:${MODEL_QUANT:-Q4_K_M}"
+fi
+# Vision: download the mmproj projector and pass --mmproj so llama-server accepts
+# image_url inputs (screenshots/flyers). MMPROJ_REPO lets the projector come from a
+# different repo than the LLM (the E4B edge model uses the base E4B's projector,
+# not the 31B mmproj stored alongside it). Falls back to text-only if unavailable.
+MMPROJ_ARG=""
+if [ -n "${MMPROJ_FILE:-}" ]; then
+  MMPROJ_REPO="${MMPROJ_REPO:-$MODEL_HF_REPO}"
+  echo "[start] fetching mmproj ${MMPROJ_REPO}/${MMPROJ_FILE} for vision..."
+  MMPROJ_PATH="$(python3 -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('${MMPROJ_REPO}', '${MMPROJ_FILE}'))" 2>/dev/null || true)"
+  if [ -n "$MMPROJ_PATH" ]; then
+    MMPROJ_ARG="--mmproj $MMPROJ_PATH"
+    echo "[start] mmproj ready: $MMPROJ_PATH"
+  else
+    echo "[start] mmproj download failed -> text-only"
+  fi
+fi
+# -ngl 999 offloads all layers to the GPU; --jinja enables the chat/tool template.
+"$LS" $MODEL_ARGS \
+      --host 127.0.0.1 --port 8080 \
+      -ngl 999 -c 8192 --jinja $MMPROJ_ARG &
+LLAMA_PID=$!
+# Optional second llama-server: the Agent tab's MiniCPM planner. OFF unless
+# PLANNER_HF_REPO+PLANNER_FILE are set. VRAM note: E4B Q4 (~5GB) + MiniCPM-8B
+# Q4 (~5GB) + KV is tight on a 16GB T4 — tune PLANNER_NGL (default 999; lower
+# it for partial offload, planner outputs are short) or use the 1B variant
+# (openbmb/MiniCPM5-1B-GGUF / MiniCPM5-1B-Q4_K_M.gguf).
+# PLANNER_CTX (default 8192, matching the main model): a multi-step agent run
+# accumulates the tool schemas + task + thread + each step's observations, so
+# 4096 overflows on real threads ("request (4142 tokens) exceeds context").
+if [ -n "${PLANNER_HF_REPO:-}" ] && [ -n "${PLANNER_FILE:-}" ]; then
+  echo "[start] planner: ${PLANNER_HF_REPO}/${PLANNER_FILE} on :${PLANNER_PORT:-8081}"
+  PLANNER_PATH="$(python3 -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('${PLANNER_HF_REPO}', '${PLANNER_FILE}'))")"
+  "$LS" -m "$PLANNER_PATH" \
+        --host 127.0.0.1 --port "${PLANNER_PORT:-8081}" \
+        -ngl "${PLANNER_NGL:-999}" -c "${PLANNER_CTX:-8192}" --jinja &
+  echo "[start] planner launching (PLANNER_BASE_URL should be http://127.0.0.1:${PLANNER_PORT:-8081}/v1)"
+fi
+echo "[start] waiting for llama-server health (model download can take minutes)..."
+for i in $(seq 1 900); do
+  if ! kill -0 "$LLAMA_PID" 2>/dev/null; then
+    echo "[start] ERROR: llama-server exited early"; break
+  fi
+  if curl -sf http://127.0.0.1:8080/health >/dev/null 2>&1; then
+    echo "[start] llama-server ready after ~$((i*2))s"; break
+  fi
+  sleep 2
+done
+echo "[start] launching app (UI + /agent) -> INFERENCE_BASE_URL=${INFERENCE_BASE_URL:-unset}"
+exec python3 app.py
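
The readiness loop in this script polls `/health` every two seconds and stops early if the server process has died. The same logic can be sketched in Python, for example for a local smoke test. The function names, the injected `probe` and `alive` callables, and the extra `"timeout"` result are assumptions of this sketch. The shell loop itself does not report a timeout.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True when the URL answers with HTTP 200; any error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_health(probe: Callable[[], bool], alive: Callable[[], bool],
                    attempts: int = 900, delay: float = 2.0) -> str:
    """Poll until healthy; return 'ready', 'exited' (process died) or 'timeout'."""
    for _ in range(attempts):
        if not alive():
            return "exited"
        if probe():
            return "ready"
        time.sleep(delay)
    return "timeout"
```

A caller would pass something like `lambda: http_ok("http://127.0.0.1:8080/health")` as `probe` and a process liveness check as `alive`. Injecting both keeps the loop testable without a running server.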

scripts/verify_gcal_e2e.py ADDED Viewed

	@@ -0,0 +1,159 @@

+"""End-to-end Google Calendar verification: agent-extracted event -> real push
+-> API readback -> cleanup. Manual-run only (needs a real per-user token and
+the google libs); never imported by CI.
+One-time token bootstrap: connect in the app (Step 2a), then DevTools ->
+Application -> Local Storage -> copy the `gcal_token` value into a file.
+    python scripts/verify_gcal_e2e.py --token-file tok.json [--check-only] [--keep]
+"""
+from __future__ import annotations
+import argparse
+import json
+import os
+import sys
+import uuid
+from datetime import datetime
+from pathlib import Path
+os.environ.setdefault("USE_STUB_EXTRACTOR", "1")
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from dateutil import parser as dtparser  # noqa: E402
+from calendar_out import gcal  # noqa: E402
+# The canonical appointment-confirmation sample (kept in sync with
+# tests/test_agent.py::CANON — copied because CI test modules aren't a package).
+CANON = (
+    "Thank you for scheduling your appointment with Primary Care of Manhattan. "
+    "We look forward to seeing you!\n"
+    "\n"
+    "Date: Monday, June 22, 2026\n"
+    "Time: 10:30 AM\n"
+    "Duration: Approx. 30–45 min\n"
+    "(Please arrive 15 minutes early to complete intake forms)\n"
+    "\n"
+    "\U0001f4cd 112A West 72nd Street\n"
+    "New York, NY 10023\n"
+    "(Upper West Side — 72nd & Columbus)\n"
+)
+EXPECT_START = "2026-06-22T10:15:00"
+EXPECT_LOCATION = "112A West 72nd Street, New York, NY 10023"
+EXPECT_REMINDER = 60
+_results: list[tuple[bool, str]] = []
+def _check(ok: bool, label: str) -> bool:
+    _results.append((ok, label))
+    print(("PASS  " if ok else "FAIL  ") + label)
+    return ok
+def _bare_creds(token_json: str):
+    """Access-token-only Credentials: works for API calls while the token is
+    fresh, but cannot refresh."""
+    from google.oauth2.credentials import Credentials
+    return Credentials(token=json.loads(token_json).get("token"))
+def _enable_bare_token_fallback(token: str) -> None:
+    """Space-minted tokens may lack client_secret (it exists only in the
+    Space's env, and GOOGLE_OAUTH_CLIENT_SECRET isn't set locally). google-auth
+    then refuses to build refresh-capable creds — fall back to using the bare
+    access token directly (valid ~1h after minting)."""
+    try:
+        gcal._creds_from_token_json(token)
+    except ValueError as e:
+        if "client_secret" not in str(e):
+            raise
+        print("note: token has no client_secret and none in env -> using the "
+              "access token directly (no refresh; re-mint if it expires)")
+        gcal._creds_from_token_json = _bare_creds
+def main() -> int:
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("--token-file", default=os.environ.get("GCAL_TOKEN_FILE", ""))
+    ap.add_argument("--check-only", action="store_true",
+                    help="only liveness-check the token; no event is created")
+    ap.add_argument("--keep", action="store_true",
+                    help="leave the test event in the calendar (default: delete)")
+    ap.add_argument("--calendar-id", default="primary")
+    args = ap.parse_args()
+    if not args.token_file or not Path(args.token_file).exists():
+        print("ERROR: pass --token-file (or set GCAL_TOKEN_FILE) pointing at the "
+              "localStorage gcal_token JSON")
+        return 1
+    token = Path(args.token_file).read_text(encoding="utf-8").strip()
+    _enable_bare_token_fallback(token)
+    # 1. liveness check (same call the /oauth2/check endpoint makes)
+    res = gcal.check_token(token)
+    if not _check(res["ok"], f"check_token: {res if not res['ok'] else 'token is live'}"):
+        return 1
+    if res.get("refreshed_token"):
+        token = res["refreshed_token"]
+        Path(args.token_file).write_text(token, encoding="utf-8")
+        print("      (token was refreshed; token file updated)")
+    if args.check_only:
+        return 0
+    # 2. agent extraction (stub mode = deterministic) + invariants
+    from server.agent import run_agent
+    plan = run_agent(CANON, now=datetime(2026, 6, 12, 9, 0))
+    if not _check(len(plan.events) == 1, f"agent extracted 1 event (got {len(plan.events)})"):
+        return 1
+    ev = plan.events[0]
+    _check(ev.start == EXPECT_START, f"start == {EXPECT_START} (got {ev.start})")
+    _check(ev.location == EXPECT_LOCATION, f"location == {EXPECT_LOCATION!r} (got {ev.location!r})")
+    _check(ev.reminder_minutes == EXPECT_REMINDER,
+           f"reminder == {EXPECT_REMINDER} (got {ev.reminder_minutes})")
+    # 3. push with a nonce title so readback/cleanup can never touch a real event
+    nonce = f"e2e-{uuid.uuid4().hex[:6]}"
+    ev.title = f"{ev.title} [{nonce}]"
+    links = gcal.push_events_with_token(token, [ev], calendar_id=args.calendar_id)
+    _check(bool(links and links[0]), f"push returned an event link: {links[0] if links else '-'}")
+    # 4. read it back through the API and compare what actually landed
+    from googleapiclient.discovery import build
+    creds = gcal._creds_from_token_json(token)
+    svc = build("calendar", "v3", credentials=creds)
+    found = svc.events().list(
+        calendarId=args.calendar_id, q=nonce, singleEvents=True,
+        timeMin="2026-06-21T00:00:00Z", timeMax="2026-06-23T00:00:00Z",
+    ).execute().get("items", [])
+    if _check(len(found) == 1, f"readback found exactly 1 event (got {len(found)})"):
+        got = found[0]
+        _check(nonce in got.get("summary", ""), f"summary carries nonce (got {got.get('summary')!r})")
+        _check(got.get("location") == EXPECT_LOCATION,
+               f"location landed (got {got.get('location')!r})")
+        want = dtparser.isoparse(gcal._dt_field(ev.start)["dateTime"])
+        have = dtparser.isoparse(got["start"]["dateTime"])
+        # compare instants, falling back to wall-clock equality (the API echoes in the calendar's zone)
+        _check(want == have or want.replace(tzinfo=None) == have.replace(tzinfo=None),
+               f"start instant matches (sent {want.isoformat()}, got {have.isoformat()})")
+        overrides = (got.get("reminders") or {}).get("overrides") or []
+        _check(any(o.get("minutes") == EXPECT_REMINDER for o in overrides),
+               f"reminder override {EXPECT_REMINDER} min landed (got {overrides})")
+        if not args.keep:
+            svc.events().delete(calendarId=args.calendar_id, eventId=got["id"]).execute()
+            print(f"      (test event {got['id']} deleted)")
+        else:
+            print(f"      (kept: {got.get('htmlLink')})")
+    failures = [label for ok, label in _results if not ok]
+    print(f"\n{'PASS' if not failures else 'FAIL'}: "
+          f"{len(_results) - len(failures)}/{len(_results)} checks passed")
+    return 0 if not failures else 1
+if __name__ == "__main__":
+    raise SystemExit(main())
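
The expected start of 10:15 in this script follows from the CANON text: the stated time is 10:30 AM and the patient is asked to arrive 15 minutes early. The worked arithmetic below is a small illustration of that rule, together with the end-time rule from the agent prompt in server/agent.py (end counts from the stated time, using the lower bound of the duration range). The variable names are illustrative only.

```python
from datetime import datetime, timedelta

stated = datetime(2026, 6, 22, 10, 30)  # "Time: 10:30 AM" in CANON
early_minutes = 15                       # "arrive 15 minutes early"
duration_minutes = 30                    # lower bound of the 30-to-45-minute range

start = stated - timedelta(minutes=early_minutes)   # calendar start = arrival time
end = stated + timedelta(minutes=duration_minutes)  # end counts from the stated time

print(start.isoformat())  # 2026-06-22T10:15:00
print(end.isoformat())    # 2026-06-22T11:00:00
```

Counting the duration from the arrival time instead would give 10:45, which is the off-by-15 case the agent's text rules are meant to prevent.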

server/__init__.py ADDED Viewed

File without changes

server/agent.py ADDED Viewed

	@@ -0,0 +1,475 @@

+"""The scheduling agent: thread (+images) -> validated ActionPlan.
+Replaces the old one-shot extractor. The model reasons over a whole conversation
+and emits a single constrained ActionPlan: events, conflicts (vs the user's
+existing calendar), proposed alternative times, a reply draft, and an optional
+clarification question. Output is grammar-constrained so it always parses.
+"""
+from __future__ import annotations
+import json
+import os
+import re
+from datetime import datetime, timedelta
+from typing import Optional
+from dateutil import parser as dtparser
+from pydantic import ValidationError
+from . import events, memory
+from .schema import ActionPlan, Event
+SYSTEM = (
+    "You are a scheduling assistant reading a chat conversation (text, and sometimes images "
+    "such as screenshots, invites, or flyers). Decide what calendar action is warranted and "
+    "return ONLY a JSON object matching the ActionPlan schema:\n"
+    "- reasoning: one or two sentences of why.\n"
+    "- events: concrete events with ISO 8601 datetimes; resolve relative dates from the current "
+    "datetime. Empty if there is no real plan. List EVERY distinct event separately — one thread "
+    "often holds several (e.g. a drop-off AND a pickup, or two appointments, are separate events).\n"
+    "- title: a short, self-contained calendar title summarizing the action and subject "
+    "(e.g. \"Pick up Priya — Terminal 4\", \"Mia — dental cleaning\"), not a quote of the "
+    "message.\n"
+    "- location: the venue or address when one is mentioned (join multi-line addresses into one "
+    "string); null otherwise.\n"
+    "- end: when a duration is stated (\"Duration: 30–45 min\", \"for 2 hours\", \"runs 90 "
+    "minutes\"), set end = start + duration, using the LOWER bound of a range; when an end time "
+    "is stated (\"7-9pm\"), use it; otherwise null. Never guess a duration that was not given.\n"
+    "- early arrival: if told to arrive N minutes early (\"please arrive 15 minutes early\"), "
+    "start = the arrival time (stated time minus N); end still counts from the STATED time; put "
+    "the stated time and the reason in notes.\n"
+    "- reminder_minutes: a stated lead time always wins (\"remind me 2 hours before\" -> 120); "
+    "otherwise 60 for doctor/medical visits, 30 for parties, 45 for carpools or school events; "
+    "for anything else use your judgment.\n"
+    "- conflicts: for any event that clashes with the provided existing calendar, the event_index, "
+    "what it clashes with, and severity (overlap|adjacent|tight).\n"
+    "- proposed_times: ISO 8601 alternatives when there is a conflict.\n"
+    "- reply_draft: a short, natural reply the user could send back.\n"
+    "- needs_clarification: a question if the plan is ambiguous, else null. If something should "
+    "be scheduled but its day or time is not yet known (\"TBD\", \"I'll confirm\", \"sometime "
+    "soon\"), leave events empty and ASK via needs_clarification instead of guessing.\n"
+    "Do not invent events that were not discussed."
+)
+def _existing_block(existing: list[Event]) -> str:
+    if not existing:
+        return "Existing calendar: (none provided)"
+    lines = [f"- {e.title}: {e.start}..{e.end or e.start}" for e in existing]
+    return "Existing calendar:\n" + "\n".join(lines)
+def build_messages(
+    thread: str,
+    now: datetime,
+    existing: list[Event],
+    images: Optional[list[str]] = None,
+    memory_block: Optional[str] = None,
+) -> list[dict]:
+    """Build chat messages. ``images`` are base64 data URIs (used from phase 3).
+    ``memory_block`` is the caller's recall block (per-user/localStorage memory);
+    when None, fall back to the server-side global memory.recall()."""
+    mem = memory.recall() if memory_block is None else memory_block
+    mem_block = f"{mem}\n\n" if mem else ""
+    text = (
+        f"Current datetime: {now.strftime('%A')}, {now.isoformat()}\n"
+        f"{_existing_block(existing)}\n\n"
+        f"{mem_block}"
+        f"Conversation:\n{thread}\n\n"
+        "Return the ActionPlan JSON now."
+    )
+    if not images:
+        return [
+            {"role": "system", "content": SYSTEM},
+            {"role": "user", "content": text},
+        ]
+    # Multimodal content format understood by llama.cpp vision chat handlers.
+    content = [{"type": "text", "text": text}]
+    for uri in images:
+        content.append({"type": "image_url", "image_url": {"url": uri}})
+    return [
+        {"role": "system", "content": SYSTEM},
+        {"role": "user", "content": content},
+    ]
+def run_agent(
+    thread: str,
+    now: Optional[datetime] = None,
+    existing: Optional[list[Event]] = None,
+    images: Optional[list[str]] = None,
+    memory_block: Optional[str] = None,
+) -> ActionPlan:
+    now = now or datetime.now()
+    existing = existing or []
+    with events.run_scope("analyze"):
+        if images:
+            events.emit("vision", f"reading {len(images)} image(s)", images=len(images))
+        if os.environ.get("USE_STUB_EXTRACTOR") == "1":
+            plan = _stub_plan(thread, now)
+        else:
+            from .model import complete_json  # lazy: avoids llama.cpp in stub mode
+            raw = complete_json(
+                build_messages(thread, now, existing, images, memory_block),
+                json_schema=ActionPlan.model_json_schema(),
+            )
+            plan = apply_text_rules(thread, _polish_titles(thread, _parse_plan(raw)))
+        # Global path only: with client-owned (per-user) memory, the UI merges
+        # learned contacts itself (memory.learn_from_plan) so we don't pollute the
+        # shared server file.
+        if memory_block is None:
+            memory.observe_plan(plan)  # grows-with-you: learn recurring contacts
+        events.emit("decision", f"{len(plan.events)} event(s) detected", events=len(plan.events))
+        return plan
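+def _parse_plan(raw: str) -> ActionPlan:
+    try:
+        # model_validate rejects non-object JSON (a list, a bare string) with a
+        # ValidationError, where ActionPlan(**data) would raise an uncaught TypeError.
+        return ActionPlan.model_validate(json.loads(raw))
+    except (json.JSONDecodeError, ValidationError):
+        # Grammar should prevent this; degrade to an empty plan rather than 500.
+        return ActionPlan(reasoning="Could not parse model output.")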
+# --------------------------------------------------------------------------- #
+# Title polish (optional second pass, TITLE_POLISH=1): rewrite each extracted
+# event's title into a calendar-ready action+subject summary. The extraction
+# pass already gets a title style instruction; this pass gives the model one
+# focused job, which helps on echo-prone inputs (flyers, forwarded notices).
+# --------------------------------------------------------------------------- #
+TITLE_SYSTEM = (
+    "You rewrite calendar event titles. Given a conversation and the events extracted from "
+    "it, return ONLY a JSON object {\"titles\": [...]} with exactly one title per event, in "
+    "the same order. Each title is a short, self-contained calendar entry summarizing the "
+    "action and subject (e.g. \"Pick up Priya — Terminal 4\", \"Mia — dental cleaning\"). "
+    "Keep names and places; drop filler, hype and sender wording. Never add facts that are "
+    "not in the conversation."
+)
+TITLES_SCHEMA = {
+    "type": "object",
+    "properties": {"titles": {"type": "array", "items": {"type": "string"}}},
+    "required": ["titles"],
+}
+def build_title_messages(thread: str, events: list[dict]) -> list[dict]:
+    """Messages for the polish pass. ``events`` are Event-shaped dicts."""
+    lines = [
+        f"{i + 1}. {e.get('title') or '(untitled)'} @ {e.get('start')}"
+        + (f" ({e['location']})" if e.get("location") else "")
+        for i, e in enumerate(events)
+    ]
+    text = (
+        f"Conversation:\n{thread}\n\n"
+        "Extracted events:\n" + "\n".join(lines) + "\n\n"
+        "Return the titles JSON now."
+    )
+    return [
+        {"role": "system", "content": TITLE_SYSTEM},
+        {"role": "user", "content": text},
+    ]
+def merge_titles(plan: ActionPlan, raw: str) -> ActionPlan:
+    """Apply a polish-pass response onto the plan; on any mismatch keep the
+    original titles (the polish pass must never be able to lose an event)."""
+    try:
+        titles = json.loads(raw).get("titles")
+    except (json.JSONDecodeError, AttributeError):
+        return plan
+    if not isinstance(titles, list) or len(titles) != len(plan.events):
+        return plan
+    for ev, title in zip(plan.events, titles):
+        if isinstance(title, str) and title.strip():
+            ev.title = title.strip()[:80]
+    return plan
+def apply_text_rules(thread: str, plan: ActionPlan) -> ActionPlan:
+    """Deterministic guarantees for explicitly-communicated logistics (same
+    philosophy as conflict detection: don't leave must-hold rules to the model).
+    Single-event plans only — multi-event threads keep per-event model judgment.
+    - "arrive N minutes early" -> start = arrival time, but ONLY when the model
+      demonstrably did not shift already (its start equals the stated time).
+    - end = STATED time + stated duration: a self-shifting model often counts
+      the duration from the arrival time (10:15+30=10:45 instead of 11:00).
+    - reminder: an explicit stated lead time always wins; else type defaults
+      (medical 60 / party 30 / carpool-school 45); else the model's judgment.
+    """
+    if len(plan.events) != 1:
+        return plan
+    ev = plan.events[0]
+    early = _EARLY_RE.search(thread)
+    stated = _find_time(thread)
+    if early and stated:
+        try:
+            start_dt = datetime.fromisoformat(ev.start)
+        except ValueError:
+            start_dt = None
+        if start_dt is not None:
+            mins = int(early.group(1))
+            appt_dt = start_dt.replace(hour=stated[0], minute=stated[1])
+            if start_dt == appt_dt:  # model did not shift -> start at arrival
+                start_dt = appt_dt - timedelta(minutes=mins)
+                ev.start = start_dt.isoformat()
+            if start_dt == appt_dt - timedelta(minutes=mins):
+                # The event covers arrival (we or the model shifted it): anchor
+                # the END to the stated time + stated duration, and make sure
+                # the official time survives in the notes.
+                duration = _find_duration_minutes(thread)
+                if duration:
+                    ev.end = (appt_dt + timedelta(minutes=duration)).isoformat()
+                hhmm = appt_dt.strftime("%H:%M")
+                if hhmm not in (ev.notes or ""):
+                    note = f"Appointment at {hhmm}; arrive {mins} min early"
+                    ev.notes = f"{ev.notes} — {note}" if ev.notes else note
+    m = _REMIND_EXPLICIT_RE.search(thread)
+    if m:
+        n = int(m.group(1))
+        ev.reminder_minutes = n * 60 if m.group(2).lower().startswith("h") else n
+    elif _MEDICAL_RE.search(thread):
+        ev.reminder_minutes = 60
+    elif _PARTY_RE.search(thread):
+        ev.reminder_minutes = 30
+    elif _CARPOOL_SCHOOL_RE.search(thread):
+        ev.reminder_minutes = 45
+    return plan
+def _polish_titles(thread: str, plan: ActionPlan) -> ActionPlan:
+    if not plan.events or os.environ.get("TITLE_POLISH") != "1":
+        return plan
+    from .model import complete_json  # lazy: avoids llama.cpp in stub mode
+    try:
+        raw = complete_json(
+            build_title_messages(thread, [e.model_dump() for e in plan.events]),
+            json_schema=TITLES_SCHEMA,
+            max_tokens=256,
+        )
+    except Exception:  # noqa: BLE001  polish is best-effort, never fatal
+        return plan
+    return merge_titles(plan, raw)
+def run_agent_stream(
+    thread: str,
+    now: Optional[datetime] = None,
+    existing: Optional[list[Event]] = None,
+    images: Optional[list[str]] = None,
+    busy=None,
+    memory_block: Optional[str] = None,
+):
+    """Generator for the UI: yields (partial_text, plan_or_None). Streams the
+    model output for a live 'thinking' panel, then yields the final ActionPlan
+    (with deterministic conflicts annotated if ``busy`` intervals are given).
+    ``memory_block`` carries the caller's per-user (localStorage) memory."""
+    now = now or datetime.now()
+    existing = existing or []
+    with events.run_scope("analyze"):
+        if images:
+            events.emit("vision", f"reading {len(images)} image(s)", images=len(images))
+        if os.environ.get("USE_STUB_EXTRACTOR") == "1":
+            plan = _stub_plan(thread, now)
+            text = json.dumps(plan.model_dump(), indent=2)
+            events.emit("model", "stub inference", latency_ms=0)
+            acc = ""
+            for i in range(0, len(text), 24):  # simulate token streaming
+                acc += text[i : i + 24]
+                yield acc, None
+        else:
+            from .model import stream_complete_json
+            acc = ""
+            for delta in stream_complete_json(
+                build_messages(thread, now, existing, images, memory_block),
+                ActionPlan.model_json_schema(),
+            ):
+                acc += delta
+                yield acc, None
+            plan = apply_text_rules(thread, _polish_titles(thread, _parse_plan(acc)))
+        # Global path only (see run_agent): client memory is merged by the UI.
+        if memory_block is None:
+            memory.observe_plan(plan)  # grows-with-you: learn recurring contacts
+        events.emit("decision", f"{len(plan.events)} event(s) detected", events=len(plan.events))
+        if busy:
+            from calendar_out.freebusy import annotate_conflicts  # lazy: avoid cycle
+            plan = annotate_conflicts(plan, busy)
+        yield (json.dumps(plan.model_dump(), indent=2), plan)
+_TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.IGNORECASE)
+_TIME_LABEL_RE = re.compile(r"(?im)^\s*time\s*[:\-]\s*(.+)$")
+_MONTH_DATE_RE = re.compile(
+    r"\b(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|"
+    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\.?\s+"
+    r"\d{1,2}(?:st|nd|rd|th)?(?:,?\s*\d{4})?\b", re.IGNORECASE)
+_WEEKDAY_RE = re.compile(
+    r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b", re.IGNORECASE)
+_LOCATION_RE = re.compile(
+    r"(?i)^\s*(?:(?:location|where|address)\s*[:\-]|\U0001F4CD)\s*(.*)$")
+_LABEL_LINE_RE = re.compile(r"^\s*[A-Za-z][A-Za-z ]{0,20}:\s")  # "Time: ...", "Notes: ..."
+_DURATION_RE = re.compile(r"(?im)^\s*duration\s*[:\-]\s*(.*)$")
+_EARLY_RE = re.compile(r"(?i)arrive\s+(\d{1,3})\s*min(?:ute)?s?\s+early")
+_REMIND_EXPLICIT_RE = re.compile(
+    r"(?i)\b(?:remind(?:er)?|notify|alert)\s*(?:me\s+)?(?:for\s+)?"
+    r"(\d{1,3})\s*(min(?:ute)?s?|h(?:ou)?rs?)\s*(?:before|ahead|prior|early)")
+_MEDICAL_RE = re.compile(
+    r"(?i)\b(?:doctor|dr\b\.?|clinic|dentist|dental|pediatric\w*|physician|"
+    r"medical|check-?up|primary\s+care|intake\s+forms?)")
+_PARTY_RE = re.compile(  # "party of 4" is a group size, not a party
+    r"(?i)\b(?:birthday|bday)\b|\bparty\b(?!\s+of\s+\d)")
+_CARPOOL_SCHOOL_RE = re.compile(r"(?i)\bcarpool\w*\b|\bschool\b|drive\s+the\s+kids")
+def _find_time(thread: str) -> Optional[tuple[int, int]]:
+    """First plausible clock time, or None. When a "Time:" label line exists,
+    only that line is searched; otherwise the whole thread is. A bare integer
+    ("June 22", "112A") is not a time: a match needs a minute component or an
+    am/pm marker."""
+    label = _TIME_LABEL_RE.search(thread)
+    scope = label.group(1) if label else thread
+    for m in _TIME_RE.finditer(scope):
+        if not (m.group(2) or m.group(3)):
+            continue
+        hour, minute = int(m.group(1)), int(m.group(2) or 0)
+        if hour > 23 or minute > 59:
+            continue
+        mer = (m.group(3) or "").lower()
+        if mer == "pm" and hour < 12:
+            hour += 12
+        elif mer == "am" and hour == 12:
+            hour = 0
+        return hour, minute
+    return None
+def _find_date(thread: str, now: datetime):
+    """Resolve the event's day: explicit date > tomorrow/today > weekday > default (tomorrow)."""
+    m = _MONTH_DATE_RE.search(thread)
+    if m:
+        try:
+            return dtparser.parse(m.group(0), default=now).date()
+        except (ValueError, OverflowError):
+            pass
+    if re.search(r"\btomorrow\b", thread, re.IGNORECASE):
+        return (now + timedelta(days=1)).date()
+    if re.search(r"\btoday\b|\btonight\b", thread, re.IGNORECASE):
+        return now.date()
+    m = _WEEKDAY_RE.search(thread)
+    if m:
+        try:
+            return dtparser.parse(m.group(1), default=now).date()  # next-or-same day
+        except (ValueError, OverflowError):
+            pass
+    return (now + timedelta(days=1)).date()
+def _find_location(lines: list[str]) -> tuple[Optional[str], set[int]]:
+    """A "Location:" line plus continuation lines (a wrapped address) until a
+    blank line or the next "Label:" line. Returns (joined location, line idxs)."""
+    for i, line in enumerate(lines):
+        m = _LOCATION_RE.match(line)
+        if not m:
+            continue
+        parts, used = [m.group(1).strip()], {i}
+        for j in range(i + 1, len(lines)):
+            nxt = lines[j].strip()
+            if not nxt or nxt.startswith("(") or _LABEL_LINE_RE.match(lines[j]):
+                break
+            parts.append(nxt)
+            used.add(j)
+        loc = ", ".join(p for p in parts if p)
+        return (loc or None), used
+    return None, set()
+def _find_duration_minutes(thread: str) -> Optional[int]:
+    m = _DURATION_RE.search(thread)
+    if m:
+        num = re.search(r"\d+", m.group(1))
+        if num:
+            return int(num.group(0))
+    return None
+def _reminder_minutes(thread: str) -> int:
+    """Notification lead time: an explicit ask wins, else event-type defaults
+    (medical 60, party 30, carpool/school 45 — checked in that order), else 30."""
+    m = _REMIND_EXPLICIT_RE.search(thread)
+    if m:
+        n = int(m.group(1))
+        return n * 60 if m.group(2).lower().startswith("h") else n
+    if _MEDICAL_RE.search(thread):
+        return 60
+    if _PARTY_RE.search(thread):
+        return 30
+    if _CARPOOL_SCHOOL_RE.search(thread):
+        return 45
+    return 30
+def _is_date_line(line: str, now: datetime) -> bool:
+    try:
+        dtparser.parse(line, default=now)  # non-fuzzy: chatter raises ParserError (a ValueError subclass)
+        return True
+    except (ValueError, OverflowError):
+        return False
+def _pick_title(lines: list[str], now: datetime, location_idx: set[int]) -> str:
+    nonempty = [(i, ln.strip()) for i, ln in enumerate(lines) if ln.strip()]
+    if not nonempty:
+        return "Event"
+    first_i, first = nonempty[0]
+    if not _is_date_line(first, now):
+        return first[:60]
+    # First line is just the date — find a more descriptive line instead.
+    for i, ln in nonempty[1:]:
+        if i in location_idx or _LABEL_LINE_RE.match(ln) or ln.startswith("("):
+            continue
+        if _is_date_line(ln, now):
+            continue
+        return ln[:60]
+    return "Appointment"
+def _stub_plan(thread: str, now: datetime) -> ActionPlan:
+    """Heuristic ActionPlan so phases without a model still demo end to end."""
+    time_found = _find_time(thread)
+    if not time_found:
+        return ActionPlan(reasoning="No time found.", reply_draft="Got it, thanks!")
+    hour, minute = time_found
+    lines = thread.strip().splitlines()
+    location, loc_idx = _find_location(lines)
+    day = _find_date(thread, now)
+    appt = now.replace(year=day.year, month=day.month, day=day.day,
+                       hour=hour, minute=minute, second=0, microsecond=0)
+    duration = _find_duration_minutes(thread) or 60
+    # "Arrive N minutes early" -> start at the ARRIVAL time; the end (and the
+    # notes) stay anchored to the stated appointment time.
+    early = _EARLY_RE.search(thread)
+    start = appt - timedelta(minutes=int(early.group(1))) if early else appt
+    notes = (f"Appointment at {appt.strftime('%H:%M')}; arrive {early.group(1)} min early"
+             if early else "(stub agent — wire the model to replace this)")
+    return ActionPlan(
+        reasoning="(stub) parsed time/date/location heuristically.",
+        events=[
+            Event(
+                title=_pick_title(lines, now, loc_idx),
+                start=start.isoformat(),
+                end=(appt + timedelta(minutes=duration)).isoformat(),
+                location=location,
+                reminder_minutes=_reminder_minutes(thread),
+                notes=notes,
+            )
+        ],
+        reply_draft="Sounds good, see you then!",
+    )
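
Note on the time heuristic in `_find_time` above: a number only counts as a clock time when it carries a minute component or an am/pm marker. The following is a minimal standalone sketch of that rule, not the module itself. `find_time` is a hypothetical name, and the `Time:` label scoping used in the diff is left out for brevity.

```python
import re
from typing import Optional

# Same pattern as _TIME_RE in the diff above.
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.IGNORECASE)


def find_time(text: str) -> Optional[tuple[int, int]]:
    """Return (hour, minute) in 24-hour form for the first plausible time."""
    for m in TIME_RE.finditer(text):
        # A bare integer ("June 22") has neither minutes nor am/pm: skip it.
        if not (m.group(2) or m.group(3)):
            continue
        hour, minute = int(m.group(1)), int(m.group(2) or 0)
        if hour > 23 or minute > 59:
            continue
        meridiem = (m.group(3) or "").lower()
        if meridiem == "pm" and hour < 12:
            hour += 12          # 3pm -> 15
        elif meridiem == "am" and hour == 12:
            hour = 0            # 12am -> 0 (midnight)
        return hour, minute
    return None
```

With this rule, the day number in "June 22 at 3:30pm" is skipped and the match comes from "3:30pm"; a room number such as "112A" never matches because the pattern is limited to two digits between word boundaries.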

server/dedup.py ADDED

	@@ -0,0 +1,84 @@

+"""Idempotency for autonomous mode: don't create the same event twice.
+As more messages stream into a chat, re-running the agent over a rolling window
+re-surfaces events already captured. ``filter_new`` returns only events not seen
+before, keyed by normalized title + minute-rounded start. Durable JSON store
+mirrors ``server/impact.py`` (env path + lock; no DB — local-first).
+"""
+from __future__ import annotations
+import json
+import os
+import threading
+from pathlib import Path
+from dateutil import parser as dtparser
+from .schema import Event
+_lock = threading.Lock()
+def _path() -> Path:
+    return Path(os.environ.get("DEDUP_PATH", "/tmp/agent_seen.json"))
+def event_key(ev: Event) -> str:
+    """Normalized identity: lowercased title + start rounded to the minute.
+    Conservative by design — if the model rewords a title between messages we may
+    miss a dedup (a duplicate event), which is safer than dropping a real event.
+    """
+    title = (ev.title or "").strip().lower()
+    try:
+        start = dtparser.isoparse(ev.start).replace(second=0, microsecond=0).isoformat()
+    except (ValueError, TypeError):
+        start = (ev.start or "").strip()
+    return f"{title}|{start}"
+def _load() -> list[str]:
+    try:
+        return json.loads(_path().read_text())
+    except Exception:  # noqa: BLE001  missing/corrupt -> start fresh
+        return []
+def filter_new(events: list[Event], record: bool = True) -> list[Event]:
+    """Return only events whose key hasn't been recorded.
+    ``record=False`` filters WITHOUT persisting — callers delivering the events
+    somewhere fallible (e.g. a calendar push) should filter first and
+    ``mark_seen`` only after delivery succeeds; otherwise a transient failure
+    permanently swallows the events ("seen" but never delivered)."""
+    with _lock:
+        seen = set(_load())
+        fresh = []
+        for ev in events:
+            k = event_key(ev)
+            if k in seen:
+                continue
+            seen.add(k)
+            fresh.append(ev)
+        if fresh and record:
+            _path().write_text(json.dumps(sorted(seen), indent=2))
+        return fresh
+def mark_seen(events: list[Event]) -> None:
+    """Persist the keys of successfully delivered events."""
+    if not events:
+        return
+    with _lock:
+        seen = set(_load())
+        seen.update(event_key(ev) for ev in events)
+        _path().write_text(json.dumps(sorted(seen), indent=2))
+def reset() -> None:
+    """Drop the seen-set (test helper)."""
+    with _lock:
+        try:
+            _path().unlink()
+        except FileNotFoundError:
+            pass
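
The `filter_new` docstring above recommends filtering first and recording only after delivery succeeds. A minimal sketch of that calling pattern follows, assuming an in-memory set in place of the JSON store and `(title, start)` tuples in place of `Event` objects. `deliver` and `push` are hypothetical names, and `datetime.fromisoformat` stands in for `dateutil`.

```python
from datetime import datetime

_seen: set[str] = set()  # in-memory stand-in for the JSON store (assumption)


def event_key(title: str, start: str) -> str:
    """Lowercased title + start truncated to the minute."""
    try:
        start = datetime.fromisoformat(start).replace(second=0, microsecond=0).isoformat()
    except ValueError:
        start = start.strip()
    return f"{title.strip().lower()}|{start}"


def filter_new(events):
    """Return unseen events WITHOUT recording them (like record=False)."""
    fresh, batch = [], set()
    for title, start in events:
        k = event_key(title, start)
        if k in _seen or k in batch:
            continue
        batch.add(k)
        fresh.append((title, start))
    return fresh


def mark_seen(events):
    _seen.update(event_key(t, s) for t, s in events)


def deliver(events, push):
    """Filter, push, and record only if the push did not raise."""
    fresh = filter_new(events)
    push(fresh)        # may raise, e.g. a calendar API outage
    mark_seen(fresh)   # reached only after a successful push
    return fresh
```

If `push` raises, nothing is recorded, so the same events come back as fresh on the next run instead of being silently swallowed.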

server/events.py ADDED

	@@ -0,0 +1,116 @@

+"""In-process activity bus: every pipeline stage emits structured events here so
+the Activity dashboard can show what the LLM and agent are doing in real time.
+A thread-safe ring buffer holds recent events. A contextvar (run_scope) tags all
+events emitted during one agent run with the same run id, so the dashboard can
+group them into per-run traces.
+"""
+from __future__ import annotations
+import threading
+from collections import deque
+from contextlib import contextmanager
+from contextvars import ContextVar
+from datetime import datetime
+from itertools import count
+MAXLEN = 800
+# Stages of the pipeline, in display order (used by the stepper + chart).
+STAGES = ["ingest", "vision", "model", "decision", "conflict", "calendar"]
+_BUF: deque[dict] = deque(maxlen=MAXLEN)
+_lock = threading.Lock()
+_run_var: ContextVar = ContextVar("agent_run", default=None)
+_seq = count(1)
+def _now() -> str:
+    return datetime.now().isoformat(timespec="seconds")
+def emit(stage: str, message: str, level: str = "info", **payload) -> dict:
+    """Record one activity event. ``payload`` may carry latency_ms, events,
+    conflicts, images, tokens, etc. Returns the event dict."""
+    ev = {
+        "id": next(_seq),
+        "ts": _now(),
+        "stage": stage,
+        "level": level,
+        "message": message,
+        "run": _run_var.get(),
+        **payload,
+    }
+    with _lock:
+        _BUF.append(ev)
+    return ev
+@contextmanager
+def run_scope(label: str = ""):
+    """Tag every event emitted inside the block with a shared run id."""
+    run_id = f"{next(_seq)}:{label}" if label else str(next(_seq))
+    token = _run_var.set(run_id)
+    try:
+        yield run_id
+    finally:
+        # Best-effort: when used inside a streaming generator that the server drives
+        # across different contexts (e.g. Gradio's queue), reset(token) raises
+        # "Token was created in a different Context". Clearing is enough either way.
+        try:
+            _run_var.reset(token)
+        except ValueError:
+            _run_var.set(None)
+def recent(n: int = 120) -> list[dict]:
+    with _lock:
+        return list(_BUF)[-n:][::-1]  # newest first
+def current_stage() -> str | None:
+    with _lock:
+        return _BUF[-1]["stage"] if _BUF else None
+def metrics() -> dict:
+    with _lock:
+        evs = list(_BUF)
+    # "is not None" so a legitimate 0 ms latency still counts as a model call.
+    lat = [e["latency_ms"] for e in evs if e.get("latency_ms") is not None]
+    return {
+        "messages": sum(1 for e in evs if e["stage"] == "ingest"),
+        "events_created": sum(e.get("events", 0) for e in evs if e["stage"] == "decision"),
+        "conflicts": sum(e.get("conflicts", 0) for e in evs if e["stage"] == "conflict"),
+        "images_read": sum(e.get("images", 0) for e in evs),
+        "model_calls": len(lat),
+        "avg_latency_ms": round(sum(lat) / len(lat)) if lat else 0,
+        "errors": sum(1 for e in evs if e["level"] == "error"),
+    }
+def stage_counts() -> list[dict]:
+    """Counts per stage, ready for gr.BarPlot."""
+    with _lock:
+        evs = list(_BUF)
+    counts = {s: 0 for s in STAGES}
+    for e in evs:
+        if e["stage"] in counts:
+            counts[e["stage"]] += 1
+    return [{"stage": s, "count": counts[s]} for s in STAGES]
+def recent_runs(n: int = 8) -> list[tuple[str, list[dict]]]:
+    """Group recent events by run id (newest run first)."""
+    with _lock:
+        evs = list(_BUF)
+    groups: dict[str, list[dict]] = {}
+    order: list[str] = []
+    for e in evs:
+        rid = e.get("run")
+        if not rid:
+            continue
+        if rid not in groups:
+            groups[rid] = []
+            order.append(rid)
+        groups[rid].append(e)
+    return [(rid, groups[rid]) for rid in order[-n:][::-1]]
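
The two ideas in `server/events.py`, a bounded ring buffer and a context variable that tags events with a run id, can be tried in isolation. This is a reduced sketch under stated assumptions: the buffer holds only 3 entries, there is no lock, and the cross-context fallback from the real `run_scope` is omitted.

```python
from collections import deque
from contextlib import contextmanager
from contextvars import ContextVar
from itertools import count

BUF: deque = deque(maxlen=3)             # oldest entries fall off automatically
_run: ContextVar = ContextVar("run", default=None)
_seq = count(1)


def emit(stage: str, message: str) -> dict:
    """Record one event, tagged with whatever run id is currently active."""
    ev = {"id": next(_seq), "stage": stage, "message": message, "run": _run.get()}
    BUF.append(ev)
    return ev


@contextmanager
def run_scope(label: str):
    """Tag every event emitted inside the block with `label`."""
    token = _run.set(label)
    try:
        yield label
    finally:
        _run.reset(token)                # restore the previous run id
```

Events emitted inside `with run_scope("run-a"):` carry `"run": "run-a"`; events emitted outside carry `None`, and the deque never grows past its `maxlen`.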

server/health.py ADDED

	@@ -0,0 +1,54 @@

+"""Liveness + a hardware-adequacy signal for the UI banner and external monitors.
+`health_status()` powers both `GET /health` and the on-page status banner. It
+reports `degraded: true` when the *real* model would run on CPU-only hardware
+(where extraction can be slow or time out). `FORCE_DEGRADED=1` forces it on so
+the banner can be exercised without actually changing the Space's hardware.
+"""
+from __future__ import annotations
+import os
+def gpu_available() -> bool:
+    """Best-effort GPU probe. This app serves via llama.cpp (no torch), so we
+    can't rely on torch.cuda — check for an NVIDIA device or `nvidia-smi`."""
+    import glob
+    import shutil
+    import subprocess
+    if glob.glob("/dev/nvidia[0-9]*"):
+        return True
+    if shutil.which("nvidia-smi"):
+        try:
+            return subprocess.run(["nvidia-smi"], capture_output=True, timeout=5).returncode == 0
+        except Exception:  # noqa: BLE001  any probe failure -> treat as no GPU
+            return False
+    return False
+def health_status() -> dict:
+    """Return liveness + the degraded/device/model signal (read at call time)."""
+    if os.environ.get("FORCE_DEGRADED") == "1":
+        return {
+            "ok": True, "device": "cpu", "model": "real", "degraded": True,
+            "reason": "Running the model on CPU-only hardware. Extraction may be slow "
+                      "or time out. Upgrade to a GPU.",
+        }
+    # Stub mode is a deliberate, fast, free preview — not a degraded state.
+    if os.environ.get("USE_STUB_EXTRACTOR") == "1":
+        return {"ok": True, "device": "cpu", "model": "stub", "degraded": False, "reason": ""}
+    # Real model. On the Space, INFERENCE_BASE_URL points at the *local* llama-server,
+    # so local inference still depends on this host having a GPU. A non-localhost URL
+    # means inference runs elsewhere, so this host's hardware is irrelevant.
+    base = os.environ.get("INFERENCE_BASE_URL", "")
+    local = (not base) or ("127.0.0.1" in base) or ("localhost" in base)
+    gpu = gpu_available()
+    degraded = local and not gpu
+    device = "cuda" if (local and gpu) else ("cpu" if local else "remote")
+    reason = (
+        "Running the model on CPU-only hardware. Extraction may be slow or time out. "
+        "Upgrade to a GPU." if degraded else ""
+    )
+    return {"ok": True, "device": device, "model": "real", "degraded": degraded, "reason": reason}
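
The device and degraded flags in `health_status` depend on only two inputs: whether inference is local, and whether a GPU is present. A small pure-function sketch of that decision, with the GPU probe and environment lookups replaced by arguments (`classify` is a hypothetical name, and the URLs in the usage note are placeholders):

```python
def classify(base_url: str, gpu: bool) -> dict:
    """Mirror the local/remote and degraded logic for a real (non-stub) model."""
    # No URL, or a loopback URL, means inference runs on this host.
    local = (not base_url) or ("127.0.0.1" in base_url) or ("localhost" in base_url)
    degraded = local and not gpu
    device = "cuda" if (local and gpu) else ("cpu" if local else "remote")
    return {"device": device, "degraded": degraded}
```

For example, `classify("", False)` reports a degraded CPU host, while `classify("https://inference.example/v1", False)` reports `remote` and is not degraded, since the local hardware no longer matters.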

server/imageutil.py ADDED

	@@ -0,0 +1,61 @@

+"""Encode images as base64 data URIs for llama.cpp vision chat handlers.
+Shared by the Mac collector (attachments) and the UI (manual upload).
+"""
+from __future__ import annotations
+import base64
+import mimetypes
+from pathlib import Path
+# Skip anything bigger than this to keep payloads/context sane.
+MAX_BYTES = 4 * 1024 * 1024  # 4 MB
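+# "image/heif" is listed so .heif files reach the HEIC/HEIF transcode branch below
+# (newer Python versions map .heif to image/heif rather than image/heic).
+IMAGE_MIMES = {"image/png", "image/jpeg", "image/gif", "image/webp", "image/heic", "image/heif"}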
+def is_image(path: str) -> bool:
+    mime, _ = mimetypes.guess_type(path)
+    return mime in IMAGE_MIMES
+def _heic_to_jpeg(p: Path) -> bytes | None:
+    """Transcode HEIC/HEIF to JPEG bytes (pillow-heif), or None if unavailable.
+    llama.cpp's clip handler can't decode HEIC, so raw pass-through would fail
+    or waste context — and iPhone attachments are predominantly HEIC."""
+    try:
+        import io
+        import pillow_heif
+        from PIL import Image
+        pillow_heif.register_heif_opener()
+        img = Image.open(p).convert("RGB")
+        buf = io.BytesIO()
+        img.save(buf, format="JPEG", quality=88)
+        return buf.getvalue()
+    except Exception:  # noqa: BLE001  no pillow-heif / corrupt file -> skip
+        return None
+def to_data_uri(path: str) -> str | None:
+    """Return a `data:<mime>;base64,...` URI, or None if not a usable image.
+    HEIC is transcoded to JPEG (the vision stack can't decode HEIC); when
+    transcoding isn't available the file is skipped, never sent undecodable."""
+    p = Path(path)
+    if not p.exists() or p.stat().st_size > MAX_BYTES:
+        return None
+    mime, _ = mimetypes.guess_type(str(p))
+    if mime not in IMAGE_MIMES:
+        return None
+    if mime == "image/heic" or p.suffix.lower() in (".heic", ".heif"):
+        jpeg = _heic_to_jpeg(p)
+        if jpeg is None:
+            return None
+        return "data:image/jpeg;base64," + base64.b64encode(jpeg).decode("ascii")
+    b64 = base64.b64encode(p.read_bytes()).decode("ascii")
+    return f"data:{mime};base64,{b64}"
+def paths_to_data_uris(paths: list[str]) -> list[str]:
+    return [u for u in (to_data_uri(p) for p in paths or []) if u]
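
For reference, the data URI format produced by `to_data_uri` is just a MIME header followed by base64 text. A minimal round-trip sketch on raw bytes, assuming hypothetical helper names and skipping the file-size and HEIC handling shown in the diff:

```python
import base64


def bytes_to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as data:<mime>;base64,<payload>."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def data_uri_to_bytes(uri: str) -> tuple[str, bytes]:
    """Inverse of bytes_to_data_uri: return (mime, raw bytes)."""
    header, _, payload = uri.partition(",")
    mime = header[len("data:"):].split(";")[0]
    return mime, base64.b64decode(payload)
```

Decoding what was just encoded should return the original MIME type and bytes, which is a quick way to sanity-check an image payload before it is sent to the vision handler.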

server/impact.py ADDED

	@@ -0,0 +1,87 @@

+"""Durable weekly impact metrics: how much time the agent actually saved the user.
+Unlike the in-memory activity bus (``server/events.py``), which is an 800-entry
+ring buffer lost on restart, this records a small set of counters **per ISO week**
+to a JSON file, so the "This week" panel accumulates over real use and survives
+restarts.
+Counters per week: ``events_captured``, ``conflicts_caught``, ``minutes_saved``.
+A "capture" is the user *accepting* events by exporting them (``.ics`` download or
+Google Calendar push) — see ``ui/blocks.py``. ``minutes_saved`` is a deliberately
+conservative, fully configurable **estimate** (not a measurement): a fixed number
+of minutes per event captured plus per conflict caught.
+Persistence mirrors ``app.py``'s ``_append_feed``: an env-overridable JSON file
+under ``/tmp`` by default. No database — local-first by design.
+"""
+from __future__ import annotations
+import json
+import os
+import threading
+from datetime import datetime
+from pathlib import Path
+_lock = threading.Lock()
+_ZERO = {"events_captured": 0, "conflicts_caught": 0, "minutes_saved": 0}
+def _path() -> Path:
+    return Path(os.environ.get("IMPACT_PATH", "/tmp/impact_weeks.json"))
+def _min_per_event() -> int:
+    return int(os.environ.get("IMPACT_MIN_PER_EVENT", "8"))
+def _min_per_conflict() -> int:
+    return int(os.environ.get("IMPACT_MIN_PER_CONFLICT", "15"))
+def _week_key(when: datetime | None = None) -> str:
+    iso = (when or datetime.now()).isocalendar()
+    return f"{iso.year}-W{iso.week:02d}"
+def _load() -> dict:
+    try:
+        return json.loads(_path().read_text())
+    except Exception:  # noqa: BLE001  missing/corrupt file -> start fresh
+        return {}
+def record_capture(events_captured: int, conflicts_caught: int = 0) -> dict:
+    """Durably add to the current week's counters; return that week's record.
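+    Re-reads the file under the lock before incrementing, so concurrent threads
+    and restarts never drop prior counts (append/aggregate, never
+    overwrite-from-memory). The lock is per-process; separate processes sharing
+    the file are not coordinated.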
+    """
+    minutes = events_captured * _min_per_event() + conflicts_caught * _min_per_conflict()
+    key = _week_key()
+    with _lock:
+        data = _load()
+        wk = {**_ZERO, **data.get(key, {})}
+        wk["events_captured"] += events_captured
+        wk["conflicts_caught"] += conflicts_caught
+        wk["minutes_saved"] += minutes
+        data[key] = wk
+        _path().write_text(json.dumps(data, indent=2))
+        return dict(wk)
+def this_week() -> dict:
+    """Read-only current-week record (all zeros if nothing recorded yet).
+    The durable, weekly analogue of ``events.metrics()``.
+    """
+    return {**_ZERO, **_load().get(_week_key(), {})}
+def reset() -> None:
+    """Drop all recorded impact (test helper)."""
+    with _lock:
+        try:
+            _path().unlink()
+        except FileNotFoundError:
+            pass
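
The weekly bucket and the `minutes_saved` estimate in `server/impact.py` are plain arithmetic. A short sketch, assuming the default weights from the diff (8 minutes per event, 15 per conflict) and hypothetical function names:

```python
from datetime import datetime


def week_key(when: datetime) -> str:
    """ISO-week bucket, e.g. 2025-W01. ISO weeks start on Monday."""
    year, week, _ = when.isocalendar()
    return f"{year}-W{week:02d}"


def minutes_saved(events: int, conflicts: int,
                  per_event: int = 8, per_conflict: int = 15) -> int:
    """Fixed-rate estimate: not a measurement, just events and conflicts weighted."""
    return events * per_event + conflicts * per_conflict
```

Note that the ISO year can differ from the calendar year near New Year: Monday 30 December 2024 already belongs to week 1 of 2025, so captures on that day land in the `2025-W01` bucket.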

server/mcp_tools.py ADDED

	@@ -0,0 +1,117 @@

+"""Agent-facing tool wrappers exposed via Gradio's MCP server.
+Each function below has a clean signature + docstring on purpose — Gradio's MCP
+layer (`mcp_server=True` in app.py) reads the type hints and docstring to build
+the JSON-Schema a remote MCP client sees. Keep them stateless and JSON-friendly:
+inputs are str / list[dict] / etc., outputs are dict / str / list[dict] (never
+pydantic objects, which don't serialise through the MCP boundary).
+These wrap the existing pipeline (server/pipeline.run_pipeline) and free/busy
+math (calendar_out/freebusy) — no new business logic lives here, just the
+shape adaptation an external agent expects.
+"""
+from __future__ import annotations
+import base64
+import time
+from collections import OrderedDict
+from typing import Optional
+from calendar_out.freebusy import Busy, check_conflicts as _freebusy_check_conflicts, load_ics_busy
+from calendar_out.ics import events_to_ics
+from server.pipeline import AgentRequest, run_pipeline
+from server.schema import Event
+# Short-lived extraction cache. The Agent-tab orchestrator extracts TWICE per
+# run — once when the MiniCPM planner calls this tool over MCP, then again when
+# the scripted path finalizes — and each call runs the full gemma-cal E4B. With
+# identical inputs the second call is pure waste, so memoize on the EXACT inputs
+# (thread + images + memory). Different memory/images -> different key -> a fresh
+# (correct) extraction; the win is the common no-memory case. TTL is generous so
+# the scripted call still hits after a ~2-min planner run; small maxsize bounds
+# cross-request staleness (same input -> same output anyway).
+_EXTRACT_CACHE: "OrderedDict[tuple, tuple[float, dict]]" = OrderedDict()
+_EXTRACT_TTL = 600.0
+_EXTRACT_MAX = 8
+def extract_events(thread: str, images: Optional[list[str]] = None,
+                   memory: Optional[str] = None) -> dict:
+    """Extract calendar events from a pasted iMessage thread (and optional screenshots).
+    The headline tool. Reads a chat or screenshot, returns an ActionPlan with the
+    events found, any conflicts against the user's calendar, and a suggested reply.
+    Runs 100% locally inside the Space via llama.cpp — no cloud AI APIs.
+    Args:
+        thread: Plain-text iMessage conversation, e.g. "Alice: pickup 5pm Thursday".
+            Either ``thread`` or ``images`` must be non-empty.
+        images: Optional list of base64-encoded screenshots (raw base64 or data URIs).
+            Useful when the schedule lives in a screenshot rather than text.
+        memory: Optional plain-text recall block about the user (people and their
+            roles, preferences like default reminders or days they decline) — used
+            to personalize extraction. e.g. "Dana is the soccer coach".
+    Returns:
+        ActionPlan as a JSON-serialisable dict with keys: ``reasoning``,
+        ``events`` (list of {title, start, end, location, attendees, ...}),
+        ``conflicts``, ``proposed_times``, ``reply_draft``, ``needs_clarification``.
+    """
+    key = (thread or "", tuple(images or []), memory or "")
+    now = time.monotonic()
+    hit = _EXTRACT_CACHE.get(key)
+    if hit is not None and now - hit[0] < _EXTRACT_TTL:
+        _EXTRACT_CACHE.move_to_end(key)
+        return hit[1]
+    req = AgentRequest(thread=thread or "", images=images or [], memory=memory,
+                       return_ics=False)
+    resp = run_pipeline(req)
+    plan = resp.plan.model_dump()
+    _EXTRACT_CACHE[key] = (now, plan)
+    _EXTRACT_CACHE.move_to_end(key)
+    while len(_EXTRACT_CACHE) > _EXTRACT_MAX:
+        _EXTRACT_CACHE.popitem(last=False)
+    return plan
+def make_ics(events: list[dict]) -> str:
+    """Render a list of event dicts as an .ics file (base64-encoded).
+    Args:
+        events: List of event dicts in the shape returned by ``extract_events``
+            — each needs at least ``title`` and ``start`` (ISO 8601). Optional:
+            ``end``, ``location``, ``attendees``, ``reminder_minutes``, ``notes``.
+    Returns:
+        Base64-encoded VCALENDAR bytes. Decode and write to ``something.ics`` to
+        import into any calendar app.
+    """
+    ev_objs = [Event(**e) for e in events]
+    return base64.b64encode(events_to_ics(ev_objs)).decode("ascii")
+def check_conflicts(events: list[dict], ics_base64: str) -> list[dict]:
+    """Find clashes between proposed events and busy intervals from an .ics calendar.
+    Deterministic free/busy math — runs without the LLM, so it's safe for agents
+    to call as a fast verification step after ``extract_events``.
+    Args:
+        events: List of proposed event dicts (same shape as ``extract_events``
+            output). Each event needs at least ``title`` and ``start``.
+        ics_base64: Base64-encoded .ics calendar to check against. Typically the
+            user's current calendar exported from Google/Apple/Outlook.
+    Returns:
+        List of conflict dicts: ``{event_index, clashes_with, severity}`` where
+        severity is one of ``"overlap"``, ``"adjacent"``, ``"tight"``. Empty list
+        if nothing clashes.
+    """
+    if not ics_base64:
+        return []
+    try:
+        busy: list[Busy] = load_ics_busy(base64.b64decode(ics_base64))
+    except Exception:  # noqa: BLE001  malformed .ics -> no conflict context
+        return []
+    ev_objs = [Event(**e) for e in events]
+    return [c.model_dump() for c in _freebusy_check_conflicts(ev_objs, busy)]
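
The extraction cache in `extract_events` combines a time-to-live with a small least-recently-used bound on an `OrderedDict`. The sketch below isolates that pattern in a hypothetical `TTLCache` class with an injectable clock so expiry can be exercised without waiting. It is an illustration of the technique, not the module's code.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny TTL + LRU cache: entries expire after `ttl` seconds, and the
    least recently used entry is evicted once `maxsize` is exceeded."""

    def __init__(self, ttl: float = 600.0, maxsize: int = 8, clock=time.monotonic):
        self.ttl, self.maxsize, self.clock = ttl, maxsize, clock
        self._data: OrderedDict = OrderedDict()   # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            self._data.move_to_end(key)           # mark as recently used
            return hit[1]
        value = compute()                         # miss or expired: recompute
        self._data[key] = (now, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)        # evict the oldest entry
        return value
```

Keys must be hashable, which is why the diff builds its key from a string, a tuple of images, and a string. The cached value is returned by reference, so callers that mutate the result would also mutate the cache.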

server/memory.py ADDED

	@@ -0,0 +1,174 @@

+"""Persistent 'grows-with-you' agent memory.
+Durable facts and preferences that personalize extraction over time:
+people->roles ("Dana is the soccer coach"), rules ("you decline Mondays"),
+default locations. Stored as JSON at MEMORY_PATH.
+- recall()      -> a compact block injected into the agent prompt (server/agent.py)
+- remember()    -> add/strengthen a fact (Memory tab, or a Hermes `remember` tool-call)
+- forget()      -> drop a fact (Memory tab)
+- observe_plan()-> conservatively learn recurring contacts from extracted events
+This is the "memory" half of a Hermes-style grows-with-you agent; the model
+(served via INFERENCE_BASE_URL) is the reasoning half.
+"""
+from __future__ import annotations
+import json
+import os
+import threading
+from pathlib import Path
+MEMORY_PATH = Path(os.environ.get("MEMORY_PATH", "/tmp/agent_memory.json"))
+MAX_FACTS = 200
+KINDS = ("contact", "preference", "location", "note")
+_lock = threading.Lock()
+def _norm(text: str) -> str:
+    return " ".join(text.lower().split())
+def _load() -> dict:
+    try:
+        data = json.loads(MEMORY_PATH.read_text())
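+        if isinstance(data, dict) and isinstance(data.get("facts"), list):
+            # Older or hand-edited files may lack the id counter; derive it from
+            # the stored facts so remember() never hits a KeyError or reuses an id.
+            data.setdefault("seq", max(
+                (int(f.get("id", 0)) for f in data["facts"] if isinstance(f, dict)),
+                default=0))
+            return data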
+    except Exception:  # noqa: BLE001  missing/corrupt -> empty
+        pass
+    return {"facts": [], "seq": 0}
+def _save(state: dict) -> None:
+    MEMORY_PATH.parent.mkdir(parents=True, exist_ok=True)
+    MEMORY_PATH.write_text(json.dumps(state, indent=2))
+def remember(text: str, kind: str = "note") -> dict | None:
+    """Add a fact, or strengthen (bump weight) an existing one with the same text."""
+    text = (text or "").strip()
+    if not text:
+        return None
+    if kind not in KINDS:
+        kind = "note"
+    key = _norm(text)
+    with _lock:
+        state = _load()
+        for f in state["facts"]:
+            if _norm(f["text"]) == key:
+                f["weight"] = f.get("weight", 1) + 1
+                _save(state)
+                return f
+        state["seq"] += 1
+        fact = {"id": state["seq"], "kind": kind, "text": text, "weight": 1}
+        state["facts"].append(fact)
+        state["facts"] = state["facts"][-MAX_FACTS:]
+        _save(state)
+        return fact
+def forget(fact_id: int) -> bool:
+    with _lock:
+        state = _load()
+        before = len(state["facts"])
+        state["facts"] = [f for f in state["facts"] if f["id"] != int(fact_id)]
+        changed = len(state["facts"]) != before
+        if changed:
+            _save(state)
+        return changed
+def list_facts() -> list[dict]:
+    return _load()["facts"]
+def recall(limit: int = 20) -> str:
+    """Compact 'what I know about you' block for the prompt; '' if empty.
+    Strongest (most-reinforced) facts first so the prompt stays small but useful.
+    """
+    facts = sorted(list_facts(), key=lambda f: f.get("weight", 1), reverse=True)[:limit]
+    if not facts:
+        return ""
+    lines = "\n".join(f"- {f['text']}" for f in facts)
+    return "What I know about you (memory):\n" + lines
+def observe_plan(plan) -> None:
+    """Conservatively learn from an extracted ActionPlan: record event attendees as
+    contacts (reinforced over time). Cheap, deterministic 'growth' without an LLM
+    round-trip; explicit facts still come via remember()/the Memory tab/tool-calls."""
+    try:
+        for ev in getattr(plan, "events", []) or []:
+            for name in getattr(ev, "attendees", []) or []:
+                name = (name or "").strip()
+                if name and len(name) <= 40:
+                    remember(f"{name} is a contact you make plans with", kind="contact")
+    except Exception:  # noqa: BLE001  memory must never break extraction
+        pass
+def reset() -> None:
+    """Clear memory (used by tests)."""
+    with _lock:
+        _save({"facts": [], "seq": 0})
+# --------------------------------------------------------------------------- #
+# Client-owned memory (per-user, browser localStorage). These are PURE helpers
+# that operate on a passed-in facts list — no global file — so each visitor's
+# memory can live on their device and be threaded through the agent per request.
+# --------------------------------------------------------------------------- #
+def facts_to_recall(facts: list[dict], limit: int = 20) -> str:
+    """Same compact 'what I know about you' block as recall(), but for a passed
+    facts list (client/localStorage memory). '' if empty."""
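+    # Client-supplied data: drop malformed or text-less entries before sorting,
+    # so a stray None cannot crash the sort key.
+    facts = [f for f in (facts or []) if isinstance(f, dict) and f.get("text")]
+    facts = sorted(facts, key=lambda f: f.get("weight", 1), reverse=True)[:limit]
+    if not facts:
+        return ""
+    lines = "\n".join(f"- {f['text']}" for f in facts)
+    return "What I know about you (memory):\n" + lines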
+def merge_facts(facts: list[dict], texts, kind: str = "note") -> list[dict]:
+    """Add texts to a facts list (dedup by normalized text → bump weight), keeping
+    ids stable. Returns a NEW list (caller persists it). `texts` is an iterable of
+    strings, or of (text, kind) pairs."""
+    facts = [dict(f) for f in (facts or [])]
+    by_key = {_norm(f["text"]): f for f in facts if f.get("text")}
+    next_id = max((int(f.get("id", 0)) for f in facts), default=0) + 1
+    for item in texts or []:
+        if isinstance(item, (tuple, list)):
+            text, k = item[0], (item[1] if len(item) > 1 else kind)
+        else:
+            text, k = item, kind
+        text = (text or "").strip()
+        if not text:
+            continue
+        if k not in KINDS:
+            k = "note"
+        key = _norm(text)
+        if key in by_key:
+            f = by_key[key]
+            f["weight"] = f.get("weight", 1) + 1
+        else:
+            f = {"id": next_id, "kind": k, "text": text, "weight": 1}
+            next_id += 1
+            facts.append(f)
+            by_key[key] = f
+    return facts[-MAX_FACTS:]
+def learn_from_plan(plan) -> list[str]:
+    """Contact texts to learn from an ActionPlan (the observe_plan() logic, but
+    RETURNED for client-side merge instead of written to the global file)."""
+    out: list[str] = []
+    try:
+        for ev in getattr(plan, "events", []) or []:
+            for name in getattr(ev, "attendees", []) or []:
+                name = (name or "").strip()
+                if name and len(name) <= 40:
+                    out.append(f"{name} is a contact you make plans with")
+    except Exception:  # noqa: BLE001  memory must never break extraction
+        pass
+    return out
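
The client-owned memory helpers above boil down to two rules: a repeated fact (after case and whitespace normalization) bumps a weight instead of adding a row, and recall lists the heaviest facts first. A compact sketch of both, assuming simplified hypothetical names (`merge`, `recall`) and leaving out the `kind` field and the `MAX_FACTS` cap:

```python
def norm(text: str) -> str:
    """Case- and whitespace-insensitive identity for a fact."""
    return " ".join(text.lower().split())


def merge(facts: list[dict], texts: list[str]) -> list[dict]:
    """Return a NEW list with `texts` merged in; duplicates bump `weight`."""
    facts = [dict(f) for f in facts]
    by_key = {norm(f["text"]): f for f in facts}
    next_id = max((f["id"] for f in facts), default=0) + 1
    for text in texts:
        text = text.strip()
        if not text:
            continue
        known = by_key.get(norm(text))
        if known:
            known["weight"] += 1
        else:
            fact = {"id": next_id, "text": text, "weight": 1}
            next_id += 1
            facts.append(fact)
            by_key[norm(text)] = fact
    return facts


def recall(facts: list[dict], limit: int = 20) -> str:
    """Bullet list of the most reinforced facts, strongest first."""
    top = sorted(facts, key=lambda f: f["weight"], reverse=True)[:limit]
    return "\n".join(f"- {f['text']}" for f in top)
```

Because `merge` copies its input and returns a new list, the caller decides where to persist it, which matches the per-visitor localStorage design described in the section comment.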

server/model.py ADDED

	@@ -0,0 +1,317 @@

+"""Load the fine-tuned Gemma 4 GGUF and run inference via llama.cpp.
+Llama Champion: all generation goes through llama-cpp-python — no cloud AI API.
+The GGUF is downloaded from HF at startup so the Space image stays small.
+Two inference locations, selected by env:
+- in-process llama.cpp, GPU-offloaded inside an @spaces.GPU lease (ZeroGPU), or
+- a remote OpenAI-compatible / llama.cpp server via INFERENCE_BASE_URL
+  (e.g. a llama-server on the phone itself, or a backend).
+"""
+from __future__ import annotations
+import os
+import threading
+import time
+from huggingface_hub import hf_hub_download
+from . import events
+# The platform runs the gemma-cal EDGE fine-tune (Gemma-4 E4B, ~5GB Q4) — our own
+# calendar-native model, eval-gated before every publish (docs/eval-roadmap.md).
+# MODEL SIZE (hackathon hard constraint, <= 32B): E4B = ~4B effective params.
+# All inference is local via llama.cpp (no cloud AI).
+MODEL_REPO = os.environ.get("MODEL_REPO", "ParetoOptimal/gemma-4-cal-gguf")
+MODEL_FILE = os.environ.get("MODEL_FILE", "gemma-cal-e4b-Q4_K_M.gguf")
+# Vision projector (mmproj). Set to enable image input; leave empty for text-only.
+# MMPROJ_REPO lets the projector come from a different repo than the LLM — the E4B
+# edge model pairs with the base E4B's projector, not a projector in our repo.
+MMPROJ_REPO = os.environ.get("MMPROJ_REPO", "") or MODEL_REPO
+MMPROJ_FILE = os.environ.get("MMPROJ_FILE", "")
+# llama-cpp-python vision handler class (in llama_cpp.llama_chat_format). Gemma 4
+# vision may ship a dedicated handler; the generic clip/Llava handler is the default.
+CHAT_HANDLER = os.environ.get("CHAT_HANDLER", "Llava15ChatHandler")
+N_CTX = int(os.environ.get("N_CTX", "8192"))
+N_GPU_LAYERS = int(os.environ.get("N_GPU_LAYERS", "-1"))  # -1 = offload all (GPU)
+GPU_DURATION = int(os.environ.get("GPU_DURATION", "120"))  # ZeroGPU lease seconds
+# Configurable inference location. If INFERENCE_BASE_URL is set, generation is
+# delegated to a remote OpenAI-compatible / llama.cpp server (e.g. a llama-server
+# running on the phone itself, or a backend) instead of loading the GGUF in-process.
+# This is how the same agent runs on-device OR thin-client — selected by env.
+INFERENCE_BASE_URL = os.environ.get("INFERENCE_BASE_URL", "")
+INFERENCE_API_KEY = os.environ.get("INFERENCE_API_KEY", "")
+INFERENCE_MODEL = os.environ.get("INFERENCE_MODEL", "local")
+# Let a tool-calling model (Hermes) write its own long-term memory mid-run.
+# Only applies to the remote path (server/tools.py); off by default.
+HERMES_TOOLS = os.environ.get("HERMES_TOOLS") == "1"
+_llm = None
+_lock = threading.Lock()
+# ZeroGPU: GPU-bound work must run inside an @spaces.GPU function (the GPU is
+# attached only for that call). Locally / in CI the `spaces` package is absent,
+# so `gpu` degrades to a no-op decorator and stub mode never touches this path.
+try:
+    from spaces import GPU as _spaces_gpu
+    def gpu(fn):
+        return _spaces_gpu(duration=GPU_DURATION)(fn)
+except Exception:  # noqa: BLE001 - spaces not installed (local/CI)
+    def gpu(fn):
+        return fn
+def _preload_cuda_libs():
+    """Preload CUDA userspace libs so the prebuilt CUDA llama-cpp-python wheel can
+    dlopen. The ZeroGPU/Gradio-SDK env lacks libcudart.so.12 on the default loader
+    path; the nvidia-*-cu12 pip packages provide them. We CDLL them RTLD_GLOBAL so
+    the llama .so's NEEDED deps resolve. Path-independent (no LD_LIBRARY_PATH guess);
+    a no-op off-Linux / when the packages aren't installed."""
+    import ctypes
+    import glob
+    try:
+        import nvidia  # namespace package from nvidia-*-cu12 wheels
+    except Exception:  # noqa: BLE001
+        return
+    # nvidia is a PEP 420 namespace package: __file__ is None, use __path__.
+    bases = list(getattr(nvidia, "__path__", []) or [])
+    # No need to load cublas before its dependents: each lib's $ORIGIN RPATH resolves siblings.
+    for base in bases:
+        for sub in ("cuda_runtime", "cuda_nvrtc", "cublas"):
+            for so in sorted(glob.glob(os.path.join(base, sub, "lib", "*.so*"))):
+                try:
+                    ctypes.CDLL(so, mode=ctypes.RTLD_GLOBAL)
+                except OSError:
+                    pass
+def _build_chat_handler():
+    """Return a vision chat handler if MMPROJ_FILE is set, else None (text-only)."""
+    if not MMPROJ_FILE:
+        return None
+    import llama_cpp.llama_chat_format as fmt
+    mmproj_path = hf_hub_download(repo_id=MMPROJ_REPO, filename=MMPROJ_FILE)
+    handler_cls = getattr(fmt, CHAT_HANDLER)
+    return handler_cls(clip_model_path=mmproj_path, verbose=False)
+def get_llm():
+    """Lazily download + load the GGUF once, thread-safe."""
+    global _llm
+    if _llm is None:
+        with _lock:
+            if _llm is None:
+                _preload_cuda_libs()  # satisfy libcudart.so.12 etc. before loading
+                from llama_cpp import Llama  # imported lazily so tests can stub
+                path = hf_hub_download(repo_id=MODEL_REPO, filename=MODEL_FILE)
+                _llm = Llama(
+                    model_path=path,
+                    n_ctx=N_CTX,
+                    n_gpu_layers=N_GPU_LAYERS,
+                    chat_handler=_build_chat_handler(),  # enables image_url inputs
+                    verbose=False,
+                )
+    return _llm
+# --- GPU-scoped inner functions (run inside the ZeroGPU lease) ---
+# These do the actual in-process llama.cpp work; emits stay in the main-process
+# wrappers below because in-memory state (the events bus) isn't shared back from
+# the ZeroGPU subprocess.
+@gpu
+def _infer_text(messages: list[dict], temperature: float, max_tokens: int) -> str:
+    out = get_llm().create_chat_completion(
+        messages=messages, temperature=temperature, max_tokens=max_tokens
+    )
+    return out["choices"][0]["message"]["content"]
+@gpu
+def _infer_json(messages: list[dict], json_schema: dict, temperature: float, max_tokens: int):
+    out = get_llm().create_chat_completion(
+        messages=messages,
+        temperature=temperature,
+        max_tokens=max_tokens,
+        response_format={"type": "json_object", "schema": json_schema},
+    )
+    usage = out.get("usage") or {}
+    return out["choices"][0]["message"]["content"], usage.get("completion_tokens")
+@gpu
+def _infer_stream(messages: list[dict], json_schema: dict, temperature: float, max_tokens: int):
+    stream = get_llm().create_chat_completion(
+        messages=messages,
+        temperature=temperature,
+        max_tokens=max_tokens,
+        response_format={"type": "json_object", "schema": json_schema},
+        stream=True,
+    )
+    for chunk in stream:
+        delta = chunk["choices"][0].get("delta", {}).get("content")
+        if delta:
+            yield delta
+# --- remote inference seam (on-device / thin-client via INFERENCE_BASE_URL) ---
+def _remote_payload(messages, json_schema, temperature, max_tokens, stream):
+    return {
+        "model": INFERENCE_MODEL,
+        "messages": messages,
+        "temperature": temperature,
+        "max_tokens": max_tokens,
+        # llama-server accepts json_schema (OpenAI-style); the in-process path uses
+        # the json_object+schema form. Both grammar-constrain the output.
+        "response_format": {
+            "type": "json_schema",
+            "json_schema": {"name": "ActionPlan", "schema": json_schema, "strict": True},
+        },
+        "stream": stream,
+    }
+def _remote_headers() -> dict:
+    h = {"Content-Type": "application/json"}
+    if INFERENCE_API_KEY:
+        h["Authorization"] = f"Bearer {INFERENCE_API_KEY}"
+    return h
+def _remote_complete_json(messages, json_schema, temperature, max_tokens) -> str:
+    import requests  # already a dependency; imported here to keep import light
+    t0 = time.perf_counter()
+    if HERMES_TOOLS:
+        # Tool-calling loop: the model may call `remember` to update memory before
+        # returning the final ActionPlan JSON. See server/tools.py.
+        from .tools import TOOL_SPECS, run_with_tools
+        def _post(msgs):
+            payload = _remote_payload(msgs, json_schema, temperature, max_tokens, False)
+            payload["tools"] = TOOL_SPECS
+            r = requests.post(
+                f"{INFERENCE_BASE_URL.rstrip('/')}/chat/completions",
+                json=payload,
+                headers=_remote_headers(),
+                timeout=120,
+            )
+            r.raise_for_status()
+            return r.json()
+        content, out = run_with_tools(list(messages), _post)
+        usage = out.get("usage") or {}
+        events.emit(
+            "model",
+            "remote inference complete (tools)",
+            latency_ms=round((time.perf_counter() - t0) * 1000),
+            tokens=usage.get("completion_tokens"),
+        )
+        return content
+    resp = requests.post(
+        f"{INFERENCE_BASE_URL.rstrip('/')}/chat/completions",
+        json=_remote_payload(messages, json_schema, temperature, max_tokens, False),
+        headers=_remote_headers(),
+        timeout=120,
+    )
+    resp.raise_for_status()
+    out = resp.json()
+    usage = out.get("usage") or {}
+    events.emit(
+        "model",
+        "remote inference complete",
+        latency_ms=round((time.perf_counter() - t0) * 1000),
+        tokens=usage.get("completion_tokens"),
+    )
+    return out["choices"][0]["message"]["content"]
+def _remote_stream_json(messages, json_schema, temperature, max_tokens):
+    import json as _json
+    import requests
+    t0 = time.perf_counter()
+    events.emit("model", "remote inference started")
+    with requests.post(
+        f"{INFERENCE_BASE_URL.rstrip('/')}/chat/completions",
+        json=_remote_payload(messages, json_schema, temperature, max_tokens, True),
+        headers=_remote_headers(),
+        timeout=120,
+        stream=True,
+    ) as resp:
+        resp.raise_for_status()
+        for raw in resp.iter_lines():
+            if not raw:
+                continue
+            line = raw.decode("utf-8").strip()
+            if line.startswith("data:"):  # SSE allows "data:" with or without a space
+                line = line[len("data:"):].strip()
+            if not line or line == "[DONE]":
+                continue
+            try:
+                delta = _json.loads(line)["choices"][0].get("delta", {}).get("content")
+            except (ValueError, KeyError, IndexError):
+                continue
+            if delta:
+                yield delta
+    events.emit(
+        "model", "remote stream complete", latency_ms=round((time.perf_counter() - t0) * 1000)
+    )
+# --- main-process wrappers (own the activity-bus emits; pick local vs remote) ---
+def complete(messages: list[dict], temperature: float = 0.2, max_tokens: int = 1024) -> str:
+    """Chat-completion helper returning the assistant text.
+    Always runs the in-process llama.cpp path; INFERENCE_BASE_URL is not consulted."""
+    return _infer_text(messages, temperature, max_tokens)
+def complete_json(
+    messages: list[dict],
+    json_schema: dict,
+    temperature: float = 0.2,
+    max_tokens: int = 2048,
+) -> str:
+    """Constrained completion: grammar-constrained so the output always parses.
+    Delegates to a remote server if INFERENCE_BASE_URL is set, else runs the
+    GPU-offloaded in-process llama.cpp path."""
+    if INFERENCE_BASE_URL:
+        return _remote_complete_json(messages, json_schema, temperature, max_tokens)
+    t0 = time.perf_counter()
+    text, tokens = _infer_json(messages, json_schema, temperature, max_tokens)
+    events.emit(
+        "model",
+        "inference complete",
+        latency_ms=round((time.perf_counter() - t0) * 1000),
+        tokens=tokens,
+    )
+    return text
+def stream_complete_json(
+    messages: list[dict],
+    json_schema: dict,
+    temperature: float = 0.2,
+    max_tokens: int = 2048,
+):
+    """Streaming constrained completion: yields text deltas so the UI can show the
+    model 'thinking'. Remote seam when INFERENCE_BASE_URL is set, else GPU-offloaded
+    in-process llama.cpp. Emits model events around the call."""
+    if INFERENCE_BASE_URL:
+        yield from _remote_stream_json(messages, json_schema, temperature, max_tokens)
+        return
+    t0 = time.perf_counter()
+    events.emit("model", "inference started")
+    for delta in _infer_stream(messages, json_schema, temperature, max_tokens):
+        yield delta
+    events.emit(
+        "model", "stream complete", latency_ms=round((time.perf_counter() - t0) * 1000)
+    )
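The remote streaming path in `server/model.py` reads server-sent-event lines and pulls the `delta.content` out of each JSON chunk. That per-line step can be sketched as a pure function, which is easier to unit-test than the full HTTP generator. `parse_sse_delta` is a hypothetical helper name, and the sample chunks below are made up for illustration; they only mimic the OpenAI-style streaming shape the source parses.

```python
from __future__ import annotations

import json


def parse_sse_delta(raw: bytes) -> str | None:
    """Return the content delta carried by one SSE line, or None if there is none."""
    line = raw.decode("utf-8").strip()
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    if not line or line == "[DONE]":
        return None
    try:
        return json.loads(line)["choices"][0].get("delta", {}).get("content") or None
    except (ValueError, KeyError, IndexError, TypeError, AttributeError):
        return None  # comment lines, keep-alives, or unexpected shapes are skipped


# Made-up sample stream: one content chunk, a blank line, an empty delta,
# an SSE comment, and the terminator.
chunks = [
    b'data: {"choices": [{"delta": {"content": "{\\"events\\""}}]}',
    b"",
    b'data: {"choices": [{"delta": {}}]}',
    b": keep-alive comment",
    b"data: [DONE]",
]
text = "".join(d for d in map(parse_sse_delta, chunks) if d)
```

Returning `None` for anything that is not a content delta lets the caller keep a single `if delta: yield delta` branch, which matches how the generator in the source treats malformed lines.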

server/orchestrator.py ADDED

	@@ -0,0 +1,191 @@

+"""MiniCPM-planned agent orchestrator over the Space's own MCP tools.
+The Agent tab's engine: a small planner LLM (OpenBMB MiniCPM via a second
+llama-server, OpenAI-compatible) drives smolagents' ToolCallingAgent against
+the SAME tools this Space already exposes over MCP (extract_events /
+check_conflicts / make_ics) — consumed via the localhost MCP endpoint, so the
+agent demonstrably works through the public tool contract, not private
+imports. Everything stays local llama.cpp: no cloud AI APIs, every model
+under the 32B cap (gemma-cal E4B ~4B + MiniCPM 8B or 1B).
+Stub mode (USE_STUB_EXTRACTOR=1, used by the free preview and CI) — or any
+planner failure — falls back to ScriptedPlanner: the same tool sequence run
+deterministically, emitting identical step events, so the tab always works
+and tests never need a model.
+Steps are plain JSON-serialisable dicts:
+    {"kind": "plan"|"tool_call"|"tool_result"|"final"|"error", ...}
+"""
+from __future__ import annotations
+import json
+import os
+from typing import Iterator, Optional
+from server import events as bus
+# Planner serving (second llama-server) — env-selected, OFF by default.
+# 8B default for planning quality; MiniCPM5-1B is the <=4B tiny variant.
+PLANNER_BASE_URL = os.environ.get("PLANNER_BASE_URL", "http://127.0.0.1:8081/v1")
+PLANNER_MODEL_ID = os.environ.get("PLANNER_MODEL_ID", "minicpm-planner")
+# Self MCP endpoint (localhost — no HF edge/auth between us and ourselves).
+MCP_SSE_URL = os.environ.get(
+    "MCP_SSE_URL", f"http://127.0.0.1:{os.environ.get('PORT', '7860')}/gradio_api/mcp/sse"
+)
+ORCH_TASK = """You are a scheduling agent for a busy parent. Read the thread below.
+Call exactly ONE tool — extract_events on the thread — then STOP. It returns the
+events (the fine-tuned calendar model does the real work), a reply draft, and any
+clarification. After that one call, return a short JSON summary: {{"events": <int>}}.
+Do NOT call any other tool: conflict-checking and the .ics file are handled for you.
+{memory}
+Thread:
+{thread}
+"""
+def _planner_configured() -> bool:
+    return bool(os.environ.get("PLANNER_HF_REPO") or os.environ.get("PLANNER_BASE_URL"))
+def _use_llm_planner() -> bool:
+    return os.environ.get("USE_STUB_EXTRACTOR") != "1" and _planner_configured()
+def _short(obj, limit: int = 1200) -> str:
+    try:
+        s = obj if isinstance(obj, str) else json.dumps(obj, default=str)
+    except Exception:  # noqa: BLE001
+        s = str(obj)
+    return s if len(s) <= limit else s[:limit] + " …"
+# --------------------------------------------------------------------------- #
+# ScriptedPlanner — deterministic fallback / stub-mode path
+# --------------------------------------------------------------------------- #
+def _scripted_steps(thread: str, ics_b64: Optional[str],
+                    memory_block: Optional[str],
+                    images: Optional[list[str]] = None) -> Iterator[dict]:
+    from server import mcp_tools
+    yield {"kind": "plan",
+           "text": "Playbook: extract events from the thread"
+                   + (f" + {len(images)} screenshot(s)" if images else "")
+                   + (", check conflicts against the provided calendar" if ics_b64 else "")
+                   + ", then render an .ics."}
+    yield {"kind": "tool_call", "tool": "extract_events",
+           "args": {"thread": _short(thread, 300),
+                    **({"images": f"{len(images)} image(s)"} if images else {}),
+                    **({"memory": "<user recall block>"} if memory_block else {})}}
+    plan = mcp_tools.extract_events(thread, images or None, memory_block)
+    yield {"kind": "tool_result", "tool": "extract_events",
+           "result": {"events": len(plan.get("events", [])),
+                      "reply_draft": _short(plan.get("reply_draft") or "", 200)}}
+    conflicts: list = list(plan.get("conflicts") or [])
+    if ics_b64 and plan.get("events"):
+        yield {"kind": "tool_call", "tool": "check_conflicts",
+               "args": {"events": f"{len(plan['events'])} event(s)", "ics_base64": "<calendar>"}}
+        conflicts = mcp_tools.check_conflicts(plan["events"], ics_b64)
+        plan["conflicts"] = conflicts
+        yield {"kind": "tool_result", "tool": "check_conflicts",
+               "result": {"conflicts": len(conflicts)}}
+    ics_out = None
+    if plan.get("events"):
+        yield {"kind": "tool_call", "tool": "make_ics",
+               "args": {"events": f"{len(plan['events'])} event(s)"}}
+        ics_out = mcp_tools.make_ics(plan["events"])
+        yield {"kind": "tool_result", "tool": "make_ics",
+               "result": {"ics_bytes": len(ics_out or "")}}
+    yield {"kind": "final", "plan": plan, "ics_base64": ics_out,
+           "summary": {"events": len(plan.get("events", [])), "conflicts": len(conflicts)}}
+# --------------------------------------------------------------------------- #
+# smolagents path — MiniCPM planner over the self MCP endpoint
+# --------------------------------------------------------------------------- #
+def _smol_steps(thread: str, ics_b64: Optional[str],
+                memory_block: Optional[str], max_steps: int,
+                images: Optional[list[str]] = None) -> Iterator[dict]:
+    # Lazy imports: smolagents is only needed on the real path, keeping CI and
+    # the stub preview dependency-free.
+    from smolagents import OpenAIServerModel, ToolCallingAgent  # noqa: PLC0415
+    from smolagents.mcp_client import MCPClient  # noqa: PLC0415
+    model = OpenAIServerModel(
+        model_id=PLANNER_MODEL_ID, api_base=PLANNER_BASE_URL,
+        api_key=os.environ.get("PLANNER_API_KEY", "local"), temperature=0.0,
+    )
+    task = ORCH_TASK.format(
+        memory=(f"What you know about this user:\n{memory_block}" if memory_block else ""),
+        thread=thread,
+    )
+    yield {"kind": "plan", "text": f"MiniCPM planner ({PLANNER_MODEL_ID}) engaged — "
+                                   f"tools via MCP at {MCP_SSE_URL}"}
+    with MCPClient({"url": MCP_SSE_URL, "transport": "sse"}) as tools:
+        # Minimal-footprint planner: expose ONLY extract_events and cap the loop
+        # at a couple of steps. The fine-tuned E4B (inside extract_events) does
+        # the real work; conflict-checking and the .ics are finalized
+        # deterministically by _scripted_steps below. This keeps the planner to a
+        # single tool call so it stays fast and never accumulates enough context
+        # to overflow (multi-step runs hit ~207s and 'request exceeds context').
+        # Restricting tools also avoids the File-input callbacks whose schemas
+        # $ref #/$defs/FileData (which the planner's jinja rendering can't resolve).
+        _WANTED = {"extract_events"}
+        tools = [t for t in tools if getattr(t, "name", "") in _WANTED]
+        agent = ToolCallingAgent(tools=tools, model=model, max_steps=min(max_steps, 3))
+        result = None
+        for step in agent.run(task, stream=True):
+            kind = type(step).__name__
+            if kind == "ActionStep":
+                for call in (getattr(step, "tool_calls", None) or []):
+                    yield {"kind": "tool_call",
+                           "tool": getattr(call, "name", "?"),
+                           "args": _short(getattr(call, "arguments", ""))}
+                obs = getattr(step, "observations", None)
+                if obs:
+                    yield {"kind": "tool_result", "tool": "(observation)",
+                           "result": _short(obs)}
+            elif kind == "FinalAnswerStep":
+                result = getattr(step, "final_answer", None) or getattr(step, "output", None)
+        yield {"kind": "plan", "text": f"Planner finished: {_short(result, 300)}"}
+    # The planner's free-text answer isn't the product — re-derive the
+    # structured plan through the deterministic path so the UI always gets a
+    # valid ActionPlan + ics, with the planner trace above as the evidence.
+    yield from _scripted_steps(thread, ics_b64, memory_block, images)
+# --------------------------------------------------------------------------- #
+# Entry point
+# --------------------------------------------------------------------------- #
+def run_orchestrator(thread: str, ics_b64: Optional[str] = None,
+                     memory_block: Optional[str] = None,
+                     max_steps: int = 6,
+                     images: Optional[list[str]] = None) -> Iterator[dict]:
+    """Yield orchestration steps for a thread (+ optional screenshot data URIs);
+    always ends with a 'final' step (or an 'error' followed by the scripted
+    fallback's steps)."""
+    with bus.run_scope("agent"):
+        bus.emit("decision", "agent orchestrator run started")
+        if _use_llm_planner():
+            try:
+                yield from _smol_steps(thread, ics_b64, memory_block, max_steps, images)
+                bus.emit("decision", "agent orchestrator run finished (MiniCPM planner)")
+                return
+            except Exception as e:  # noqa: BLE001  planner down -> scripted fallback
+                # Surface the actual message (e.g. which module is missing), not
+                # just the type — a bare "ModuleNotFoundError" hides the cause.
+                detail = f"{type(e).__name__}: {e}".strip().rstrip(":")
+                yield {"kind": "error",
+                       "text": f"Planner unavailable ({_short(detail, 160)}) — "
+                               "falling back to the scripted playbook."}
+        yield from _scripted_steps(thread, ics_b64, memory_block, images)
+        bus.emit("decision", "agent orchestrator run finished (scripted)")
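Because `run_orchestrator` yields plain dicts with a `kind` field, a caller can render the trace and pick out the final result without knowing whether the MiniCPM planner or the scripted playbook produced it. The sketch below shows one possible consumer. `render_step` and `consume` are hypothetical names, and the `demo` list is hand-written sample data shaped like the steps in the source, not captured output.

```python
from __future__ import annotations

from typing import Iterable, Optional


def render_step(step: dict) -> str:
    """Turn one orchestrator step into a single log line."""
    kind = step.get("kind", "?")
    if kind == "plan":
        return f"[plan] {step.get('text', '')}"
    if kind == "tool_call":
        return f"[call] {step.get('tool', '?')}({step.get('args', '')})"
    if kind == "tool_result":
        return f"[result] {step.get('tool', '?')} -> {step.get('result', '')}"
    if kind == "error":
        return f"[error] {step.get('text', '')}"
    if kind == "final":
        return f"[final] {step.get('summary', {})}"
    return f"[{kind}]"


def consume(steps: Iterable[dict]) -> tuple[list[str], Optional[dict]]:
    """Render every step and return (log lines, last 'final' step or None)."""
    log, final = [], None
    for step in steps:
        log.append(render_step(step))
        if step.get("kind") == "final":
            final = step
    return log, final


# Hand-written sample: a planner failure followed by the scripted fallback.
demo = [
    {"kind": "error", "text": "Planner unavailable"},
    {"kind": "tool_call", "tool": "extract_events", "args": {"thread": "hi"}},
    {"kind": "tool_result", "tool": "extract_events", "result": {"events": 1}},
    {"kind": "final", "plan": {"events": [{}]}, "ics_base64": None,
     "summary": {"events": 1, "conflicts": 0}},
]
log, final = consume(demo)
```

Keeping the last `final` step matters on the planner path, where the planner trace is followed by the deterministic steps that carry the actual plan.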

server/pipeline.py ADDED

	@@ -0,0 +1,98 @@

+"""The shared 'thread in -> ActionPlan out' pipeline.
+One implementation behind both the synchronous ``POST /agent`` endpoint and the
+autonomous ingest path, so the two can never drift. Stateless: it does not touch
+the feed or the dedup store (callers own statefulness).
+Importable without Gradio. Google Calendar is imported lazily so CI / stub mode
+(which exclude the google libs) stay clean.
+"""
+from __future__ import annotations
+import base64
+from typing import Optional
+from dateutil import parser as dtparser
+from pydantic import BaseModel
+from calendar_out.freebusy import DEFAULT_DURATION, Busy, _as_dt, annotate_conflicts, load_ics_busy
+from calendar_out.ics import events_to_ics
+from server import events as bus
+from server.agent import run_agent
+from server.schema import ActionPlan, Event
+from server.threads import format_thread
+class AgentMessage(BaseModel):
+    sender: str = "?"
+    text: str = ""
+class AgentRequest(BaseModel):
+    thread: Optional[str] = None
+    messages: Optional[list[AgentMessage]] = None
+    images: list[str] = []  # base64 data URIs
+    existing_ics: Optional[str] = None  # base64-encoded .ics bytes
+    existing_events: list[Event] = []
+    now: Optional[str] = None  # ISO 8601; defaults to datetime.now()
+    push_gcal: bool = False
+    return_ics: bool = False
+    memory: Optional[str] = None  # per-user recall block (else server global memory)
+class AgentResponse(BaseModel):
+    plan: ActionPlan
+    ics_base64: Optional[str] = None
+    gcal_links: list[str] = []
+def _busy_from_request(req: AgentRequest) -> list[Busy]:
+    """Build busy intervals from an uploaded .ics or structured existing events."""
+    if req.existing_ics:
+        try:
+            return load_ics_busy(base64.b64decode(req.existing_ics))
+        except Exception:  # noqa: BLE001  malformed .ics -> no conflict context
+            return []
+    busy: list[Busy] = []
+    for ev in req.existing_events:
+        start = _as_dt(ev.start)
+        if start is None:
+            continue
+        end = _as_dt(ev.end) or (start + DEFAULT_DURATION)
+        busy.append(Busy(start=start, end=end, title=ev.title))
+    return busy
+def _thread_text(req: AgentRequest) -> str:
+    if req.thread:
+        return req.thread
+    if req.messages:
+        return format_thread([m.model_dump() for m in req.messages])
+    return ""
+def run_pipeline(req: AgentRequest) -> AgentResponse:
+    """thread/messages -> run_agent -> deterministic conflicts -> optional ics/gcal."""
+    thread = _thread_text(req)
+    now = dtparser.isoparse(req.now) if req.now else None
+    busy = _busy_from_request(req)
+    plan = run_agent(thread, now=now, existing=req.existing_events, images=req.images,
+                     memory_block=req.memory)
+    if busy:
+        plan = annotate_conflicts(plan, busy)
+    resp = AgentResponse(plan=plan)
+    if req.return_ics:
+        resp.ics_base64 = base64.b64encode(events_to_ics(plan.events)).decode("ascii")
+    if req.push_gcal and plan.events:
+        try:
+            from calendar_out.gcal import push_events  # lazy: google libs optional
+            resp.gcal_links = push_events(plan.events)
+        except Exception as e:  # noqa: BLE001  no token.json / offline -> degrade
+            bus.emit("calendar", f"Google Calendar push skipped: {e}", level="error")
+    return resp
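For a client of this pipeline, the request body follows the `AgentRequest` fields above: `messages` (or `thread`), an optional base64-encoded `existing_ics`, and flags such as `return_ics`. The following is a minimal sketch of assembling such a body with the standard library only. `build_agent_payload` is a hypothetical helper, the sample messages and calendar bytes are invented, and the HTTP call itself is left out because the endpoint's host and auth are not shown in this diff.

```python
import base64
import json


def build_agent_payload(messages, ics_bytes=None, now=None, return_ics=True) -> dict:
    """Build a JSON-serialisable body matching the AgentRequest field names."""
    payload = {
        "messages": [{"sender": sender, "text": text} for sender, text in messages],
        "return_ics": return_ics,
    }
    if ics_bytes is not None:
        # existing_ics carries base64-encoded .ics bytes, per the schema comment
        payload["existing_ics"] = base64.b64encode(ics_bytes).decode("ascii")
    if now is not None:
        payload["now"] = now  # ISO 8601 string
    return payload


payload = build_agent_payload(
    [("Dana", "Practice moved to Thursday 5pm"), ("Me", "ok!")],
    ics_bytes=b"BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n",
    now="2026-06-10T09:00:00",
)
body = json.dumps(payload)
```

Omitting `thread` and sending `messages` lets the server join them itself; when both are present, `_thread_text` prefers `thread`.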

server/schema.py ADDED

	@@ -0,0 +1,43 @@

+"""Shared pydantic schemas for the scheduling agent.
+The model is constrained to emit an ActionPlan (see server/agent.py); these types
+are also the contract used by the UI and the calendar outputs.
+"""
+from __future__ import annotations
+from typing import Optional
+from pydantic import BaseModel, Field
+class Event(BaseModel):
+    title: str
+    start: str  # ISO 8601, e.g. 2026-06-10T13:00:00
+    end: Optional[str] = None
+    location: Optional[str] = None
+    attendees: list[str] = Field(default_factory=list)
+    reminder_minutes: Optional[int] = None
+    notes: Optional[str] = None
+class Conflict(BaseModel):
+    event_index: int = Field(description="index into ActionPlan.events")
+    clashes_with: str = Field(description="summary of the existing event it clashes with")
+    severity: str = Field(description='one of: "overlap", "adjacent", "tight"')
+class ActionPlan(BaseModel):
+    """Everything the agent decides for one thread, in one constrained object."""
+    reasoning: Optional[str] = Field(
+        default=None, description="brief chain of thought shown to the user"
+    )
+    events: list[Event] = Field(default_factory=list)
+    conflicts: list[Conflict] = Field(default_factory=list)
+    proposed_times: list[str] = Field(
+        default_factory=list, description="ISO 8601 alternatives when there is a conflict"
+    )
+    reply_draft: str = Field(default="", description="suggested reply to send back")
+    needs_clarification: Optional[str] = Field(
+        default=None, description="a question to ask if the plan is ambiguous"
+    )
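The `Conflict` descriptions above state two constraints that the plain `int` and `str` types do not enforce: `event_index` should point into `ActionPlan.events`, and `severity` should be one of three values. A caller that wants to check them on a plan dict could do something like the sketch below. `conflict_problems` is a hypothetical helper, not part of the repo, and it works on plain dicts so it does not need pydantic.

```python
# Allowed values taken from the Conflict.severity field description.
ALLOWED_SEVERITIES = {"overlap", "adjacent", "tight"}


def conflict_problems(plan: dict) -> list:
    """Return human-readable problems found in plan['conflicts'] (empty if none)."""
    problems = []
    n_events = len(plan.get("events") or [])
    for i, conflict in enumerate(plan.get("conflicts") or []):
        idx = conflict.get("event_index")
        if not isinstance(idx, int) or not 0 <= idx < n_events:
            problems.append(f"conflicts[{i}]: event_index {idx!r} out of range")
        if conflict.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"conflicts[{i}]: unknown severity {conflict.get('severity')!r}")
    return problems


good = {
    "events": [{"title": "Practice", "start": "2026-06-11T17:00:00"}],
    "conflicts": [{"event_index": 0, "clashes_with": "Dentist", "severity": "overlap"}],
}
bad = {
    "events": [{"title": "Practice", "start": "2026-06-11T17:00:00"}],
    "conflicts": [{"event_index": 2, "clashes_with": "Dentist", "severity": "soft"}],
}
good_problems = conflict_problems(good)
bad_problems = conflict_problems(bad)
```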

server/threads.py ADDED

	@@ -0,0 +1,59 @@

+"""Assemble a conversation thread from individual messages.
+Used by both the ``/agent`` endpoint (join a posted ``messages[]`` into a thread)
+and autonomous mode (build a per-chat rolling window from the ingest feed). Pure —
+no Gradio / llama / network — so it's trivially unit-testable in stub mode.
+"""
+from __future__ import annotations
+import os
+from dateutil import parser as dtparser
+def format_thread(messages: list[dict]) -> str:
+    """Render messages as ``"sender: text"`` lines, skipping empty bodies."""
+    lines = []
+    for m in messages:
+        text = (m.get("text") or "").strip()
+        if not text:
+            continue
+        sender = (m.get("sender") or "?").strip()
+        lines.append(f"{sender}: {text}")
+    return "\n".join(lines)
+def _ts(value) -> float | None:
+    try:
+        return dtparser.parse(str(value)).timestamp()
+    except (ValueError, TypeError, OverflowError):
+        return None
+def rolling_thread(
+    feed: list[dict],
+    chat: str,
+    window: int | None = None,
+    minutes: int | None = None,
+) -> str:
+    """Build a thread from the most recent messages of one chat in the feed.
+    Keeps the last ``window`` messages for ``chat`` that fall within ``minutes`` of
+    the newest one (env-tunable via AUTO_THREAD_WINDOW / AUTO_THREAD_MINUTES).
+    """
+    window = window or int(os.environ.get("AUTO_THREAD_WINDOW", "20"))
+    minutes = minutes or int(os.environ.get("AUTO_THREAD_MINUTES", "720"))
+    msgs = [m for m in feed if (m.get("chat") or "") == chat]
+    if not msgs:
+        return ""
+    msgs = msgs[-window:]
+    # Drop messages older than `minutes` before the newest (when timestamps parse).
+    stamps = [(_ts(m.get("timestamp")), m) for m in msgs]
+    newest = max((s for s, _ in stamps if s is not None), default=None)
+    if newest is not None:
+        cutoff = newest - minutes * 60
+        msgs = [m for s, m in stamps if s is None or s >= cutoff]
+    return format_thread(msgs)
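The windowing rule in `rolling_thread` has two stages: keep the last `window` messages, then drop any whose timestamp is more than `minutes` older than the newest one, while keeping messages whose timestamp does not parse. A small worked sketch of just that rule follows. It uses `datetime.fromisoformat` as a stand-in for `dateutil`, so it only accepts ISO 8601 strings; `ts` and `within_window` are illustrative names and the feed is invented.

```python
from __future__ import annotations

from datetime import datetime


def ts(value) -> float | None:
    """Parse an ISO 8601 timestamp to epoch seconds; None when it doesn't parse."""
    try:
        return datetime.fromisoformat(str(value)).timestamp()
    except (ValueError, TypeError):
        return None


def within_window(msgs: list[dict], window: int = 20, minutes: int = 720) -> list[dict]:
    """Last `window` messages, minus those older than `minutes` before the newest."""
    msgs = msgs[-window:]
    stamps = [(ts(m.get("timestamp")), m) for m in msgs]
    newest = max((s for s, _ in stamps if s is not None), default=None)
    if newest is None:
        return msgs  # nothing parsed, so there is no cutoff to apply
    cutoff = newest - minutes * 60
    return [m for s, m in stamps if s is None or s >= cutoff]


feed = [
    {"text": "old", "timestamp": "2026-06-09T06:00:00+00:00"},
    {"text": "no stamp", "timestamp": None},
    {"text": "recent", "timestamp": "2026-06-10T08:30:00+00:00"},
    {"text": "newest", "timestamp": "2026-06-10T09:00:00+00:00"},
]
kept = [m["text"] for m in within_window(feed, window=20, minutes=60)]
last_two = [m["text"] for m in within_window(feed, window=2, minutes=60)]
```

With a 60-minute window the cutoff is 08:00 on June 10, so only the message from the previous day is dropped; the unstamped message survives because an unparseable timestamp is treated as "keep".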

server/tools.py ADDED

	@@ -0,0 +1,81 @@

+"""Hermes tool-calling: let the model write its own long-term memory.
+Hermes is a tool-calling fine-tune. When `HERMES_TOOLS=1`, the remote inference
+path (server/model.py) advertises these tools so the model can call `remember`
+mid-run to save durable facts ("Dana is the soccer coach", "you decline Mondays")
+— the active half of "grows with you". Kept separate + small so the round-trip
+logic is unit-testable without a live server.
+"""
+from __future__ import annotations
+import json
+from . import memory
+# OpenAI-compatible tool specs (llama-server understands these with --jinja).
+TOOL_SPECS = [
+    {
+        "type": "function",
+        "function": {
+            "name": "remember",
+            "description": (
+                "Save a durable fact or preference about the user to long-term memory "
+                "so future scheduling is more personal. Use for stable facts only "
+                "(roles, recurring preferences, default locations), not one-off details."
+            ),
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "text": {
+                        "type": "string",
+                        "description": "the fact, e.g. 'Dana is the soccer coach'",
+                    },
+                    "kind": {
+                        "type": "string",
+                        "enum": ["contact", "preference", "location", "note"],
+                    },
+                },
+                "required": ["text"],
+            },
+        },
+    }
+]
+def dispatch(name: str, arguments) -> str:
+    """Execute one tool call; returns a short result string for the tool message."""
+    if name != "remember":
+        return f"unknown tool: {name}"
+    try:
+        args = json.loads(arguments) if isinstance(arguments, str) else (arguments or {})
+    except (ValueError, TypeError):
+        args = {}
+    if not isinstance(args, dict):  # e.g. the model sent a JSON list or a bare string
+        args = {}
+    text = str(args.get("text") or "").strip()
+    if not text:
+        return "no text provided"
+    memory.remember(text, args.get("kind", "note"))
+    return f"remembered: {text}"
+def run_with_tools(messages: list[dict], post_fn, max_rounds: int = 3):
+    """Drive a tool-calling loop. ``post_fn(messages) -> openai_response_dict`` does
+    the actual HTTP POST (tools already configured by the caller); injectable so the
+    loop is testable. Returns (final_content, last_response)."""
+    msgs = list(messages)
+    resp = {}
+    for _ in range(max_rounds):
+        resp = post_fn(msgs)
+        msg = resp["choices"][0]["message"]
+        tool_calls = msg.get("tool_calls") or []
+        if not tool_calls:
+            return msg.get("content") or "", resp  # content may be an explicit null
+        msgs.append(msg)  # assistant turn carrying the tool_calls
+        for tc in tool_calls:
+            fn = tc.get("function", {})
+            result = dispatch(fn.get("name", ""), fn.get("arguments", "{}"))
+            msgs.append(
+                {"role": "tool", "tool_call_id": tc.get("id", ""), "content": result}
+            )
+    # ran out of rounds — one final call to get content
+    resp = post_fn(msgs)
+    return resp["choices"][0]["message"].get("content") or "", resp
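Since `run_with_tools` takes the HTTP call as an injectable `post_fn`, the round trip can be exercised with a scripted fake instead of a live llama-server. The sketch below illustrates that testing pattern. To stay self-contained it uses `drive`, a condensed restatement of the loop rather than the repo's function, plus `fake_post` and `fake_dispatch`, which are hypothetical test doubles; the response dicts only mimic the OpenAI-compatible shape the loop reads.

```python
import json


def drive(messages, post_fn, dispatch, max_rounds=3):
    """Condensed restatement of the tool loop: post, run any tool calls, repeat."""
    msgs = list(messages)
    for _ in range(max_rounds):
        msg = post_fn(msgs)["choices"][0]["message"]
        calls = msg.get("tool_calls") or []
        if not calls:
            return msg.get("content") or "", msgs
        msgs.append(msg)  # assistant turn carrying the tool calls
        for tc in calls:
            fn = tc.get("function", {})
            result = dispatch(fn.get("name", ""), fn.get("arguments", "{}"))
            msgs.append({"role": "tool", "tool_call_id": tc.get("id", ""), "content": result})
    return post_fn(msgs)["choices"][0]["message"].get("content") or "", msgs


saved = []  # stands in for the memory store in this demo


def fake_dispatch(name, arguments):
    """Test double for dispatch: record the text instead of writing to memory."""
    args = json.loads(arguments)
    saved.append(args["text"])
    return f"remembered: {args['text']}"


def fake_post(msgs):
    """Scripted server: ask for one `remember` call, then return the final JSON."""
    if not any(m.get("role") == "tool" for m in msgs):
        call = {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "remember",
                "arguments": json.dumps({"text": "Dana is the soccer coach", "kind": "contact"}),
            },
        }
        return {"choices": [{"message": {"role": "assistant", "content": None,
                                         "tool_calls": [call]}}]}
    return {"choices": [{"message": {"role": "assistant", "content": '{"events": []}'}}]}


content, transcript = drive([{"role": "user", "content": "thread"}], fake_post, fake_dispatch)
```

The fake decides what to return by looking for a `tool` message in the transcript, which checks that the loop really fed the tool result back before asking for the final answer.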

server/trace.py ADDED

	@@ -0,0 +1,98 @@

+"""Export an agent run as a portable, shareable trace (Sharing is Caring).
+The activity bus (``server/events.py``) groups every event from one agent run
+under a ``run_scope`` id. This module serializes such a run into a small,
+self-contained JSON envelope that a user can download and (optionally) publish to
+the Hugging Face Hub with ``training/share_trace.py``.
+Privacy: the bus is structural by design — every ``emit(...)`` carries counts +
+short status strings, never event titles or raw thread text. The *only* free-text
+that can carry personal data is the chat-name suffix in the ingest message
+(``app.py``: ``"N msg(s) from {chats}"``). With ``redact=True`` (the default) that
+tail is dropped. Steps use a fixed key allowlist, so a future payload key can't
+silently leak into a shared trace.
+"""
+from __future__ import annotations
+import json
+import os
+import re
+import tempfile
+from datetime import datetime
+from . import events as bus
+TRACE_SCHEMA = "imessage-cal-trace"
+TRACE_SCHEMA_VERSION = 1
+# Only these keys ever appear in an exported step (allowlist, not denylist).
+_STEP_KEYS = ("stage", "level", "ts", "latency_ms", "events", "conflicts", "images", "tokens")
+def _scrub_message(stage: str, message: str, redact: bool) -> str:
+    """All bus messages are structural except the ingest one, which appends
+    ``" from {chats}"`` (chat names — PII). Drop that tail when redacting."""
+    if redact and stage == "ingest":
+        # "3 msg(s) from 3rd grade chat" -> "3 msg(s)"
+        return re.sub(r"\s+from\s+.*$", "", message)
+    return message
+def _step(ev: dict, redact: bool) -> dict:
+    step = {k: ev[k] for k in _STEP_KEYS if k in ev}
+    step["message"] = _scrub_message(ev.get("stage", ""), ev.get("message", ""), redact)
+    return step
+def export_run(run_id: str | None = None, redact: bool = True) -> dict:
+    """Serialize one agent run (newest by default) into a shareable envelope.
+    Returns a valid empty envelope (``steps == []``) when there is no matching
+    run, so callers don't need to handle exceptions.
+    """
+    runs = bus.recent_runs(n=50)  # newest first
+    evs: list[dict] = []
+    rid = run_id
+    if run_id is None:
+        if runs:
+            rid, evs = runs[0]
+    else:
+        for r, e in runs:
+            if r == run_id:
+                evs = e
+                break
+    steps = [_step(e, redact) for e in evs]
+    summary = {
+        "steps": len(steps),
+        "events": sum(s.get("events", 0) for s in steps),
+        "conflicts": sum(s.get("conflicts", 0) for s in steps),
+        "images": sum(s.get("images", 0) for s in steps),
+        "model_calls": sum(1 for s in steps if s.get("latency_ms") is not None),
+        # ``or 0`` guards against steps that carry ``latency_ms: None``.
+        "total_latency_ms": sum(s.get("latency_ms") or 0 for s in steps),
+    }
+    return {
+        "schema": TRACE_SCHEMA,
+        "version": TRACE_SCHEMA_VERSION,
+        "exported_at": datetime.now().isoformat(timespec="seconds"),
+        "run_id": rid,
+        # run ids look like "12:analyze" — the label is the part after ":".
+        "run_label": (rid.split(":", 1)[1] if rid and ":" in rid else None),
+        "redacted": redact,
+        "steps": steps,
+        "summary": summary,
+    }
+def write_trace(trace: dict, path: str | None = None) -> str:
+    """Write a trace envelope to a JSON file and return the path (Gradio download).
+    Mirrors ``calendar_out.ics.write_ics``. Deliberately does NOT emit a bus event
+    — that would mutate the very run being exported.
+    """
+    if path is None:
+        fd, path = tempfile.mkstemp(suffix=".json", prefix="trace_")
+        os.close(fd)
+    with open(path, "w", encoding="utf-8") as f:
+        json.dump(trace, f, indent=2, ensure_ascii=False)
+    return path
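
The redaction rules described in the module docstring can be checked in isolation. The sketch below is illustrative only: it copies the allowlist and the scrubbing regex from `trace.py` into standalone functions (`scrub_message` and `step` are local names, not imports from the repo), and the sample event, including its `title` key, is hypothetical.

```python
import re

# Copy of the allowlist from trace.py.
STEP_KEYS = ("stage", "level", "ts", "latency_ms", "events", "conflicts", "images", "tokens")


def scrub_message(stage, message, redact):
    """Drop the ' from {chats}' tail of ingest messages when redacting."""
    if redact and stage == "ingest":
        return re.sub(r"\s+from\s+.*$", "", message)
    return message


def step(ev, redact=True):
    """Keep only allowlisted keys, then attach the (possibly scrubbed) message."""
    out = {k: ev[k] for k in STEP_KEYS if k in ev}
    out["message"] = scrub_message(ev.get("stage", ""), ev.get("message", ""), redact)
    return out


# Hypothetical bus event; "title" is not on the allowlist and must not survive.
ev = {
    "stage": "ingest",
    "level": "info",
    "message": "3 msg(s) from 3rd grade chat",
    "title": "Dentist",
}
redacted = step(ev)
raw = step(ev, redact=False)
```

With `redact=True` the chat name is dropped from the message; in both modes the unlisted `title` key is excluded, which is the point of using an allowlist rather than a denylist.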

static/app.css ADDED

	@@ -0,0 +1,961 @@

+/* OffGridSchedula — "daylight planner" theme.
+   Soft lavender-paper canvas, deep-ink text (high contrast), violet→cyan identity
+   used on the primary action + key accents. Fraunces (display) + Hanken Grotesk. */
+@import url('https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,500;9..144,600;9..144,700&family=Hanken+Grotesk:wght@400;500;600;700&display=swap');
+:root {
+  --bg:       #f4f2fb;   /* soft lavender paper */
+  --bg2:      #ffffff;   /* inputs */
+  --surface:  #ffffff;   /* cards */
+  --surface2: #efecf9;
+  --line:     rgba(31,25,60,0.12);
+  --text:     #1e1934;   /* deep ink */
+  --muted:    #645c84;
+  --violet:   #6d4be0;
+  --cyan:     #0e8ea0;
+  --coral:    #d83a60;
+  --mint:     #15894f;
+  --amber:    #b3700a;
+  --accent:   linear-gradient(100deg, #6d4be0 0%, #0b8294 100%);
+  --radius:   16px;
+  --shadow:   0 12px 30px rgba(45,32,90,0.12);
+}
+/* ---- canvas + base type ---- */
+.gradio-container, .gradio-container * {
+  font-family: "Hanken Grotesk", ui-sans-serif, system-ui, sans-serif !important;
+}
+.gradio-container {
+  background:
+    radial-gradient(1100px 520px at 12% -8%, #ece5ff 0%, transparent 55%),
+    radial-gradient(900px 500px at 100% 0%, #ddf2f4 0%, transparent 50%),
+    var(--bg) !important;
+  color: var(--text) !important;
+  /* Map Gradio's own theme tokens to our light palette so every component is
+     light-surface / dark-text and stays readable. */
+  --body-background-fill: var(--bg);
+  --body-text-color: var(--text);
+  --body-text-color-subdued: var(--muted);
+  --background-fill-primary: var(--surface);
+  --background-fill-secondary: var(--surface2);
+  --block-background-fill: var(--surface);
+  --block-label-background-fill: var(--surface);
+  --block-label-text-color: var(--muted);
+  --block-title-text-color: var(--text);
+  --block-info-text-color: var(--muted);
+  --block-border-color: var(--line);
+  --border-color-primary: var(--line);
+  --border-color-accent: rgba(109,75,224,.5);
+  --input-background-fill: var(--bg2);
+  --input-border-color: var(--line);
+  --input-placeholder-color: var(--muted);
+  --button-secondary-background-fill: var(--surface2);
+  --button-secondary-text-color: var(--text);
+  --button-secondary-border-color: var(--line);
+  --link-text-color: var(--cyan);
+  --link-text-color-hover: var(--cyan);
+  --color-accent: var(--violet);
+  --color-accent-soft: rgba(109,75,224,.14);
+  --table-text-color: var(--text);
+  --table-even-background-fill: var(--surface);
+  --table-odd-background-fill: var(--surface2);
+}
+/* Belt-and-suspenders for the common readable bits (markdown, labels, inputs). */
+.gradio-container .prose,
+.gradio-container .prose p, .gradio-container .prose li,
+.gradio-container .prose h1, .gradio-container .prose h2, .gradio-container .prose h3,
+.gradio-container .prose strong, .gradio-container .prose em,
+.gradio-container label, .gradio-container .gr-box label,
+.gradio-container input, .gradio-container textarea {
+  color: var(--text) !important;
+}
+/* NOTE: checkboxes/radios are excluded — the `background` shorthand would wipe
+   Gradio's checked-state background-image (the checkmark), so they never look
+   checked. Style their accent instead and leave the rest to Gradio. */
+.gradio-container input:not([type="checkbox"]):not([type="radio"]),
+.gradio-container textarea { background: var(--bg2) !important; }
+.gradio-container input[type="checkbox"], .gradio-container input[type="radio"] {
+  accent-color: var(--violet); cursor: pointer; }
+.gradio-container .tab-nav button { color: var(--muted); }
+.gradio-container .tab-nav button.selected { color: var(--text); border-bottom-color: var(--violet); }
+.gradio-container h1, .gradio-container h2, .gradio-container h3,
+#app-header, .evx-title, .evx-head {
+  font-family: "Fraunces", Georgia, serif !important;
+  letter-spacing: -0.01em;
+}
+#app-header {
+  display: flex; align-items: center; gap: .5rem;
+  font-size: 1.9rem; font-weight: 700; line-height: 1.1;
+  margin-bottom: .15rem;
+  background: var(--accent);
+  -webkit-background-clip: text; background-clip: text;
+  -webkit-text-fill-color: transparent;
+}
+/* ---- Review: input card ---- */
+.rv-input {
+  background: var(--surface) !important;
+  border: 1px solid var(--line) !important;
+  border-radius: var(--radius) !important;
+  padding: 14px !important;
+  box-shadow: var(--shadow);
+}
+#rv-textbox textarea {
+  font-size: 1rem !important;
+  line-height: 1.5 !important;
+  background: var(--bg2) !important;
+  border-radius: 12px !important;
+}
+#rv-actions { gap: 10px; margin-top: 10px; }
+/* primary action — the one place the full accent gradient lives */
+#rv-analyze button {
+  background: var(--accent) !important;
+  color: #fff !important;
+  font-weight: 700 !important;
+  font-size: 1.02rem !important;
+  border: none !important;
+  min-height: 50px !important;
+  border-radius: 12px !important;
+  box-shadow: 0 8px 20px rgba(109,75,224,0.28);
+  transition: transform .12s ease, box-shadow .12s ease, filter .12s ease;
+}
+#rv-analyze button:hover { transform: translateY(-1px); filter: brightness(1.06); box-shadow: 0 12px 28px rgba(11,130,148,0.3); }
+#rv-analyze button:active { transform: translateY(0); }
+.rv-status { color: var(--muted); font-size: .9rem; min-height: 1.2em; padding: 2px 2px 0; }
+/* ---- plan summary (reasoning + conflict badges + free-slot chips) ---- */
+.pl-summary {
+  background: var(--surface) !important;
+  border: 1px solid var(--line);
+  border-left: 3px solid var(--violet);
+  border-radius: 12px;
+  padding: 14px 16px;
+  margin: 10px 0;
+  box-shadow: var(--shadow);
+  animation: rise .35s ease both;
+}
+.pl-reason { margin: 0 0 8px; color: var(--text); font-size: .96rem; line-height: 1.5; }
+.pl-row { display: flex; flex-wrap: wrap; gap: 8px; align-items: center; margin-top: 8px; }
+.pl-label { font-size: .72rem; text-transform: uppercase; letter-spacing: .06em; color: var(--muted); }
+.pl-badge {
+  display: inline-flex; align-items: center; gap: 6px;
+  padding: 5px 11px; border-radius: 999px; font-size: .82rem; font-weight: 600;
+}
+.pl-conflict { background: rgba(216,58,96,.12); color: #b21d44; border: 1px solid rgba(216,58,96,.3); }
+.pl-chip {
+  padding: 5px 11px; border-radius: 999px; font-size: .82rem; font-weight: 600;
+  background: rgba(14,142,160,.12); color: #0a6f7d; border: 1px solid rgba(14,142,160,.3);
+}
+.pl-clarify { margin: 10px 0 0; color: var(--amber); font-size: .9rem; }
+.pl-clear { margin: 0; color: var(--mint); font-weight: 600; }
+/* ---- events: billboard (featured) + cards ---- */
+.evx-head { font-size: 1.05rem; font-weight: 600; color: var(--text); margin: 6px 2px 10px; }
+/* Billboard / carousel slide: a soft tinted card with dark text */
+.bb {
+  position: relative; overflow: hidden; border-radius: 18px; margin: 0 0 18px;
+  min-height: 196px; display: flex; align-items: flex-end;
+  background: linear-gradient(120deg, #ece4ff 0%, #d7f1f4 100%);
+  border: 1px solid var(--line); box-shadow: var(--shadow);
+}
+.bb-scrim { position: absolute; inset: 0;
+  background: linear-gradient(to top, rgba(255,255,255,.45) 0%, rgba(255,255,255,.1) 45%, transparent 100%); }
+.bb-body { position: relative; z-index: 2; padding: 22px 24px; width: 100%; animation: rise .45s ease both; }
+.bb-kicker { color: var(--cyan); font-weight: 700; font-size: .74rem; letter-spacing: .14em; text-transform: uppercase; margin-bottom: 8px; }
+.bb-title { font-family: "Fraunces", serif; font-size: 2rem; line-height: 1.05; margin: 0 0 10px; color: var(--text); }
+.bb-when { font-size: 1rem; font-weight: 600; color: #3a3357; }
+.bb-note { margin-top: 8px; color: var(--muted); font-style: italic; }
+.evx-sec { font-size: .76rem; text-transform: uppercase; letter-spacing: .08em; color: var(--muted); margin: 2px 2px 10px; }
+.evx-cards { display: flex; flex-direction: column; gap: 12px; }
+.evx-card {
+  position: relative; display: flex; gap: 0;
+  background: var(--surface); border: 1px solid var(--line);
+  border-radius: 14px; overflow: hidden;
+  box-shadow: 0 4px 14px rgba(45,32,90,.10);
+  transition: transform .18s ease, box-shadow .18s ease, border-color .18s ease;
+  animation: rise .4s ease both; animation-delay: calc(var(--i, 0) * 70ms);
+}
+.evx-card:hover {  /* subtle tactile lift */
+  transform: translateY(-3px) scale(1.012);
+  box-shadow: 0 14px 32px rgba(45,32,90,.18); border-color: rgba(109,75,224,.4);
+}
+.evx-bar { width: 5px; flex: 0 0 5px; background: var(--accent); }
+.evx-body { padding: 13px 16px; min-width: 0; }
+.evx-title { margin: 0 0 8px; font-size: 1.1rem; font-weight: 600; color: var(--text); }
+.evx-chip { display: inline-block; padding: 4px 10px; border: 1px solid var(--line);
+  border-radius: 8px; font-size: .82rem; font-weight: 600; color: var(--text); background: var(--surface2); }
+.evx-when { font-size: .92rem; color: var(--muted); font-weight: 600; }
+.evx-meta { font-size: .85rem; color: var(--muted); margin-top: 8px; }
+.evx-note { font-size: .82rem; color: var(--muted); margin-top: 6px; font-style: italic; }
+/* per-event one-click quick-add links (Online mode) */
+.evx-add { font-size: .8rem; color: var(--muted); margin-top: 8px; }
+.evx-add a { color: var(--cyan); font-weight: 700; text-decoration: none; }
+.evx-add a:hover { text-decoration: underline; }
+/* Agent tab: the orchestrator's step trace */
+.ag-trace { display: flex; flex-direction: column; gap: 6px; }
+.ag-step { display: flex; gap: 10px; align-items: flex-start; background: var(--surface);
+  border: 1px solid var(--line); border-left: 3px solid var(--violet);
+  border-radius: 10px; padding: 9px 12px; font-size: .9rem; animation: rise .3s ease both; }
+.ag-step code { background: var(--surface2); padding: 1px 6px; border-radius: 6px;
+  font-size: .82em; word-break: break-all; }
+.ag-tool_call { border-left-color: var(--cyan); }
+.ag-tool_result { border-left-color: var(--mint); }
+.ag-final { border-left-color: var(--mint); background: rgba(21,137,79,.07); font-weight: 600; }
+.ag-error { border-left-color: var(--coral); }
+.ag-ico { flex: none; }
+/* iPhone share-sheet Shortcut callout (export bar) */
+.ship-note { color: var(--muted); font-size: .82rem; margin-top: 8px; }
+.ship-note a { color: var(--cyan); font-weight: 600; text-decoration: none; }
+.ship-note a:hover { text-decoration: underline; }
+.evx-empty { color: var(--muted); padding: 18px; text-align: center; border: 1px dashed var(--line); border-radius: 12px; }
+/* Horizontal swipe rail (kept for any list use) */
+.evx-rail { display: flex; gap: 12px; overflow-x: auto; padding-bottom: 6px;
+  scroll-snap-type: x mandatory; scrollbar-width: none; }
+.evx-rail::-webkit-scrollbar { display: none; }
+.evx-rail .evx-card { flex: 0 0 78%; max-width: 320px; scroll-snap-align: start; }
+/* ---- rotating hero carousel (auto-advance + arrows + dots) ---- */
+.carousel { position: relative; border-radius: 18px; overflow: hidden; margin: 0 0 16px;
+  box-shadow: var(--shadow); border: 1px solid var(--line); }
+.cz-track { position: relative; min-height: 200px; }
+.cz-slide { position: absolute; inset: 0; opacity: 0; transform: translateX(16px);
+  transition: opacity .5s ease, transform .5s ease; pointer-events: none;
+  border: 0 !important; border-radius: 0 !important; box-shadow: none !important; margin: 0 !important; }
+.cz-slide.is-active { opacity: 1; transform: none; pointer-events: auto; }
+.carousel .bb-body { padding: 24px 125px 42px 150px; } /* clear the prev arrow (left) and keep text 125px off the next button (right) */
+.cz-arrow { position: absolute; top: 50%; transform: translateY(-50%); z-index: 5;
+  width: 38px; height: 38px; border-radius: 50%; border: 1px solid var(--line);
+  background: rgba(31,25,60,.55); color: #fff; font-size: 1.3rem; line-height: 1; cursor: pointer;
+  display: flex; align-items: center; justify-content: center; transition: background .15s, transform .15s; }
+.cz-arrow:hover { background: rgba(31,25,60,.8); transform: translateY(-50%) scale(1.08); }
+.cz-prev { left: 12px; } .cz-next { right: 12px; }
+.cz-dots { position: absolute; bottom: 12px; left: 0; right: 0; z-index: 5;
+  display: flex; gap: 8px; justify-content: center; }
+.cz-dot { width: 8px; height: 8px; border-radius: 50%; border: 0; cursor: pointer; padding: 0;
+  background: rgba(31,25,60,.28); transition: width .2s ease, background .2s ease; }
+.cz-dot.is-active { width: 22px; border-radius: 4px; background: var(--accent); }
+@media (prefers-reduced-motion: reduce) { .cz-slide { transition: opacity .01s linear; } }
+@media (max-width: 640px) {
+  .cz-track { min-height: 172px; }
+  .carousel .bb-body { padding: 18px 18px 34px 56px; } /* clear the (smaller) mobile prev arrow */
+  .cz-arrow { width: 32px; height: 32px; font-size: 1.1rem; }
+}
+/* ---- reply + export ---- */
+.rv-reply { margin-top: 14px; }
+.rv-copy button { background: var(--surface2) !important; border: 1px solid var(--line) !important; color: var(--text) !important; }
+#rv-export, .ag-export {
+  margin-top: 14px; padding: 12px !important;
+  background: var(--surface) !important; border: 1px solid var(--line) !important;
+  border-radius: 14px !important; box-shadow: var(--shadow);
+}
+#rv-export button, .ag-export button { min-height: 46px !important; font-weight: 600 !important; border-radius: 11px !important; }
+#rv-export .gr-button-primary, #rv-export button.primary,
+.ag-export .gr-button-primary, .ag-export button.primary {
+  background: var(--accent) !important; color: #fff !important; border: none !important;
+}
+/* ---- shared (Activity / Memory) ---- */
+.muted { color: var(--muted); font-style: italic; padding: 8px 2px; }
+.stepper { display: flex; gap: 8px; flex-wrap: wrap; margin: 4px 0 10px; }
+.step { position: relative; padding: 6px 14px; border-radius: 999px; font-size: .8rem; font-weight: 600;
+  color: var(--muted); background: var(--surface); border: 1px solid var(--line); }
+.step:not(:last-child)::after { content: "→"; position: absolute; right: -14px; top: 50%; transform: translateY(-50%); color: var(--muted); }
+.step.active { color: #fff; background: var(--c); border-color: var(--c); animation: pulse 1.4s infinite; }
+@keyframes pulse {
+  0% { box-shadow: 0 0 0 0 color-mix(in srgb, var(--c) 70%, transparent); }
+  70% { box-shadow: 0 0 0 10px transparent; }
+  100% { box-shadow: 0 0 0 0 transparent; }
+}
+.tiles { display: flex; gap: 10px; flex-wrap: wrap; margin: 8px 0 14px; }
+.tile { flex: 1 1 110px; background: var(--surface); border: 1px solid rgba(109,75,224,.18);
+  border-radius: 12px; padding: 12px 14px; text-align: center; box-shadow: 0 3px 12px rgba(45,32,90,.07); }
+.tile-v { font-size: 1.5rem; font-weight: 700; font-family: "Fraunces", serif;
+  background: var(--accent); -webkit-background-clip: text; background-clip: text; -webkit-text-fill-color: transparent; }
+.tile-k { font-size: .72rem; color: var(--muted); text-transform: uppercase; letter-spacing: .04em; }
+.timeline { display: flex; flex-direction: column; gap: 6px; max-height: 360px; overflow-y: auto; flex: 1; }
+.evt { display: grid; grid-template-columns: 84px 1fr auto auto; gap: 10px; align-items: center;
+  padding: 7px 12px; background: var(--surface); border: 1px solid var(--line); border-left: 3px solid var(--c); border-radius: 8px; font-size: .82rem; }
+.evt.err { border-left-color: var(--coral); background: #fdecef; }
+.evt-stage { color: var(--c); font-weight: 700; text-transform: uppercase; font-size: .7rem; filter: brightness(.78); }
+.evt-msg { color: var(--text); }
+.evt-meta, .evt-ts { color: var(--muted); font-family: ui-monospace, monospace; font-size: .72rem; }
+.trace { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 8px 12px; margin-bottom: 6px; }
+.trace summary { cursor: pointer; font-weight: 600; color: var(--text); }
+.trace-line { font-family: ui-monospace, monospace; font-size: .78rem; color: var(--muted); padding: 2px 0 2px 14px; }
+.trace-stage { font-weight: 700; text-transform: uppercase; font-size: .68rem; margin-right: 6px; filter: brightness(.78); }
+.event-card { background: var(--surface); border: 1px solid var(--line); border-radius: 12px; padding: 12px 14px; }
+@keyframes rise { from { opacity: 0; transform: translateY(8px); } to { opacity: 1; transform: none; } }
+/* ---- mobile-first ---- */
+@media (max-width: 640px) {
+  #app-header { font-size: 1.5rem; }
+  #rv-actions { flex-direction: column; }
+  #rv-actions button { width: 100% !important; }
+  .evx-title { font-size: 1.05rem; }
+  .bb { min-height: 168px; }
+  .bb-body { padding: 18px; }
+  .bb-title { font-size: 1.55rem; }
+  .evx-rail .evx-card { flex: 0 0 86%; }
+  .tiles { gap: 8px; }
+  .tile { flex: 1 1 calc(50% - 8px); }
+  .evt { grid-template-columns: 1fr; gap: 2px; }
+  .evt-meta, .evt-ts { display: none; }
+  /* keep the export actions reachable on a phone */
+  #rv-export {
+    position: sticky; bottom: 0; z-index: 20;
+    background: rgba(255,255,255,.94) !important;
+    -webkit-backdrop-filter: blur(8px); /* older Safari/iOS needs the prefix */
+    backdrop-filter: blur(8px);
+    box-shadow: 0 -8px 24px rgba(45,32,90,.18);
+  }
+}
+/* ---- showcase carousel: image-background slides (data-URI SVG illustrations) ---- */
+/* !important so Gradio's own h2/theme heading color can't override the slide text */
+.carousel .bb-img .bb-title, .bb-img .bb-title { color: #ffffff !important; }
+.carousel .bb-img .bb-when, .bb-img .bb-when { color: #ffffff !important; }
+.bb-img .bb-note { color: #e7e2fb !important; }
+.bb-img .bb-kicker { color: #bff2f8 !important; }
+.bb-scrim-dark { background: linear-gradient(to top, rgba(12,8,28,.86) 0%, rgba(12,8,28,.42) 55%, rgba(12,8,28,.15) 100%); }
+.cz-bg-chat { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnMSIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzNhMmE3MiIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNWY2ZSIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2cxKSIvPjxnIGZpbGw9IiNmZmZmZmYiIG9wYWNpdHk9IjAuMTYiPjxyZWN0IHg9IjkwIiB5PSI5MCIgcng9IjIyIiByeT0iMjIiIHdpZHRoPSIyNDAiIGhlaWdodD0iMTIwIi8+PHJlY3QgeD0iMzYwIiB5PSIxNzAiIHJ4PSIyMiIgcnk9IjIyIiB3aWR0aD0iMzAwIiBoZWlnaHQ9IjEyMCIvPjwvZz48ZyBmaWxsPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjUiPjxyZWN0IHg9IjEyMCIgeT0iMTIwIiB3aWR0aD0iMTUwIiBoZWlnaHQ9IjE0IiByeD0iNyIvPjxyZWN0IHg9IjEyMCIgeT0iMTUwIiB3aWR0aD0iMTEwIiBoZWlnaHQ9IjE0IiByeD0iNyIvPjxyZWN0IHg9IjM5MCIgeT0iMjAwIiB3aWR0aD0iMjAwIiBoZWlnaHQ9IjE0IiByeD0iNyIvPjxyZWN0IHg9IjM5MCIgeT0iMjMwIiB3aWR0aD0iMTUwIiBoZWlnaHQ9IjE0IiByeD0iNyIvPjwvZz48ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSg2MDAsNzApIiBvcGFjaXR5PSIwLjg1Ij48cmVjdCB3aWR0aD0iMTEwIiBoZWlnaHQ9IjEwMCIgcng9IjE0IiBmaWxsPSIjNTRkOGUyIi8+PHJlY3Qgd2lkdGg9IjExMCIgaGVpZ2h0PSIyNiIgcng9IjE0IiBmaWxsPSIjMGI4Mjk0Ii8+PGcgZmlsbD0iIzBlMjIzMCI+PHJlY3QgeD0iMTgiIHk9IjQ2IiB3aWR0aD0iMTgiIGhlaWdodD0iMTgiIHJ4PSIzIi8+PHJlY3QgeD0iNDYiIHk9IjQ2IiB3aWR0aD0iMTgiIGhlaWdodD0iMTgiIHJ4PSIzIi8+PHJlY3QgeD0iNzQiIHk9IjQ2IiB3aWR0aD0iMTgiIGhlaWdodD0iMTgiIHJ4PSIzIi8+PHJlY3QgeD0iMTgiIHk9IjcyIiB3aWR0aD0iMTgiIGhlaWdodD0iMTgiIHJ4PSIzIi8+PC9nPjwvZz48L3N2Zz4="); background-size: cover; background-position: center; }
+.cz-bg-flyer { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnMiIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzViMmE4NiIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNmY3ZSIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2cyKSIvPjxnIHRyYW5zZm9ybT0icm90YXRlKC04IDMwMCAyMDApIj48cmVjdCB4PSIxNTAiIHk9IjcwIiB3aWR0aD0iMjYwIiBoZWlnaHQ9IjI2MCIgcng9IjE2IiBmaWxsPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjkyIi8+PHJlY3QgeD0iMTgwIiB5PSIxMDAiIHdpZHRoPSIyMDAiIGhlaWdodD0iNzAiIHJ4PSI4IiBmaWxsPSIjNmQ0YmUwIiBvcGFjaXR5PSIwLjg1Ii8+PGcgZmlsbD0iIzlhOTNiOCI+PHJlY3QgeD0iMTgwIiB5PSIxOTAiIHdpZHRoPSIyMDAiIGhlaWdodD0iMTIiIHJ4PSI2Ii8+PHJlY3QgeD0iMTgwIiB5PSIyMTQiIHdpZHRoPSIxNjAiIGhlaWdodD0iMTIiIHJ4PSI2Ii8+PHJlY3QgeD0iMTgwIiB5PSIyMzgiIHdpZHRoPSIxODAiIGhlaWdodD0iMTIiIHJ4PSI2Ii8+PC9nPjxyZWN0IHg9IjE4MCIgeT0iMjc4IiB3aWR0aD0iMTIwIiBoZWlnaHQ9IjI2IiByeD0iMTMiIGZpbGw9IiMwYjgyOTQiLz48L2c+PGcgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoNTIwLDE1MCkiPjxyZWN0IHdpZHRoPSIxNzAiIGhlaWdodD0iMTMwIiByeD0iMTgiIGZpbGw9IiMwZTIyMzAiIG9wYWNpdHk9IjAuOSIvPjxjaXJjbGUgY3g9Ijg1IiBjeT0iNzAiIHI9IjQyIiBmaWxsPSJub25lIiBzdHJva2U9IiM1NGQ4ZTIiIHN0cm9rZS13aWR0aD0iOCIvPjxjaXJjbGUgY3g9Ijg1IiBjeT0iNzAiIHI9IjE4IiBmaWxsPSIjNTRkOGUyIi8+PHJlY3QgeD0iMTIwIiB5PSIyMCIgd2lkdGg9IjM0IiBoZWlnaHQ9IjE0IiByeD0iNyIgZmlsbD0iIzU0ZDhlMiIvPjwvZz48L3N2Zz4="); background-size: cover; background-position: center; }
+.cz-bg-cal { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnMyIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzJmMmE3OCIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNjQ3MCIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2czKSIvPjxnIHRyYW5zZm9ybT0idHJhbnNsYXRlKDI1MCw3MCkiPjxyZWN0IHdpZHRoPSIzMDAiIGhlaWdodD0iMjYwIiByeD0iMTgiIGZpbGw9IiNmZmZmZmYiIG9wYWNpdHk9IjAuOTUiLz48cmVjdCB3aWR0aD0iMzAwIiBoZWlnaHQ9IjU0IiByeD0iMTgiIGZpbGw9IiM2ZDRiZTAiLz48cmVjdCB4PSIyMCIgeT0iODAiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSI2NiIgeT0iODAiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIxMTIiIHk9IjgwIiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMTU4IiB5PSI4MCIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjIwNCIgeT0iODAiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIyNTAiIHk9IjgwIiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMjAiIHk9IjEyNiIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjY2IiB5PSIxMjYiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIxMTIiIHk9IjEyNiIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjE1OCIgeT0iMTI2IiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMjA0IiB5PSIxMjYiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIyNTAiIHk9IjEyNiIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjIwIiB5PSIxNzIiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSI2NiIgeT0iMTcyIiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMTEyIiB5PSIxNzIiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9I
jYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIxNTgiIHk9IjE3MiIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjIwNCIgeT0iMTcyIiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMjUwIiB5PSIxNzIiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIyMCIgeT0iMjE4IiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iNjYiIHk9IjIxOCIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjExMiIgeT0iMjE4IiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMTU4IiB5PSIyMTgiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiNlN2UzZjUiLz48cmVjdCB4PSIyMDQiIHk9IjIxOCIgd2lkdGg9IjM0IiBoZWlnaHQ9IjM0IiByeD0iNiIgZmlsbD0iI2U3ZTNmNSIvPjxyZWN0IHg9IjI1MCIgeT0iMjE4IiB3aWR0aD0iMzQiIGhlaWdodD0iMzQiIHJ4PSI2IiBmaWxsPSIjZTdlM2Y1Ii8+PHJlY3QgeD0iMTU4IiB5PSIxNzIiIHdpZHRoPSIzNCIgaGVpZ2h0PSIzNCIgcng9IjYiIGZpbGw9IiMwYjgyOTQiLz48Y2lyY2xlIGN4PSIxNzUiIGN5PSIxODkiIHI9IjgiIGZpbGw9IiNmZmZmZmYiLz48L2c+PC9zdmc+"); background-size: cover; background-position: center; }
+.cz-bg-reply { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnNCIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzRhMmE4MCIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNWY2ZSIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2c0KSIvPjxnIHRyYW5zZm9ybT0idHJhbnNsYXRlKDE1MCwxMjApIj48cmVjdCB3aWR0aD0iMzIwIiBoZWlnaHQ9IjE1MCIgcng9IjI2IiBmaWxsPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjk1Ii8+PHBhdGggZD0iTTYwIDE1MCBsMCA1MCBsNTAgLTUwIHoiIGZpbGw9IiNmZmZmZmYiIG9wYWNpdHk9IjAuOTUiLz48ZyBmaWxsPSIjOWE5M2I4Ij48cmVjdCB4PSI0MCIgeT0iNDQiIHdpZHRoPSIyNDAiIGhlaWdodD0iMTQiIHJ4PSI3Ii8+PHJlY3QgeD0iNDAiIHk9Ijc2IiB3aWR0aD0iMTgwIiBoZWlnaHQ9IjE0IiByeD0iNyIvPjwvZz48L2c+PGcgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoNTQwLDE1MCkiPjxjaXJjbGUgY3g9IjYwIiBjeT0iNjAiIHI9IjYwIiBmaWxsPSIjMTViMDcwIi8+PHBhdGggZD0iTTMyIDYyIGwyMCAyMCBsNDAgLTQ0IiBmaWxsPSJub25lIiBzdHJva2U9IiNmZmZmZmYiIHN0cm9rZS13aWR0aD0iMTIiIHN0cm9rZS1saW5lY2FwPSJyb3VuZCIgc3Ryb2tlLWxpbmVqb2luPSJyb3VuZCIvPjwvZz48L3N2Zz4="); background-size: cover; background-position: center; }
+.cz-bg-carpool { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnYyIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzNhMmE3MiIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNjQ3MCIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2djKSIvPjxyZWN0IHg9IjAiIHk9IjMwMCIgd2lkdGg9IjgwMCIgaGVpZ2h0PSIxMCIgZmlsbD0iI2ZmZmZmZiIgb3BhY2l0eT0iMC4zIi8+PGcgZmlsbD0iI2ZmZmZmZiIgb3BhY2l0eT0iMC41Ij48cmVjdCB4PSI2MCIgeT0iMzAxIiB3aWR0aD0iNjAiIGhlaWdodD0iNiIvPjxyZWN0IHg9IjE4MCIgeT0iMzAxIiB3aWR0aD0iNjAiIGhlaWdodD0iNiIvPjxyZWN0IHg9IjMwMCIgeT0iMzAxIiB3aWR0aD0iNjAiIGhlaWdodD0iNiIvPjxyZWN0IHg9IjQyMCIgeT0iMzAxIiB3aWR0aD0iNjAiIGhlaWdodD0iNiIvPjwvZz48ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxMjAsMTUwKSI+PHJlY3QgeD0iMCIgeT0iNzAiIHdpZHRoPSIyNjAiIGhlaWdodD0iNzIiIHJ4PSIyNCIgZmlsbD0iIzU0ZDhlMiIvPjxwYXRoIGQ9Ik00NCA3MiBxMjggLTU2IDkyIC01NiBsNjQgMCBxNDQgMCA2NCA1NiB6IiBmaWxsPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjkyIi8+PHJlY3QgeD0iNzAiIHk9IjMwIiB3aWR0aD0iNjAiIGhlaWdodD0iNDAiIHJ4PSI2IiBmaWxsPSIjNmQ0YmUwIiBvcGFjaXR5PSIwLjg1Ii8+PHJlY3QgeD0iMTUwIiB5PSIzMCIgd2lkdGg9IjYwIiBoZWlnaHQ9IjQwIiByeD0iNiIgZmlsbD0iIzZkNGJlMCIgb3BhY2l0eT0iMC44NSIvPjxjaXJjbGUgY3g9Ijc0IiBjeT0iMTQ4IiByPSIyOCIgZmlsbD0iIzBlMjIzMCIvPjxjaXJjbGUgY3g9Ijc0IiBjeT0iMTQ4IiByPSIxMyIgZmlsbD0iIzU0ZDhlMiIvPjxjaXJjbGUgY3g9IjIwNiIgY3k9IjE0OCIgcj0iMjgiIGZpbGw9IiMwZTIyMzAiLz48Y2lyY2xlIGN4PSIyMDYiIGN5PSIxNDgiIHI9IjEzIiBmaWxsPSIjNTRkOGUyIi8+PC9nPjxnIHRyYW5zZm9ybT0idHJhbnNsYXRlKDU2MCw5MCkiIGZpbGw9Im5vbmUiIHN0cm9rZT0iIzU0ZDhlMiIgc3Ryb2tlLXdpZHRoPSIxMCIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIj48Y2lyY2xlIGN4PSI2MCIgY3k9IjYwIiByPSI0MCIgc3Ryb2tlPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjg1Ii8+PHBhdGggZD0iTTYwIDM2IGwwIDI0IGwxNiAxMCIgc3Ryb2tlPSIjNTRkOGUyIi8+PC9nPjwvc3ZnPg=="); background-size: cover; background-position: center; }
+.cz-bg-appt { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJnYSIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzJmMmE3OCIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNjQ3MCIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2dhKSIvPjxnIHRyYW5zZm9ybT0idHJhbnNsYXRlKDQ0MCw5MCkiPjxyZWN0IHdpZHRoPSIyNjAiIGhlaWdodD0iMjIwIiByeD0iMTYiIGZpbGw9IiNmZmZmZmYiIG9wYWNpdHk9IjAuOTUiLz48cmVjdCB3aWR0aD0iMjYwIiBoZWlnaHQ9IjUwIiByeD0iMTYiIGZpbGw9IiM2ZDRiZTAiLz48ZyBmaWxsPSIjZTdlM2Y1Ij48cmVjdCB4PSIyMiIgeT0iNzgiIHdpZHRoPSI0MCIgaGVpZ2h0PSI0MCIgcng9IjYiLz48cmVjdCB4PSI3OCIgeT0iNzgiIHdpZHRoPSI0MCIgaGVpZ2h0PSI0MCIgcng9IjYiLz48cmVjdCB4PSIxMzQiIHk9Ijc4IiB3aWR0aD0iNDAiIGhlaWdodD0iNDAiIHJ4PSI2Ii8+PHJlY3QgeD0iMTkwIiB5PSI3OCIgd2lkdGg9IjQwIiBoZWlnaHQ9IjQwIiByeD0iNiIvPjxyZWN0IHg9IjIyIiB5PSIxMzAiIHdpZHRoPSI0MCIgaGVpZ2h0PSI0MCIgcng9IjYiLz48cmVjdCB4PSI3OCIgeT0iMTMwIiB3aWR0aD0iNDAiIGhlaWdodD0iNDAiIHJ4PSI2Ii8+PHJlY3QgeD0iMTkwIiB5PSIxMzAiIHdpZHRoPSI0MCIgaGVpZ2h0PSI0MCIgcng9IjYiLz48L2c+PHJlY3QgeD0iMTM0IiB5PSIxMzAiIHdpZHRoPSI0MCIgaGVpZ2h0PSI0MCIgcng9IjYiIGZpbGw9IiMwYjgyOTQiLz48Y2lyY2xlIGN4PSIxNTQiIGN5PSIxNTAiIHI9IjkiIGZpbGw9IiNmZmZmZmYiLz48L2c+PGcgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoMTIwLDExMCkiPjxjaXJjbGUgY3g9Ijk1IiBjeT0iOTUiIHI9Ijk1IiBmaWxsPSIjMGUyMjMwIiBvcGFjaXR5PSIwLjkyIi8+PGNpcmNsZSBjeD0iOTUiIGN5PSI5NSIgcj0iNzgiIGZpbGw9Im5vbmUiIHN0cm9rZT0iIzU0ZDhlMiIgc3Ryb2tlLXdpZHRoPSI5Ii8+PHBhdGggZD0iTTk1IDk1IEw5NSA0NiIgc3Ryb2tlPSIjNTRkOGUyIiBzdHJva2Utd2lkdGg9IjExIiBzdHJva2UtbGluZWNhcD0icm91bmQiLz48cGF0aCBkPSJNOTUgOTUgTDEzNiAxMTIiIHN0cm9rZT0iI2ZmZmZmZiIgc3Ryb2tlLXdpZHRoPSI4IiBzdHJva2UtbGluZWNhcD0icm91bmQiLz48L2c+PC9zdmc+"); background-size: cover; background-position: center; }
+.cz-bg-party { background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4MDAgNDAwIiBwcmVzZXJ2ZUFzcGVjdFJhdGlvPSJ4TWlkWU1pZCBzbGljZSI+PGRlZnM+PGxpbmVhckdyYWRpZW50IGlkPSJncCIgeDE9IjAiIHkxPSIwIiB4Mj0iMSIgeTI9IjEiPjxzdG9wIG9mZnNldD0iMCIgc3RvcC1jb2xvcj0iIzViMmE4NiIvPjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzBlNmY3ZSIvPjwvbGluZWFyR3JhZGllbnQ+PC9kZWZzPjxyZWN0IHdpZHRoPSI4MDAiIGhlaWdodD0iNDAwIiBmaWxsPSJ1cmwoI2dwKSIvPjxnIG9wYWNpdHk9IjAuNzUiPjxyZWN0IHg9IjEyMCIgeT0iNTAiIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiIgcng9IjMiIGZpbGw9IiM1NGQ4ZTIiIHRyYW5zZm9ybT0icm90YXRlKDIwIDEyOCA1OCkiLz48cmVjdCB4PSIzMDAiIHk9IjQwIiB3aWR0aD0iMTQiIGhlaWdodD0iMTQiIHJ4PSIzIiBmaWxsPSIjZmZmZmZmIiB0cmFuc2Zvcm09InJvdGF0ZSgtMTUgMzA3IDQ3KSIvPjxyZWN0IHg9IjY0MCIgeT0iNjAiIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiIgcng9IjMiIGZpbGw9IiM1NGQ4ZTIiIHRyYW5zZm9ybT0icm90YXRlKDMwIDY0OCA2OCkiLz48Y2lyY2xlIGN4PSI1MjAiIGN5PSI1MCIgcj0iOCIgZmlsbD0iI2ZmZmZmZiIvPjxjaXJjbGUgY3g9IjIyMCIgY3k9IjM0MCIgcj0iOCIgZmlsbD0iIzU0ZDhlMiIvPjxyZWN0IHg9IjcwMCIgeT0iMzIwIiB3aWR0aD0iMTQiIGhlaWdodD0iMTQiIHJ4PSIzIiBmaWxsPSIjZmZmZmZmIiB0cmFuc2Zvcm09InJvdGF0ZSgyNSA3MDcgMzI3KSIvPjwvZz48ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxNTAsODApIj48ZWxsaXBzZSBjeD0iNzAiIGN5PSI5MCIgcng9IjY0IiByeT0iNzgiIGZpbGw9IiM1NGQ4ZTIiLz48cGF0aCBkPSJNNzAgMTY4IGwtOSAyMCBsMTggMCB6IiBmaWxsPSIjNTRkOGUyIi8+PHBhdGggZD0iTTcwIDE4OCBxMjQgMzQgLTEyIDY2IiBzdHJva2U9IiNmZmZmZmYiIHN0cm9rZS13aWR0aD0iNCIgZmlsbD0ibm9uZSIgb3BhY2l0eT0iMC43Ii8+PC9nPjxnIHRyYW5zZm9ybT0idHJhbnNsYXRlKDQ3MCwxNzApIj48cmVjdCB3aWR0aD0iMTkwIiBoZWlnaHQ9IjE1MCIgcng9IjE0IiBmaWxsPSIjZmZmZmZmIiBvcGFjaXR5PSIwLjk1Ii8+PHJlY3QgeD0iODIiIHdpZHRoPSIyNiIgaGVpZ2h0PSIxNTAiIGZpbGw9IiM2ZDRiZTAiLz48cmVjdCB5PSI1OCIgd2lkdGg9IjE5MCIgaGVpZ2h0PSIyNiIgZmlsbD0iIzZkNGJlMCIvPjxwYXRoIGQ9Ik05NSA1OCBxLTQ2IC01NCAtMTQgLTU0IHEzNCAwIDE0IDU0IHEyMCAtNTQgNTQgLTU0IHEzNCAwIC0xNCA1NCB6IiBmaWxsPSIjMGI4Mjk0Ii8+PC9nPjwvc3ZnPg=="); background-size: cover; background-position: center; }
+/* ===================================================================== */
+/* Landing-page remaster — bold dark hero + light body                   */
+/* ===================================================================== */
+html { overflow-x: hidden; scroll-behavior: smooth; }
+.gradio-container { scroll-behavior: smooth; }
+/* ---- sticky top nav (full-bleed) ---- */
+#site-nav { width: 100vw; margin-left: calc(50% - 50vw); position: sticky; top: 0; z-index: 60;
+  background: rgba(255,255,255,.82);
+  -webkit-backdrop-filter: blur(12px) saturate(1.2); backdrop-filter: blur(12px) saturate(1.2);
+  border-bottom: 1px solid var(--line); }
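+/* Worked example of the full-bleed offset above (a sketch: it assumes the
+   parent column is horizontally centered in the viewport, and the 1100px
+   parent width is an illustrative figure). With a 1400px viewport,
+   50% = 550px and 50vw = 700px, so margin-left = 550px - 700px = -150px.
+   That cancels the parent's left gutter, (1400 - 1100) / 2 = 150px, so the
+   100vw-wide band starts at the viewport's left edge. #hero and #site-footer
+   reuse the same width/margin-left pair. */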
+#site-nav .nav-inner { max-width: 1100px; margin: 0 auto; padding: 12px 22px;
+  display: flex; align-items: center; justify-content: space-between; gap: 16px; }
+.nav-brand { display: inline-flex; align-items: center; gap: 9px;
+  font-family: "Fraunces", serif; font-weight: 700; font-size: 1.16rem; text-decoration: none; }
+/* plain ink in the site display typeface — the gradient text-clip trick left
+   the title invisible whenever the clip didn't apply */
+.nav-brand span { color: var(--text) !important; -webkit-text-fill-color: currentColor; }
+.nav-logo { width: 30px; height: 30px; object-fit: contain; flex: none; }
+/* calendar-option notes inside the step-2 dropdown */
+.cal-note { color: var(--muted); font-size: .86rem; margin-bottom: 6px; }
+.cal-note code { background: var(--surface2); padding: 1px 5px; border-radius: 5px; }
+.nav-links { display: flex; align-items: center; gap: 18px; }
+/* grouped dropdowns */
+.nav-group { position: relative; }
+.nav-top { background: none; border: 0; cursor: pointer; color: var(--muted); font-weight: 600;
+  font-size: .92rem; padding: 6px 4px; display: inline-flex; align-items: center; gap: 6px;
+  font-family: inherit; transition: color .15s; }
+.nav-group:hover .nav-top, .nav-group:focus-within .nav-top, .nav-top:hover { color: var(--text); }
+.nav-caret { font-size: .7rem; opacity: .7; transition: transform .15s; }
+.nav-group:hover .nav-caret, .nav-group:focus-within .nav-caret { transform: rotate(180deg); }
+.nav-menu { position: absolute; top: calc(100% + 8px); left: 0; min-width: 190px;
+  background: #fff; border: 1px solid var(--line); border-radius: 12px; box-shadow: var(--shadow);
+  padding: 8px; display: none; flex-direction: column; gap: 2px; z-index: 70; }
+.nav-group:hover .nav-menu, .nav-group:focus-within .nav-menu { display: flex; }
+.nav-item { display: block; padding: 8px 12px; border-radius: 8px; color: var(--text) !important;
+  text-decoration: none; font-weight: 600; font-size: .92rem; cursor: pointer; white-space: nowrap; }
+.nav-item:hover { background: var(--surface2); }
+.nav-cta { background: var(--accent); color: #fff !important; padding: 8px 18px; border-radius: 999px;
+  text-decoration: none; font-weight: 700; font-size: .92rem; box-shadow: 0 6px 18px rgba(109,75,224,.35);
+  transition: filter .15s; }
+.nav-cta:hover { filter: brightness(1.06); }
+/* Hide the default Gradio tab strip: the banner is the only navigation now.
+   IMPORTANT: don't use display:none or width:0. Gradio 6's tab bar is
+   responsive: it measures its own width and moves tabs that don't fit into a
+   "more" menu, where they become non-interactive clones the nav JS can't click.
+   Hiding the strip at its NATURAL width broke on mobile: a ~390px phone
+   viewport can't fit 7 tabs, so the later ones (Memory/Feed/Submission,
+   index >= 3) overflowed into that dead clone and their banner links did
+   nothing, while desktop's ~1100px fit them all.
+   Fix: collapse the strip to zero height but give it a FIXED LARGE min-width,
+   so the responsive measurement always sees room for every tab on one row, no
+   overflow menu is ever built, and all tab buttons stay real and
+   programmatically clickable. */
+.gradio-container .tab-wrapper,
+.gradio-container .tab-nav {
+  height: 0 !important; min-height: 0 !important; overflow: hidden !important;
+  opacity: 0 !important; margin: 0 !important; padding: 0 !important; border: 0 !important;
+  pointer-events: none;
+  /* KEEP the strip from ever overflowing tabs into the dead "more" clone:
+     force its measured width far past the tab count and forbid wrapping, so on
+     a phone all 7 tabs still "fit" and stay real, clickable buttons. height:0 +
+     overflow:hidden keeps it invisible; html{overflow-x:hidden} clips the
+     oversized width so it can't add a horizontal scrollbar. */
+  min-width: 1600px !important; max-width: none !important; flex-wrap: nowrap !important;
+}
+/* ---- hero (full-bleed DARK band, split: copy + example-card grid) ---- */
+#hero { width: 100vw; margin-left: calc(50% - 50vw);
+  padding: 64px max(22px, calc(50vw - 560px)) 104px;  /* extra bottom so the tool card overlaps */
+  display: flex !important; flex-wrap: nowrap; align-items: center; gap: 40px !important;
+  background:
+    radial-gradient(900px 520px at 82% -12%, rgba(109,75,224,.42) 0%, transparent 60%),
+    radial-gradient(760px 520px at -5% 115%, rgba(11,130,148,.34) 0%, transparent 55%),
+    linear-gradient(160deg, #0e0b1c 0%, #1a1230 58%, #0e1622 100%) !important; }
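+/* Worked example of the horizontal padding above, max(22px, calc(50vw - 560px)):
+   at a 1600px viewport, 50vw - 560px = 240px per side, which leaves a
+   1600 - 2 * 240 = 1120px content column. At 1000px, 50vw - 560px = -60px,
+   so max() falls back to the 22px floor. The two branches meet at
+   2 * (560 + 22) = 1164px. At 760px and below, the mobile block replaces this
+   padding outright. */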
+/* inner Gradio wrappers transparent so the dark band shows (NOT #hero itself) */
+#hero-left, #hero-right,
+#hero .block, #hero .form, #hero .gr-group, #hero .gradio-html, #hero > * {
+  background: transparent !important; border: 0 !important; box-shadow: none !important; }
+#hero-left { flex: 1 1 520px; min-width: 0; }
+#hero-right { flex: 0 0 380px; max-width: 400px; }
+.hero-copy { max-width: 620px; animation: rise .5s ease both; }
+.hero-eyebrow { color: #bff2f8 !important; font-weight: 700; letter-spacing: .14em; text-transform: uppercase;
+  font-size: .8rem; margin-bottom: 14px; }
+.hero-title { font-family: "Fraunces", serif; color: #fff !important; font-size: clamp(2.2rem, 5vw, 4rem);
+  line-height: 1.03; letter-spacing: -.02em; margin: 0 0 18px; }
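+/* Worked example for clamp(2.2rem, 5vw, 4rem) on .hero-title, assuming the
+   browser-default 16px root font size: the floor is 35.2px and the ceiling is
+   64px. 5vw equals 35.2px at a 704px viewport and 64px at 1280px, so the
+   title only scales between those two widths. */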
+.hero-accent { background: linear-gradient(100deg, #b39bff, #5fd5e6);
+  -webkit-background-clip: text; background-clip: text; -webkit-text-fill-color: transparent; }
+.hero-sub { color: #ded8f0 !important; font-size: clamp(1rem, 1.5vw, 1.18rem); line-height: 1.55; max-width: 560px; margin: 0 0 12px; }
+.hero-trust { color: #b3add0 !important; font-size: .9rem; margin: 0; }
+#hero-cta { gap: 14px !important; align-items: center; margin-top: 20px; flex-wrap: wrap; }
+#hero-cta #hero-try { flex: 0 0 auto; }
+#hero-try button { background: var(--accent) !important; color: #fff !important; border: none !important;
+  font-weight: 700 !important; border-radius: 999px !important; padding: 13px 28px !important; min-height: 0 !important;
+  box-shadow: 0 12px 32px rgba(109,75,224,.46) !important; transition: transform .15s, filter .15s !important; }
+#hero-try button:hover { transform: translateY(-2px); filter: brightness(1.07); }
+.hero-ghost { color: #fff; text-decoration: none; font-weight: 600; border: 1px solid rgba(255,255,255,.42);
+  padding: 12px 22px; border-radius: 999px; transition: background .15s; white-space: nowrap; }
+.hero-ghost:hover { background: rgba(255,255,255,.12); }
+/* hero example-card grid (chat -> event), echoes the reference's message cards */
+.hx-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 12px; }
+.hx-card { background: rgba(255,255,255,.06); border: 1px solid rgba(255,255,255,.12);
+  border-radius: 12px; padding: 12px; box-shadow: 0 10px 24px rgba(0,0,0,.25); animation: rise .5s ease both; }
+.hx-from { color: #8f88b5; font-size: .68rem; font-weight: 700; text-transform: uppercase; letter-spacing: .07em; margin-bottom: 5px; }
+.hx-chat { color: #ece9f8; font-size: .84rem; line-height: 1.35; margin-bottom: 8px; }
+.hx-event { background: #fff; color: #1e1934; border-radius: 8px; padding: 6px 9px; font-size: .78rem;
+  font-weight: 700; border-left: 3px solid #6d4be0; }
+.hx-event + .hx-event { margin-top: 6px; }   /* rule 5: deadline + event entries */
+.hx-assumed { color: #8f88b5; font-size: .85em; font-weight: 600; }  /* rule 2: inference flag */
+/* ---- light marketing sections ---- */
+.lp-section { max-width: 1080px; margin: 0 auto; padding: 78px 22px 6px; text-align: center; scroll-margin-top: 80px; }
+.lp-eyebrow { color: var(--cyan); font-weight: 700; letter-spacing: .14em; text-transform: uppercase; font-size: .8rem; margin-bottom: 10px; }
+.lp-title { font-family: "Fraunces", serif; color: var(--text); font-size: clamp(1.7rem, 3.4vw, 2.6rem);
+  line-height: 1.1; letter-spacing: -.01em; margin: 0 auto 34px; max-width: 780px; }
+.lp-grid { display: grid; gap: 18px; text-align: left; }
+.lp-grid-3 { grid-template-columns: repeat(3, 1fr); }
+.lp-card, .lp-step, .lp-priv { background: var(--surface); border: 1px solid var(--line); border-radius: 16px;
+  padding: 24px; box-shadow: 0 6px 20px rgba(45,32,90,.07); transition: transform .18s, box-shadow .18s; }
+.lp-card:hover, .lp-step:hover, .lp-priv:hover { transform: translateY(-4px); box-shadow: 0 16px 34px rgba(45,32,90,.16); }
+.lp-ico { font-size: 1.9rem; margin-bottom: 10px; line-height: 1; }
+.lp-step-n { font-family: "Fraunces", serif; font-size: 1.7rem; font-weight: 700; margin-bottom: 8px;
+  background: var(--accent); -webkit-background-clip: text; background-clip: text; -webkit-text-fill-color: transparent; }
+.lp-card-t { font-size: 1.12rem; font-weight: 700; color: var(--text); margin: 0 0 8px; }
+.lp-card-d { color: var(--muted); font-size: .96rem; line-height: 1.55; margin: 0; }
+.lp-card-d code { background: var(--surface2); padding: 1px 6px; border-radius: 6px; font-size: .86em; }
+.lp-tool-head { padding-bottom: 2px; }
+/* tool anchor offset for the sticky nav */
+.tool-anchor { scroll-margin-top: 84px; }
+#rv-results { scroll-margin-top: 84px; }
+/* ---- footer (full-bleed dark band) ---- */
+#site-footer { width: 100vw; margin-left: calc(50% - 50vw); margin-top: 66px;
+  background: linear-gradient(160deg, #0e0b1c 0%, #181030 100%); color: #fff; }
+.footer-inner { max-width: 1080px; margin: 0 auto; padding: 58px 22px; text-align: center; }
+/* !important: the global .prose h2 readability rule (top of file) would
+   otherwise repaint this ink-dark on the dark band. */
+.footer-cta-t, #site-footer .footer-cta-t { font-family: "Fraunces", serif; color: #fff !important;
+  font-size: clamp(1.6rem, 3vw, 2.4rem); margin: 0 0 18px; }
+a.footer-cta { display: inline-block; background: var(--accent); color: #fff; text-decoration: none; font-weight: 700;
+  padding: 13px 28px; border-radius: 999px; box-shadow: 0 12px 32px rgba(109,75,224,.42); transition: transform .15s, filter .15s; }
+a.footer-cta:hover { transform: translateY(-2px); filter: brightness(1.07); }
+.footer-meta { color: #9b93c2; font-size: .86rem; margin-top: 18px; }
+.footer-meta a { color: #bff2f8; text-decoration: none; }
+/* Gradio's own footer (Use via API or MCP · Built with Gradio · Settings),
+   relocated into the banner by wireFooter() in the injected JS — bare white
+   hyperlinked text only: no pills, boxes, borders, or backgrounds. */
+#site-footer footer, #site-footer footer * { background: transparent !important;
+  border: 0 !important; box-shadow: none !important; border-radius: 0 !important; }
+#site-footer footer { justify-content: center; margin-top: 10px; padding: 0 !important; }
+#site-footer footer, #site-footer footer a, #site-footer footer button,
+#site-footer footer span { color: #fff !important; font-size: .86rem; }
+#site-footer footer a, #site-footer footer button { cursor: pointer;
+  text-decoration: none; padding: 0 !important; margin: 0 4px; min-width: 0 !important; }
+#site-footer footer a:hover, #site-footer footer button:hover {
+  color: #fff !important; text-decoration: underline; }
+/* ---- mobile ---- */
+@media (max-width: 760px) {
+  #site-nav .nav-inner { gap: 10px; padding: 10px 16px; }
+  .nav-links { gap: 10px; }
+  .nav-brand { font-size: 1rem; }
+  .nav-top { font-size: .86rem; padding: 6px 2px; }
+  .nav-menu { right: 0; left: auto; }  /* keep menus on-screen near the edge */
+  .lp-grid-3 { grid-template-columns: 1fr; }
+  #hero { padding: 46px 20px 40px; flex-direction: column; align-items: stretch; gap: 26px !important; }
+  #hero-left, #hero-right { flex: 1 1 auto; max-width: 100%; }
+  #hero-cta { justify-content: flex-start; }
+  .lp-section { padding-top: 56px; }
+}
+/* ===================================================================== */
+/* FAQ — left-aligned title + tabs + search + row-divider accordion       */
+/* ===================================================================== */
+.lp-faq-section { max-width: 1080px; margin: 0 auto; padding: 78px 22px 12px;
+  text-align: left; scroll-margin-top: 80px; }
+.lp-faq-head { display: flex; align-items: flex-end; justify-content: space-between;
+  gap: 24px; margin-bottom: 22px; flex-wrap: wrap; }
+.lp-faq-h { font-family: "Fraunces", serif; color: var(--text);
+  font-size: clamp(1.7rem, 3.4vw, 2.4rem); line-height: 1.1; letter-spacing: -.01em;
+  margin: 0; font-weight: 700; }
+/* Search input — bottom-border-only with svg icon on the right */
+.lp-faq-search { position: relative; display: flex; align-items: center;
+  min-width: 260px; border-bottom: 1px solid var(--line); padding: 6px 0;
+  transition: border-color .15s; }
+.lp-faq-search:focus-within { border-color: var(--violet); }
+.lp-faq-search input { flex: 1; border: 0; background: transparent; color: var(--text);
+  font-size: .98rem; font-family: inherit; outline: none; padding: 4px 28px 4px 0;
+  -webkit-appearance: none; appearance: none; }
+.lp-faq-search input::placeholder { color: var(--muted); }
+.lp-faq-search input::-webkit-search-cancel-button { -webkit-appearance: none; }
+.lp-faq-search svg { width: 18px; height: 18px; color: var(--muted);
+  position: absolute; right: 2px; pointer-events: none; }
+/* Tabs */
+.lp-faq-tabs { display: flex; gap: 32px; border-bottom: 1px solid var(--line);
+  margin: 0 0 4px; }
+.lp-faq-tab { background: transparent; border: 0; padding: 12px 0 14px;
+  cursor: pointer; font-family: inherit; font-size: 1rem; color: var(--muted);
+  font-weight: 600; position: relative; transition: color .15s; }
+.lp-faq-tab:hover { color: var(--text); }
+.lp-faq-tab.is-active { color: var(--text); }
+.lp-faq-tab.is-active::after { content: ""; position: absolute; left: 0; right: 0;
+  bottom: -1px; height: 2.5px; background: var(--violet); border-radius: 2px; }
+/* Lists + rows */
+.lp-faq-list { display: block; }
+.lp-faq-list.is-hidden { display: none; }
+.lp-faq-item { border-bottom: 1px solid var(--line); padding: 0; }
+.lp-faq-q { display: flex; justify-content: space-between; align-items: center;
+  gap: 16px; padding: 22px 0; cursor: pointer; list-style: none;
+  font-size: 1.05rem; color: var(--text); font-weight: 500; transition: color .15s; }
+.lp-faq-q::-webkit-details-marker { display: none; }
+.lp-faq-q:hover { color: var(--violet); }
+.lp-faq-qt { flex: 1; min-width: 0; }
+.lp-faq-ico { flex: 0 0 26px; display: inline-flex; color: var(--muted);
+  transition: color .15s; }
+.lp-faq-ico svg { width: 26px; height: 26px; }
+.lp-faq-q:hover .lp-faq-ico,
+.lp-faq-item[open] .lp-faq-ico { color: var(--violet); }
+.lp-faq-ico-v { transition: opacity .15s; }
+.lp-faq-item[open] .lp-faq-ico-v { opacity: 0; }
+.lp-faq-a { padding: 0 0 22px; color: var(--muted); line-height: 1.6;
+  font-size: .97rem; max-width: 760px; }
+.lp-faq-a p { margin: 0 0 10px; }
+.lp-faq-a p:last-child { margin-bottom: 0; }
+.lp-faq-a ul { margin: 6px 0 0; padding-left: 20px; }
+.lp-faq-a li { margin-bottom: 4px; }
+.lp-faq-a code { background: rgba(31,25,60,.06); padding: 1px 6px; border-radius: 5px;
+  font-size: .9em; }
+.lp-faq-a a { color: var(--violet); text-decoration: none; font-weight: 600; }
+.lp-faq-a a:hover { text-decoration: underline; }
+.lp-faq-a b { color: var(--text); }
+/* Empty-state message when search filters everything out */
+.lp-faq-empty { color: var(--muted); padding: 28px 0; text-align: center; }
+.lp-faq-empty.is-hidden { display: none; }
+@media (max-width: 720px) {
+  .lp-faq-section { padding-top: 56px; }
+  .lp-faq-head { align-items: flex-start; }
+  .lp-faq-search { min-width: 100%; }
+  .lp-faq-tabs { gap: 22px; }
+}
+/* ===================================================================== */
+/* Hackathon: Submission compliance scorecard                            */
+/* ===================================================================== */
+/* standalone nav links (Home / Submission, between the dropdown groups and the
+   CTA); they share the SAME typography as the .nav-top buttons of the
+   Learn/Workspace groups */
+.nav-solo { background: none; border: 0; cursor: pointer; color: var(--muted) !important; font-weight: 600;
+  font-size: .92rem; padding: 6px 4px; display: inline-flex; align-items: center; gap: 6px;
+  font-family: inherit; text-decoration: none; transition: color .15s; }
+.nav-solo:hover { color: var(--text) !important; }
+/* submission scorecard */
+.sub-wrap { max-width: 920px; margin: 0 auto; padding: 10px 4px 28px; }
+.sub-group { margin: 20px 0; }
+.sub-h { font-size: .8rem; text-transform: uppercase; letter-spacing: .08em; color: var(--muted); margin: 0 0 10px; }
+.sub-lead { color: var(--text); line-height: 1.6; background: var(--surface); border: 1px solid var(--line);
+  border-left: 3px solid var(--violet); border-radius: 12px; padding: 14px 16px; box-shadow: 0 3px 12px rgba(45,32,90,.06); }
+.sub-lead code { background: var(--surface2); padding: 1px 6px; border-radius: 6px; }
+.sub-row { display: flex; gap: 12px; align-items: flex-start; background: var(--surface); border: 1px solid var(--line);
+  border-radius: 12px; padding: 12px 14px; margin-bottom: 8px; box-shadow: 0 3px 12px rgba(45,32,90,.06); }
+.sub-pill { flex: 0 0 auto; width: 26px; height: 26px; border-radius: 50%; display: flex; align-items: center;
+  justify-content: center; font-weight: 800; font-size: .9rem; }
+.sub-ok { background: rgba(21,137,79,.14); color: var(--mint); }
+.sub-warn { background: rgba(179,112,10,.16); color: var(--amber); }
+.sub-rt { min-width: 0; }
+.sub-title { font-weight: 700; color: var(--text); }
+.sub-ev { color: var(--muted); font-size: .92rem; margin-top: 2px; line-height: 1.5; }
+.sub-ev a { color: var(--cyan); }
+.sub-ev code, .sub-title code { background: var(--surface2); padding: 1px 6px; border-radius: 6px; }
+/* ===================================================================== */
+/* Reference-style redesign: nav pill · elevated tool card · 2-col input  */
+/* ===================================================================== */
+/* fine-tuned-model pill (top-right of the nav) — links to the model */
+.nav-status { display: inline-flex; align-items: center; gap: 6px; padding: 5px 12px; border-radius: 999px;
+  background: rgba(21,137,79,.12); color: var(--mint); font-size: .78rem; font-weight: 700;
+  white-space: nowrap; text-decoration: none; transition: filter .15s; }
+.nav-status:hover { filter: brightness(1.05); }
+.nav-status b { font-weight: 800; }
+/* elevated tool card that overlaps the hero (the agent, up top) */
+#tool-card { max-width: 1000px; margin: -84px auto 0 !important; position: relative; z-index: 5;
+  background: #fff !important; border: 1px solid var(--line) !important; border-radius: 20px !important;
+  box-shadow: 0 30px 70px rgba(20,12,50,.28) !important; padding: 26px !important; scroll-margin-top: 80px; }
+.tc-head { display: flex; justify-content: space-between; align-items: flex-end; gap: 16px; margin-bottom: 18px; flex-wrap: wrap; }
+.tc-eyebrow { color: var(--cyan); font-weight: 700; letter-spacing: .12em; text-transform: uppercase; font-size: .76rem; margin-bottom: 6px; }
+.tc-title { font-family: "Fraunces", serif; font-size: 1.7rem; color: var(--text); margin: 0; line-height: 1.1; }
+/* "Powered by fine-tuned Gemma 4" — green pill, right side of the tool-card head */
+.tc-poweredby { display: inline-flex; align-items: center; gap: 6px; padding: 7px 14px;
+  border-radius: 999px; background: rgba(21,137,79,.10); border: 1px solid rgba(21,137,79,.35);
+  color: var(--mint); font-size: .82rem; font-weight: 600; text-decoration: none;
+  transition: background .15s; }
+.tc-poweredby:hover { background: rgba(21,137,79,.18); }
+.tc-poweredby b { color: var(--mint); font-weight: 800; }
+/* ---- mode theme: the ONE decision point recolors the whole workflow card.
+   Offline = forest green ("local, sealed"); Online = cyan ("cloud-connected").
+   data-mode is set by wireModeTheme() on load + the mode.change JS. ---- */
+#tool-card { --mode-c: #15894f; --mode-soft: rgba(21,137,79,.07);
+  --mode-line: rgba(21,137,79,.35); }
+#tool-card[data-mode="online"] { --mode-c: #0e8ea0; --mode-soft: rgba(14,142,160,.09);
+  --mode-line: rgba(14,142,160,.4); }
+#tool-card { border-top: 4px solid var(--mode-c) !important;
+  background: linear-gradient(var(--mode-soft), transparent 170px), #fff !important;
+  transition: border-color .35s ease, background .35s ease; }
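+/* Sketch only (hypothetical, not part of the app): the themed rules in this
+   section read --mode-c, --mode-soft and --mode-line, so a further mode would
+   need just one more override block of the same shape, for example:
+     #tool-card[data-mode="hybrid"] { --mode-c: #b3700a;
+       --mode-soft: rgba(179,112,10,.08); --mode-line: rgba(179,112,10,.4); }
+   The "hybrid" value and the amber palette are illustrative assumptions. */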
+/* ---- Offline / Online mode toggle, inside the full-width mode band ---- */
+#mode-band { background: var(--mode-soft) !important; border: 1px solid var(--mode-line) !important;
+  border-radius: 14px !important; padding: 12px 16px 10px !important; margin-bottom: 6px;
+  transition: background .35s ease, border-color .35s ease; }
+/* ONE box only: flatten every Gradio wrapper inside the band (the radio's and
+   the note's own block chrome would otherwise draw nested containers).
+   :not(label) — Gradio puts data-testid on the radio OPTION labels too, and
+   this rule must not strip their button/pill styling below. */
+#mode-band .block, #mode-band .form, #mode-band fieldset, #mode-band .gradio-html,
+#mode-band > div, #mode-band [data-testid]:not(label) {
+  background: transparent !important; border: 0 !important; box-shadow: none !important;
+  padding: 0 !important; margin: 0 !important; border-radius: 0 !important; }
+/* ONE enclosing pill around both options: the eye lands here, and the pill's
+   border wears the active mode color. The double-id selector plus !important
+   out-ranks the band's flatten rule above, which zeroes block chrome and
+   would otherwise strip the pill's own background, border and padding. */
+#mode-band #mode-toggle { display: flex !important; justify-content: center; gap: 0;
+  width: fit-content; margin: 2px auto 8px !important;
+  background: #fff !important; border: 2px solid var(--mode-c) !important;
+  border-radius: 999px !important; padding: 4px !important;
+  box-shadow: 0 4px 14px rgba(20,12,50,.10) !important;
+  transition: border-color .35s ease; }
+#mode-toggle .wrap, #mode-toggle > div { justify-content: center; gap: 6px; }
+/* each option is its OWN button: outlined + raised when idle (clearly
+   clickable), filled with the mode color when selected. Double-id selector
+   out-ranks the band's flatten rule. */
+#mode-band #mode-toggle label {
+  background: #fff !important; border: 1.5px solid var(--line) !important;
+  border-radius: 999px !important; padding: 8px 20px !important; cursor: pointer;
+  font-weight: 700 !important; color: var(--text) !important;
+  box-shadow: 0 1px 3px rgba(20,12,50,.14) !important;
+  margin: 0 2px !important; transition: background .25s, color .25s, border-color .25s,
+  transform .1s; }
+#mode-band #mode-toggle label:hover { border-color: var(--mode-c) !important; }
+#mode-band #mode-toggle label:active { transform: translateY(1px); }
+#mode-band #mode-toggle label:has(input:checked) {
+  background: var(--mode-c) !important; border-color: var(--mode-c) !important;
+  box-shadow: inset 0 1px 2px rgba(0,0,0,.15) !important; }
+#mode-band #mode-toggle label:has(input:checked) span { color: #fff !important; }
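+/* Specificity check for the label rules above, written as (ids, classes, types):
+   the strongest flatten selector, #mode-band [data-testid]:not(label), scores
+   (1,1,1), while #mode-band #mode-toggle label scores (2,0,1). Both sides use
+   !important, so the comparison falls to specificity and the two-id selector
+   wins on the first column. */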
+#mode-toggle input[type="radio"] { display: none !important; }
+.mode-note { display: flex; align-items: center; justify-content: center; gap: 10px;
+  flex-wrap: wrap; text-align: center; color: var(--muted); font-size: .85rem; margin: 0 0 2px; }
+.mode-note code { background: var(--surface2); padding: 1px 5px; border-radius: 5px; }
+.mode-chip { display: inline-block; padding: 3px 10px; border-radius: 999px;
+  background: var(--mode-c); color: #fff; font-size: .68rem; font-weight: 800;
+  letter-spacing: .08em; transition: background .35s ease; }
+/* ---- numbered workflow steps, tied by a dashed tail under each chip ----
+   chips + connectors wear the active mode color (green offline / cyan online) */
+.flow-step { position: relative; display: flex; align-items: center; gap: 8px;
+  margin: 20px 0 8px; }
+.flow-step::before { content: ""; position: absolute; left: 10px; top: 100%;
+  height: 20px; border-left: 2px dashed rgba(109,75,224,.45); }
+#tool-card .flow-step::before { border-color: var(--mode-line); transition: border-color .35s ease; }
+#tool-card .step-chip { background: var(--mode-c); transition: background .35s ease; }
+.flow-t { font-weight: 700; color: var(--text); }
+.flow-sub { color: var(--muted); font-size: .82rem; font-weight: 500; }
+.flow-gcal { margin: -2px 0 10px 30px; }
+.flow-gcal .gcal-state { color: var(--mint); font-size: .85rem; }
+/* two-column ① upload / ② paste */
+#io-cols { gap: 20px !important; align-items: stretch; }
+.io-col { min-width: 0; }
+.io-label { display: flex; align-items: center; gap: 8px; font-weight: 700; color: var(--text); font-size: .95rem; margin-bottom: 8px; }
+.step-chip { display: inline-flex; align-items: center; justify-content: center; width: 22px; height: 22px;
+  border-radius: 50%; background: var(--accent); color: #fff; font-size: .8rem; font-weight: 800; }
+#io-drop { border: 2px dashed rgba(31,25,60,.22) !important; border-radius: 14px !important;
+  background: var(--surface2) !important; }
+/* char counter + helper line under the paste box */
+.rv-help { display: flex; justify-content: space-between; align-items: center; gap: 10px; margin-top: 6px;
+  color: var(--muted); font-size: .8rem; }
+.rv-counter { font-variant-numeric: tabular-nums; color: var(--muted); }
+/* centered primary / secondary actions + sample link */
+#rv-actions { justify-content: center !important; gap: 12px !important; margin-top: 16px; }
+#rv-analyze button { min-width: 200px; }
+.rv-secondary button { background: var(--surface2) !important; color: var(--text) !important;
+  border: 1px solid var(--line) !important; box-shadow: none !important; font-weight: 600 !important; }
+.rv-linkbtn { display: flex; justify-content: center; margin-top: 8px; }
+.rv-linkbtn button { background: none !important; border: none !important; box-shadow: none !important;
+  color: var(--cyan) !important; font-weight: 600 !important; min-height: 0 !important; }
+/* privacy-safe trace card */
+.trace-card { background: var(--surface2) !important; border: 1px solid var(--line) !important;
+  border-radius: 12px !important; padding: 12px 14px !important; margin-top: 14px; }
+.trace-desc { color: var(--muted); font-size: .82rem; margin-top: 2px; }
+.trace-ok { color: var(--mint); font-size: .85rem; font-weight: 600; margin-top: 6px; }
+/* screenshot-attached hint */
+.shot-status { color: var(--cyan); font-size: .82rem; font-weight: 600; margin-top: 6px; }
+/* mobile */
+@media (max-width: 760px) {
+  #io-cols { flex-direction: column; }
+  #tool-card { margin-top: -56px !important; padding: 18px !important; border-radius: 16px !important; }
+  .tc-head { align-items: flex-start; }
+  .hx-grid { grid-template-columns: 1fr; }
+  .nav-status { display: none; }
+}
+/* hero trust badges (under the copy, on the dark band) */
+.hero-badges { display: flex; flex-wrap: wrap; gap: 8px; margin-top: 18px; }
+.hbadge { display: inline-flex; align-items: center; gap: 6px; padding: 6px 12px; border-radius: 999px;
+  background: rgba(255,255,255,.10); border: 1px solid rgba(255,255,255,.18); color: #fff;
+  font-size: .8rem; font-weight: 700; }
+/* onboarding panel (inside the tool card) — an accordion: open on first visit,
+   collapsed (but always reopenable) once the device has memory. */
+#onboard { background: var(--surface2) !important; border: 1px solid var(--line) !important;
+  border-radius: 14px !important; padding: 12px 18px !important; margin-bottom: 16px; }
+#onboard .label-wrap span, #onboard > button span {
+  font-family: "Fraunces", serif !important; font-size: 1.1rem; font-weight: 700;
+  color: var(--text) !important; }
+.ob-sub { color: var(--muted); font-size: .9rem; margin: 4px 0 12px; }
+/* per-user Google Calendar connect link (in the export bar) */
+.gcal-connect { display: inline-block; margin-top: 6px; color: var(--cyan); font-weight: 600;
+  text-decoration: none; cursor: pointer; font-size: .9rem; }
+.gcal-connect:hover { text-decoration: underline; }
+.gcal-state { color: var(--mint); font-weight: 700; font-size: .85rem; margin-left: 4px; }
+/* ---- condensed results card: events + export in ONE area ---- */
+#rv-resultcard {
+  background: var(--surface) !important; border: 1px solid var(--line) !important;
+  border-radius: 16px !important; padding: 16px 18px !important;
+  box-shadow: var(--shadow); margin-top: 12px;
+}
+/* the export cluster becomes a toolbar inside the card — drop its own panel
+   chrome, separate it from the events with a hairline */
+#rv-resultcard #rv-export {
+  margin-top: 12px; padding: 12px 0 0 !important;
+  background: transparent !important; border: none !important;
+  border-top: 1px solid var(--line) !important; border-radius: 0 !important;
+  box-shadow: none;
+}
+#rv-resultcard #rv-export button { min-height: 42px !important; }
+/* keep the export toolbar reachable on a phone (re-assert the sticky bar at
+   the new, more specific selector) */
+@media (max-width: 640px) {
+  #rv-resultcard #rv-export {
+    position: sticky; bottom: 0; z-index: 20; padding: 10px !important;
+    background: rgba(255,255,255,.94) !important;
+    -webkit-backdrop-filter: blur(8px); backdrop-filter: blur(8px);
+    border-radius: 12px 12px 0 0 !important;
+    box-shadow: 0 -8px 24px rgba(45,32,90,.18);
+  }
+}
+/* prominent location line on the condensed event card */
+.evx-loc { font-size: .92rem; font-weight: 600; color: var(--text); margin-top: 8px; }
+/* arrival-context callout (per-event notes, e.g. "arrive 15 min early") */
+.evx-notes {
+  margin-top: 8px; padding: 6px 10px; font-size: .86rem; color: var(--text);
+  background: rgba(240,180,60,.12); border-left: 3px solid var(--amber);
+  border-radius: 0 8px 8px 0;
+}
+/* the pre-generated .ics widget stays as the no-JS download fallback — slim it */
+#ics-file { margin-top: 8px; }
+#ics-file, #ics-file .file-preview { max-height: 88px; overflow-y: auto; }
+/* ---- unified "Connect your calendar" block (Step 2a) ---- */
+.cal-connect { display: flex; flex-direction: column; gap: 8px; margin: 4px 0 10px;
+  padding: 10px 14px; background: var(--surface2); border-radius: 10px; }
+.cal-provider { display: flex; align-items: baseline; gap: 10px; flex-wrap: wrap;
+  font-size: .9rem; }
+.cal-prov-name { font-weight: 600; min-width: 160px; }
+.cal-cap { color: var(--muted); font-size: .82rem; }
+.cal-privacy { color: var(--muted); font-size: .78rem; margin-top: 2px; }
+.cal-provider .gcal-connect { margin-top: 0; }
+.cal-provider .gcal-disconnect { display: none; color: var(--muted);
+  font-size: .78rem; text-decoration: none; cursor: pointer; }
+.cal-provider .gcal-disconnect:hover { text-decoration: underline; }
+/* connected: collapse the CTA, show ✓ + a quiet disconnect */
+.cal-provider.is-connected .gcal-connect { display: none; }
+.cal-provider.is-connected .gcal-disconnect { display: inline; }
+.cal-provider.is-connected .cal-cap-online { display: none; }
+/* offline mode: Google sync needs the cloud — swap the CTA for a hint */
+.cal-provider[data-provider="google"] .cal-cap-offline { display: none; }
+#tool-card[data-mode="offline"] .cal-provider[data-provider="google"] .gcal-connect { display: none; }
+#tool-card[data-mode="offline"] .cal-provider[data-provider="google"] .cal-cap-online { display: none; }
+#tool-card[data-mode="offline"] .cal-provider[data-provider="google"]:not(.is-connected) .cal-cap-offline { display: inline; }
+/* ---- Google connection badge in the export toolbar (#rv-export) ---- */
+.gcal-badge-wrap { margin-top: 8px; }
+.gcal-badge { display: inline-block; font-size: .78rem; color: var(--muted);
+  padding: 2px 10px; border: 1px solid var(--line); border-radius: 999px; }
+.gcal-badge.is-on { color: var(--mint); border-color: currentColor; font-weight: 600; }
+/* the Offline workflow hides Google everywhere — badge included */
+#tool-card[data-mode="offline"] .gcal-badge-wrap { display: none; }
+/* ---- processing pipeline card (live agent stepper inside the trace accordion) ---- */
+.pipe-card { position: relative; background: var(--surface); border: 1px solid var(--line);
+  border-radius: 16px; padding: 14px 16px; margin-bottom: 10px; box-shadow: var(--shadow); }
+.pipe-head { display: flex; align-items: baseline; justify-content: space-between; gap: 10px; }
+.pipe-title { font-size: .9rem; font-weight: 700; color: var(--text); }
+.pipe-card[data-state="done"] .pipe-title { color: var(--mint); }
+.pipe-card[data-state="error"] .pipe-title { color: var(--coral); }
+/* the base rule forces Hanken Grotesk on everything — the clock must stay mono
+   so the ticking digits don't jitter */
+.pipe-clock { font-family: ui-monospace, Menlo, Consolas, monospace !important;
+  font-variant-numeric: tabular-nums; font-size: .85rem; color: var(--muted); }
+.pipe-track { position: relative; display: flex; align-items: center; gap: 4px;
+  margin-top: 12px; padding: 6px 2px; overflow: hidden; border-radius: 10px; }
+.pipe-stage { flex: 1 1 0; min-width: 0; display: flex; flex-direction: column;
+  align-items: center; gap: 4px; padding: 6px 2px; border-radius: 10px;
+  background: var(--surface2); transition: background .25s ease; }
+.pipe-badge { width: 24px; height: 24px; border-radius: 50%; display: flex;
+  align-items: center; justify-content: center; font-size: .72rem; font-weight: 800;
+  background: var(--surface); color: var(--muted); border: 1px solid var(--line);
+  transition: background .25s ease, border-color .25s ease, color .25s ease; }
+.pipe-lab { font-size: .66rem; font-weight: 700; letter-spacing: .08em; color: var(--muted); }
+.pipe-chev { flex: 0 0 auto; color: var(--muted); opacity: .5; font-weight: 700; }
+.pipe-stage.is-active { background: rgba(109,75,224,.10); }
+.pipe-stage.is-active .pipe-badge { background: #6d4be0; border-color: #6d4be0; color: #fff;
+  --c: #6d4be0; animation: pulse 1.4s infinite; }   /* reuses the Activity tab's pulse keyframes */
+.pipe-stage.is-active .pipe-lab { color: var(--text); }
+.pipe-stage.is-done { background: rgba(21,137,79,.08); }
+.pipe-stage.is-done .pipe-badge { background: var(--mint); border-color: var(--mint); color: #fff; }
+.pipe-stage.is-done .pipe-lab { color: var(--mint); }
+.pipe-stage.is-skip { opacity: .45; }
+.pipe-stage.is-skip .pipe-badge { background: transparent; border-style: dashed; }
+.pipe-stage.is-error .pipe-badge { background: var(--coral); border-color: var(--coral); color: #fff; }
+/* indeterminate light sweep across the whole track while running */
+.pipe-shimmer { position: absolute; inset: 0; pointer-events: none; display: none;
+  width: 36%; background: linear-gradient(100deg, transparent 0%,
+    rgba(109,75,224,.14) 45%, rgba(14,142,160,.14) 55%, transparent 100%); }
+.pipe-card[data-state="running"] .pipe-shimmer { display: block;
+  animation: pipe-sweep 2.5s ease-in-out infinite; }
+@keyframes pipe-sweep { from { transform: translateX(-100%); } to { transform: translateX(280%); } }
+.pipe-cap { margin-top: 8px; text-align: center; font-size: .8rem; color: var(--muted); }
+/* static post-run summary: speed / confidence / evidence / counts */
+.pipe-summary { display: flex; flex-wrap: wrap; gap: 8px; margin-top: 12px;
+  padding-top: 10px; border-top: 1px solid var(--line); }
+.pipe-chip { padding: 4px 10px; border-radius: 999px; background: var(--surface2);
+  border: 1px solid var(--line); font-size: .75rem; font-weight: 700; color: var(--text);
+  max-width: 260px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
+.pipe-chip .pipe-k { color: var(--muted); font-weight: 600; margin-right: 4px;
+  text-transform: uppercase; font-size: .66rem; letter-spacing: .06em; }
+.pipe-chip b { font-family: ui-monospace, Menlo, Consolas, monospace !important; }
+.pipe-chip.is-high { color: var(--mint); }
+.pipe-chip.is-review { color: var(--amber); }
+@media (prefers-reduced-motion: reduce) {
+  .pipe-card[data-state="running"] .pipe-shimmer { display: none !important; }
+  .pipe-stage.is-active .pipe-badge { animation: none; }
+}
+/* ---- full-word pipeline stage labels: small, one line, equal-width boxes ---- */
+.pipe-lab { font-size: .58rem; letter-spacing: .03em; white-space: nowrap; }
+.pipe-stage { padding: 6px 1px; }
+/* ---- single export surface: the per-event "Add to" links own exporting; the
+   header gains an all-events iCal link when 2+ events are found ---- */
+.evx-head { display: flex; align-items: baseline; justify-content: space-between;
+  gap: 10px; flex-wrap: wrap; }
+.evx-add-all { font-size: .8rem; font-weight: 700; color: var(--cyan);
+  text-decoration: none; white-space: nowrap; }
+.evx-add-all:hover { text-decoration: underline; }
+/* ---- consistent page width: Agent/Activity/Memory/Feed match the home
+   page's 1000px tool card (capping + centering only — internal layout of
+   the pages is untouched) ---- */
+.page-wrap { max-width: 1000px; margin-left: auto !important; margin-right: auto !important; width: 100%; }
+/* ---- Activity tab: metric values in the brand violet (the This-week cards
+   and the run tiles both render through .tile-v). Resets the gradient
+   text-clip so the color actually shows regardless of the base rule's state. ---- */
+.tile-v { color: var(--violet); background: none;
+  -webkit-background-clip: initial; background-clip: initial;
+  -webkit-text-fill-color: currentColor; }
+/* ---- mobile responsiveness fixes (phone) ---- */
+/* Top nav: on a phone the brand wordmark + 5 links overflow a single row
+   (logo overlaps "Home", "Memory" clipped, "Feed" off-screen). Stack the brand
+   above a wrapping link strip so nothing overlaps and every page stays
+   reachable. Reuses the classes from _nav_html(). */
+@media (max-width: 600px) {
+  #site-nav .nav-inner { flex-direction: column; align-items: stretch;
+    justify-content: flex-start; gap: 6px; padding: 8px 14px; }
+  .nav-brand { justify-content: flex-start; font-size: .98rem; }
+  /* WRAP, don't scroll: a horizontal overflow-x:auto strip swallowed taps on a
+     phone (a touch with any micro-movement reads as a scroll, and links past the
+     fold needed scrolling first) — which left Submission/Memory/Feed feeling
+     dead. Wrapping keeps every link fully on-screen and plainly tappable. */
+  .nav-links { width: 100%; gap: 10px 16px; flex-wrap: wrap; }
+  .nav-solo { flex: 0 0 auto; white-space: nowrap; padding: 8px 6px;
+    touch-action: manipulation; }
+}
+/* Activity "Activity by stage" chart shares a 2-col row with the timeline, so on
+   a phone it's stuck at half width and the x-axis stage labels collapse into
+   garbled overlapping text. Stack the row so the chart renders full-width. */
+@media (max-width: 760px) {
+  .act-chart-row { flex-direction: column !important; }
+  .act-chart-row > * { width: 100% !important; }
+}
+/* Footer: keep "on Hugging Face" (its own <a> in .footer-meta) on one line
+   instead of breaking as "on Hugging" / "Face". */
+.footer-meta a { white-space: nowrap; }
+/* Hardware-degraded banner — revealed by app.py's /health probe (inline JS). */
+#status-banner-host { padding: 0; margin: 0; }
+.status-banner {
+  display: none;
+  padding: 10px 16px;
+  text-align: center;
+  font-weight: 600;
+  font-size: 0.95rem;
+  line-height: 1.4;
+  background: var(--accent);
+  color: #fff;
+}

static/logo.png ADDED Viewed