	# Architecture — workflows and the LLMs behind them

	An AI-solution-architect view of the agentic system: every workflow through the
	platform, and exactly which model (if any) each one calls. Four properties
	define the architecture. The extraction core is a single grammar-constrained
	LLM call. The MiniCPM planner adds a visible multi-step loop over the
	platform's own public MCP tool contract. Everything verifiable (conflict math,
	dedup, time proposals, eval gates) stays deterministic. And there are **zero
	cloud-AI API calls anywhere**, training included.
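
To illustrate why conflict math can stay deterministic, the check reduces to plain interval comparisons. The sketch below is not the project's `freebusy.annotate_conflicts`; the function name `classify`, the half-open interval convention, and the 15-minute `TIGHT_GAP` threshold are all assumptions made for the example. Only the three labels (overlap, adjacent, tight) come from this document.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed threshold: a gap shorter than this is reported as "tight".
TIGHT_GAP = timedelta(minutes=15)


def classify(
    new_start: datetime,
    new_end: datetime,
    busy_start: datetime,
    busy_end: datetime,
    tight_gap: timedelta = TIGHT_GAP,
) -> Optional[str]:
    """Label a proposed event against one busy block.

    Intervals are treated as half-open [start, end), so an event ending at
    11:00 does not overlap one starting at 11:00.
    """
    if new_start < busy_end and busy_start < new_end:
        return "overlap"
    # The intervals are disjoint here, so exactly one ordering applies
    # and the gap is never negative.
    if new_start >= busy_end:
        gap = new_start - busy_end
    else:
        gap = busy_start - new_end
    if gap == timedelta(0):
        return "adjacent"
    if gap < tight_gap:
        return "tight"
    return None
```

Because the result depends only on four timestamps and one threshold, the same inputs always produce the same label, which is what makes this stage testable without a model in the loop.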

	## System workflow

	```mermaid
	flowchart TB
	subgraph ENTRY["1 · Entry points — four front-ends, one contract"]
	direction LR
	UIIN["🖥️ Gradio UI<br/>Schedule flow + Agent tab<br/>(paste thread, screenshots, .ics)"]
	SHORT["📱 iOS Shortcut /<br/>Android Tasker"]
	MAC["🍎 Mac collector<br/>polls iMessage chat.db<br/>(collector/collector.py)"]
	MCPC["🤖 MCP clients<br/>Claude Desktop, Cursor"]
	end

	subgraph API["2 · API & orchestration — app.py (FastAPI + Gradio, one port)"]
	AGENTEP["POST /agent<br/>bearer-token, stateless"]
	INGEST["POST /ingest → feed store<br/>AUTONOMOUS=1 triggers on<br/>your outgoing message (is_from_me)"]
	ROLL["threads.rolling_thread<br/>per-chat window (20 msgs / 12 h)"]
	MCPT["MCP tools — server/mcp_tools.py<br/>extract_events · make_ics · check_conflicts"]
	end

	subgraph ORCH["2a · Agentic orchestration — server/orchestrator.py"]
	SMOL["smolagents ToolCallingAgent<br/>planned by MiniCPM, ≤6 steps<br/>playbook: extract → check → render<br/>final ActionPlan re-derived deterministically"]
	SCRIPT["ScriptedPlanner — no LLM<br/>identical tool sequence + step events<br/>(stub mode, CI, planner failure)"]
	end

	subgraph CORE["3 · Agent core — server/pipeline.py → server/agent.py"]
	PROMPT["Prompt assembly:<br/>SYSTEM + memory recall block<br/>+ existing calendar + thread + images"]
	GEN["Grammar-constrained generation<br/>→ ActionPlan JSON (always parses)"]
	PROMPT --> GEN
	end

	subgraph LLMT["4 · LLM tier — ALL inference is local llama.cpp, zero cloud AI APIs"]
	GEMMA["⭐ gemma-cal E4B — fine-tuned Gemma 4<br/>ParetoOptimal/gemma-4-cal-gguf<br/>gemma-cal-e4b-Q4_K_M.gguf (~5 GB)<br/>+ mmproj-F16.gguf vision projector"]
	MODES["served either:<br/>· in-process llama-cpp-python (ZeroGPU lease)<br/>· remote llama-server via INFERENCE_BASE_URL<br/>(Space sidecar / Mac launchd / phone)"]
	MINICPM["🧭 MiniCPM planner — OpenBMB (sponsor)<br/>openbmb/MiniCPM4.1-8B-GGUF Q4 (~5 GB)<br/>≤4B option: openbmb/MiniCPM5-1B-GGUF (config switch)<br/>2nd llama-server :8081 — enabled via<br/>PLANNER_HF_REPO / PLANNER_FILE"]
	HERMES["(optional) Hermes-3-Llama-3.1-8B Q4_K_M<br/>HERMES_TOOLS=1 — tool-calling loop:<br/>calls remember() to write memory mid-run"]
	STUB["(no LLM) regex stub extractor<br/>USE_STUB_EXTRACTOR=1 — CI & free tier"]
	GEMMA --- MODES
	end

	subgraph DET["5 · Deterministic post-processing — no LLM"]
	CONF["freebusy.annotate_conflicts<br/>overlap / adjacent / tight<br/>+ propose_times free slots"]
	DEDUP["dedup.filter_new<br/>idempotency for autonomous runs"]
	MEMW["memory.observe_plan<br/>learns recurring contacts"]
	end

	subgraph OUT["6 · Outputs"]
	CARDS["Event cards + reply draft<br/>+ clarification question"]
	ICS["📥 .ics download<br/>(off-grid default)"]
	GCAL["📆 Google Calendar push<br/>(per-user OAuth web flow, opt-in)"]
	TRACE["Redacted trace export<br/>→ public HF dataset"]
	end

	UIIN -->|"run_orchestrator (step trace streams into the UI)"| SMOL
	SHORT --> AGENTEP
	MAC -->|"store-only"| INGEST
	MAC -->|"AGENT_MODE=1"| AGENTEP
	MCPC --> MCPT
	AGENTEP --> CORE
	INGEST --> ROLL --> CORE
	SMOL ==>|"planning loop, ≤6 steps"| MINICPM
	SMOL -->|"tool calls: the Space's OWN MCP<br/>endpoint (localhost SSE)"| MCPT
	SMOL -.->|"planner down / stub mode"| SCRIPT
	SCRIPT -->|"same tool sequence,<br/>deterministic"| MCPT
	MCPT -->|"extract_events → 1 LLM call"| CORE
	MCPT -.->|"make_ics / check_conflicts → 0 LLM calls"| DET

	GEN ==>|"default"| GEMMA
	GEN -.->|"opt-in autonomous brain"| HERMES
	GEN -.->|"tests / free demo"| STUB
	HERMES -->|"remember()"| MEMW

	LLMT --> DET --> OUT
	```
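
The relationship between the LLM planner and the `ScriptedPlanner` fallback in stage 2a can be sketched as a step-budgeted loop that falls back to a fixed playbook. This is a minimal illustration, not the code in `server/orchestrator.py`: `run_with_fallback`, the `planner` callable signature, the dict-based state, and the mapping of the "render" step to `make_ics` are all assumptions. Only the three tool names, the ≤6 step budget, and the extract → check → render order come from the diagram.

```python
from typing import Callable, Optional

MAX_STEPS = 6  # step budget shown in the diagram
# Assumed mapping of the playbook "extract -> check -> render" onto the MCP tools.
PLAYBOOK = ["extract_events", "check_conflicts", "make_ics"]

Tool = Callable[[dict], dict]
# Hypothetical planner interface: given the tool names called so far,
# return the next tool name, or None when the plan is complete.
Planner = Callable[[list[str]], Optional[str]]


def run_with_fallback(
    tools: dict[str, Tool],
    planner: Optional[Planner] = None,
    max_steps: int = MAX_STEPS,
) -> tuple[dict, list[str]]:
    """Run the planner loop; on any planner failure, replay the fixed playbook."""
    if planner is not None:
        state: dict = {}
        trace: list[str] = []
        try:
            for _ in range(max_steps):
                name = planner(trace)
                if name is None:
                    break
                state = tools[name](state)  # unknown tool name raises KeyError
                trace.append(name)
            return state, trace
        except Exception:
            pass  # discard partial work and fall through to the scripted path

    # Scripted path: same tool sequence, no LLM involved.
    state, trace = {}, []
    for name in PLAYBOOK:
        state = tools[name](state)
        trace.append(name)
    return state, trace
```

Calling `run_with_fallback(tools)` with no planner exercises the scripted path directly, which is how a stub or CI mode could share the tool sequence with the LLM-planned path.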

	## Offline loop — eval-gated fine-tuning (produces the serving LLM)

	```mermaid
	flowchart LR
	SEEDS["Seed data — NO LLM<br/>139 hand-authored template examples<br/>(gen_new_seeds.py / make_dataset.py)"]
	SMC["SMCalFlow import — NO LLM<br/>deterministic LISP-program parse, ~2000 rows"]
	TRAIN["QLoRA fine-tune — Unsloth on Modal A100-80GB<br/>base: google/gemma-4-31B-it or gemma-4-E4B-it<br/>r=16, lr 5e-5, 2 epochs, responses-only loss"]
	GGUF["convert_hf_to_gguf + llama-quantize<br/>→ staging Q4_K_M GGUF"]
	EVAL["Eval — NO LLM judge, deterministic metrics<br/>60-example held-out set:<br/>schema validity · event F1 · start-exact recall"]
	GATE{"Gate<br/>validity ≥ 0.95<br/>F1 ≥ 0.81<br/>recall ≥ 0.773"}
	PROD["Promote → ParetoOptimal/gemma-4-cal-gguf<br/>(the model the Space serves)"]
	TRASH["Discard staging —<br/>production untouched"]

	SEEDS --> TRAIN
	SMC --> TRAIN
	TRAIN --> GGUF --> EVAL --> GATE
	GATE -->|pass| PROD
	GATE -->|fail| TRASH
	```
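
The gate itself is a threshold comparison, which a short sketch can make concrete. The three floors below are the ones in the diagram; the metric key names, the `gate_decision` and `event_f1` helpers, and the choice to match events as hashable tuples are assumptions for illustration and may differ from `training/gated_retrain.py`.

```python
# Floors taken from the gate in the diagram; key names are hypothetical.
GATE = {
    "schema_validity": 0.95,
    "event_f1": 0.81,
    "start_exact_recall": 0.773,
}


def event_f1(predicted: set, gold: set) -> float:
    """Set-based F1 over hashable event keys, e.g. (title, start) tuples."""
    if not predicted and not gold:
        return 1.0  # nothing expected, nothing predicted
    true_positives = len(predicted & gold)
    return 2 * true_positives / (len(predicted) + len(gold))


def gate_decision(metrics: dict[str, float], thresholds: dict[str, float] = GATE):
    """Return (passed, failed_metric_names). A missing metric counts as a failure."""
    failures = [
        name
        for name, floor in thresholds.items()
        if metrics.get(name, 0.0) < floor
    ]
    return (not failures, failures)
```

A candidate that scores exactly at a floor passes, matching the `≥` in the gate; any single metric below its floor is enough to discard the staging model and leave production untouched.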

	See [eval-roadmap.md](./eval-roadmap.md) and the
	[eval-gated fine-tuning post-mortem](./blog-eval-gated-finetuning.md) for the
	gate's history and rationale; [hermes.md](./hermes.md) for the optional
	tool-calling backend; [build-small-submission.md](./build-small-submission.md)
	for how the MiniCPM planner maps to the `sponsor:openbmb` track.

	## Which LLM each workflow calls

	| # | Workflow | Trigger | LLM call(s) | Where it runs |
	|---|----------|---------|-------------|----------------|
	| 1 | Agentic orchestration (Schedule flow + Agent tab) | User pastes thread / uploads screenshots, clicks Find the events / Run the agents | 1× MiniCPM planning loop (`MiniCPM4.1-8B`, or `MiniCPM5-1B` ≤4B variant; ≤6 steps) driving the Space's own MCP tools, + 1× gemma-cal E4B per `extract_events` tool call (vision via mmproj); `check_conflicts`/`make_ics` are zero-LLM. Planner unconfigured or down → ScriptedPlanner runs the identical sequence, gemma-cal only | Two local llama-servers: gemma-cal on :8080, MiniCPM on :8081 |
	| 2 | API extraction (`POST /agent`) | iOS Shortcut, Android Tasker, or Mac collector in `AGENT_MODE=1` | 1× gemma-cal E4B (same pipeline, same prompt) | Same |
	| 3 | Autonomous ingest | Mac collector → `/ingest`; your outgoing message triggers a run over the chat's rolling thread | 1× gemma-cal E4B per affected chat, then deterministic dedup + calendar delivery | Same |
	| 4 | Memory-writing agent (optional) | `HERMES_TOOLS=1` on the remote path | Hermes-3-Llama-3.1-8B in a tool loop (≤3 rounds): may call `remember()` then returns the ActionPlan | Remote llama-server (e.g. Mac launchd) |
	| 5 | MCP tools for external agents | MCP client calls the Space | `extract_events` → 1× gemma-cal E4B; `make_ics` and `check_conflicts` → zero LLM calls | Same as #1 |
	| 6 | CI / free-tier demo | `USE_STUB_EXTRACTOR=1` | No LLM: regex heuristic | CPU anywhere |
	| 7 | Training & eval (offline) | `training/gated_retrain.py` | No LLM at the inference-API level: data gen is template-based, eval is metric-based (no judge). The LLM here is the training target: QLoRA on `google/gemma-4-31B-it` / `gemma-4-E4B-it` | Modal A100/H100 |
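
Workflow 3 runs over each chat's rolling thread, which the system diagram describes as a per-chat window of 20 messages / 12 hours. One way to picture that window is a bounded deque with an age filter. This is a sketch only: the `RollingThread` class and its methods are hypothetical, and it assumes both limits apply together (at most 20 messages, none older than 12 hours), which the document does not spell out.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_MESSAGES = 20               # count limit from the diagram
MAX_AGE = timedelta(hours=12)   # age limit from the diagram


class RollingThread:
    """Hypothetical per-chat buffer: keeps the newest messages within both limits."""

    def __init__(self, max_messages: int = MAX_MESSAGES, max_age: timedelta = MAX_AGE):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._messages: deque = deque(maxlen=max_messages)
        self._max_age = max_age

    def add(self, sent_at: datetime, text: str) -> None:
        self._messages.append((sent_at, text))

    def window(self, now: datetime) -> list[str]:
        """Return message texts no older than max_age, oldest first."""
        cutoff = now - self._max_age
        return [text for sent_at, text in self._messages if sent_at >= cutoff]
```

Keeping one such buffer per chat means an outgoing message only needs to look up its own chat's window before the single extraction call is made.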