
Architecture: workflows and the LLMs behind them

An AI-solution-architect view of the agentic system: every workflow through the platform, and exactly which model (if any) each one calls. The architectural signature has four parts: the extraction core is a single grammar-constrained LLM call; the MiniCPM planner adds a visible multi-step loop over the platform's own public MCP tool contract; everything verifiable (conflict math, dedup, time proposals, eval gates) stays deterministic; and there are zero cloud-AI API calls anywhere, training included.
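
The relationship between the LLM-planned loop and its scripted counterpart (the ScriptedPlanner in the workflow diagram) can be sketched roughly as follows. This is a minimal illustration, not the code in server/orchestrator.py: the tool names `extract_events`, `check_conflicts`, and `make_ics` come from this document, while `run_scripted`, `run_with_fallback`, the tool call signatures, and the shape of the step events are all hypothetical.

```python
def run_scripted(tools, thread):
    """Run the fixed extract -> check -> render sequence, recording one event per step.

    `tools` is a dict of callables keyed by tool name (hypothetical signatures).
    """
    steps = []
    events = tools["extract_events"](thread)      # the only step that may call an LLM
    steps.append(("extract_events", len(events)))
    conflicts = tools["check_conflicts"](events)  # deterministic
    steps.append(("check_conflicts", len(conflicts)))
    ics = tools["make_ics"](events)               # deterministic
    steps.append(("make_ics", len(ics)))
    return {"events": events, "conflicts": conflicts, "ics": ics}, steps


def run_with_fallback(llm_planner, tools, thread):
    """Try the LLM planner; if it is unconfigured or raises, use the scripted sequence."""
    if llm_planner is not None:
        try:
            return llm_planner(tools, thread)
        except Exception:
            pass  # planner down: fall through to the deterministic path
    return run_scripted(tools, thread)
```

Because both paths go through the same tool contract, a caller sees the same result shape and the same step trace whether or not a planner model is available.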

System workflow

flowchart TB
    subgraph ENTRY["1 · Entry points - four front-ends, one contract"]
        direction LR
        UIIN["🖥️ Gradio UI<br/>Schedule flow + Agent tab<br/>(paste thread, screenshots, .ics)"]
        SHORT["📱 iOS Shortcut /<br/>Android Tasker"]
        MAC["🍎 Mac collector<br/>polls iMessage chat.db<br/>(collector/collector.py)"]
        MCPC["🤖 MCP clients<br/>Claude Desktop, Cursor"]
    end

    subgraph API["2 · API & orchestration - app.py (FastAPI + Gradio, one port)"]
        AGENTEP["POST /agent<br/>bearer-token, stateless"]
        INGEST["POST /ingest → feed store<br/>AUTONOMOUS=1 triggers on<br/>your outgoing message (is_from_me)"]
        ROLL["threads.rolling_thread<br/>per-chat window (20 msgs / 12 h)"]
        MCPT["MCP tools - server/mcp_tools.py<br/>extract_events · make_ics · check_conflicts"]
    end

    subgraph ORCH["2a · Agentic orchestration - server/orchestrator.py"]
        SMOL["smolagents ToolCallingAgent<br/>planned by MiniCPM, ≤6 steps<br/>playbook: extract → check → render<br/>final ActionPlan re-derived deterministically"]
        SCRIPT["ScriptedPlanner - no LLM<br/>identical tool sequence + step events<br/>(stub mode, CI, planner failure)"]
    end

    subgraph CORE["3 · Agent core - server/pipeline.py → server/agent.py"]
        PROMPT["Prompt assembly:<br/>SYSTEM + memory recall block<br/>+ existing calendar + thread + images"]
        GEN["Grammar-constrained generation<br/>→ ActionPlan JSON (always parses)"]
        PROMPT --> GEN
    end

    subgraph LLMT["4 · LLM tier - ALL inference is local llama.cpp, zero cloud AI APIs"]
        GEMMA["⭐ gemma-cal E4B - fine-tuned Gemma 4<br/>ParetoOptimal/gemma-4-cal-gguf<br/>gemma-cal-e4b-Q4_K_M.gguf (~5 GB)<br/>+ mmproj-F16.gguf vision projector"]
        MODES["served either:<br/>· in-process llama-cpp-python (ZeroGPU lease)<br/>· remote llama-server via INFERENCE_BASE_URL<br/>(Space sidecar / Mac launchd / phone)"]
        MINICPM["🧭 MiniCPM planner - OpenBMB (sponsor)<br/>openbmb/MiniCPM4.1-8B-GGUF Q4 (~5 GB)<br/>≤4B option: openbmb/MiniCPM5-1B-GGUF (config switch)<br/>2nd llama-server :8081 - enabled via<br/>PLANNER_HF_REPO / PLANNER_FILE"]
        HERMES["(optional) Hermes-3-Llama-3.1-8B Q4_K_M<br/>HERMES_TOOLS=1 - tool-calling loop:<br/>calls remember() to write memory mid-run"]
        STUB["(no LLM) regex stub extractor<br/>USE_STUB_EXTRACTOR=1 - CI & free tier"]
        GEMMA --- MODES
    end

    subgraph DET["5 · Deterministic post-processing - no LLM"]
        CONF["freebusy.annotate_conflicts<br/>overlap / adjacent / tight<br/>+ propose_times free slots"]
        DEDUP["dedup.filter_new<br/>idempotency for autonomous runs"]
        MEMW["memory.observe_plan<br/>learns recurring contacts"]
    end

    subgraph OUT["6 · Outputs"]
        CARDS["Event cards + reply draft<br/>+ clarification question"]
        ICS["📥 .ics download<br/>(off-grid default)"]
        GCAL["📆 Google Calendar push<br/>(per-user OAuth web flow, opt-in)"]
        TRACE["Redacted trace export<br/>→ public HF dataset"]
    end

    UIIN -->|"run_orchestrator (step trace streams into the UI)"| SMOL
    SHORT --> AGENTEP
    MAC -->|"store-only"| INGEST
    MAC -->|"AGENT_MODE=1"| AGENTEP
    MCPC --> MCPT
    AGENTEP --> CORE
    INGEST --> ROLL --> CORE
    SMOL ==>|"planning loop, ≤6 steps"| MINICPM
    SMOL -->|"tool calls - the Space's OWN MCP<br/>endpoint (localhost SSE)"| MCPT
    SMOL -.->|"planner down / stub mode"| SCRIPT
    SCRIPT -->|"same tool sequence,<br/>deterministic"| MCPT
    MCPT -->|"extract_events → 1 LLM call"| CORE
    MCPT -.->|"make_ics / check_conflicts → 0 LLM calls"| DET

    GEN ==>|"default"| GEMMA
    GEN -.->|"opt-in autonomous brain"| HERMES
    GEN -.->|"tests / free demo"| STUB
    HERMES -->|"remember()"| MEMW

    LLMT --> DET --> OUT
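
The deterministic conflict step in stage 5 labels a proposed event against existing calendar entries as overlap, adjacent, or tight. A minimal sketch of that kind of interval arithmetic follows. It assumes "adjacent" means the two intervals touch exactly and "tight" means a small positive gap; the 30-minute cutoff and the function name `classify_conflict` are hypothetical, since this document names the categories but not their thresholds or the signature of `freebusy.annotate_conflicts`.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: the document names a "tight" category but not its size.
TIGHT_GAP = timedelta(minutes=30)


def classify_conflict(new_start, new_end, busy_start, busy_end, tight_gap=TIGHT_GAP):
    """Return 'overlap', 'adjacent', 'tight', or None for two time intervals."""
    # Two intervals overlap when each one starts before the other ends.
    if new_start < busy_end and busy_start < new_end:
        return "overlap"
    # Otherwise one interval ends at or before the other starts; measure that gap.
    if new_end <= busy_start:
        gap = busy_start - new_end
    else:
        gap = new_start - busy_end
    if gap == timedelta(0):
        return "adjacent"
    if gap <= tight_gap:
        return "tight"
    return None
```

No model is involved at any point, so the same inputs always give the same label and the step can be unit-tested directly.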

Offline loop: eval-gated fine-tuning (produces the serving LLM)

flowchart LR
    SEEDS["Seed data - NO LLM<br/>139 hand-authored template examples<br/>(gen_new_seeds.py / make_dataset.py)"]
    SMC["SMCalFlow import - NO LLM<br/>deterministic LISP-program parse, ~2000 rows"]
    TRAIN["QLoRA fine-tune - Unsloth on Modal A100-80GB<br/>base: google/gemma-4-31B-it or gemma-4-E4B-it<br/>r=16, lr 5e-5, 2 epochs, responses-only loss"]
    GGUF["convert_hf_to_gguf + llama-quantize<br/>→ staging Q4_K_M GGUF"]
    EVAL["Eval - NO LLM judge, deterministic metrics<br/>60-example held-out set:<br/>schema validity · event F1 · start-exact recall"]
    GATE{"Gate<br/>validity ≥ 0.95<br/>F1 ≥ 0.81<br/>recall ≥ 0.773"}
    PROD["Promote → ParetoOptimal/gemma-4-cal-gguf<br/>(the model the Space serves)"]
    TRASH["Discard staging -<br/>production untouched"]

    SEEDS --> TRAIN
    SMC --> TRAIN
    TRAIN --> GGUF --> EVAL --> GATE
    GATE -->|pass| PROD
    GATE -->|fail| TRASH
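
The gate in the diagram above reduces to a threshold comparison over three metrics. The sketch below uses the thresholds shown in the Gate node and assumes that all three must hold for promotion and that a missing metric counts as a failure. The function name `gate_decision` and the metric keys are hypothetical and are not taken from training/gated_retrain.py.

```python
# Thresholds copied from the Gate node: validity >= 0.95, F1 >= 0.81, recall >= 0.773.
GATE_THRESHOLDS = {"validity": 0.95, "f1": 0.81, "recall": 0.773}


def gate_decision(metrics, thresholds=GATE_THRESHOLDS):
    """Return ('promote' or 'discard', list of failed metric names)."""
    failures = [
        name
        for name, floor in thresholds.items()
        if metrics.get(name, 0.0) < floor  # a missing metric is treated as failing
    ]
    return ("discard" if failures else "promote"), failures
```

For example, `gate_decision({"validity": 0.97, "f1": 0.80, "recall": 0.79})` reports a discard with `f1` as the only failing metric, which matches the "fail" branch where staging is dropped and production is left untouched.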

See eval-roadmap.md and the eval-gated fine-tuning post-mortem for the gate's history and rationale; hermes.md for the optional tool-calling backend; build-small-submission.md for how the MiniCPM planner maps to the sponsor:openbmb track.

Which LLM each workflow calls

| # | Workflow | Trigger | LLM call(s) | Where it runs |
|---|---|---|---|---|
| 1 | Agentic orchestration (Schedule flow + Agent tab) | User pastes thread / uploads screenshots, clicks Find the events / Run the agents | 1× MiniCPM planning loop (MiniCPM4.1-8B, or MiniCPM5-1B ≤4B variant; ≤6 steps) driving the Space's own MCP tools, + 1× gemma-cal E4B per extract_events tool call (vision via mmproj); check_conflicts/make_ics are zero-LLM. Planner unconfigured or down → ScriptedPlanner runs the identical sequence, gemma-cal only | Two local llama-servers: gemma-cal on :8080, MiniCPM on :8081 |
| 2 | API extraction (POST /agent) | iOS Shortcut, Android Tasker, or Mac collector in AGENT_MODE=1 | 1× gemma-cal E4B (same pipeline, same prompt) | Same |
| 3 | Autonomous ingest | Mac collector → /ingest; your outgoing message triggers a run over the chat's rolling thread | 1× gemma-cal E4B per affected chat, then deterministic dedup + calendar delivery | Same |
| 4 | Memory-writing agent (optional) | HERMES_TOOLS=1 on the remote path | Hermes-3-Llama-3.1-8B in a tool loop (≤3 rounds): may call remember() then returns the ActionPlan | Remote llama-server (e.g. Mac launchd) |
| 5 | MCP tools for external agents | MCP client calls the Space | extract_events → 1× gemma-cal E4B; make_ics and check_conflicts → zero LLM calls | Same as #1 |
| 6 | CI / free-tier demo | USE_STUB_EXTRACTOR=1 | No LLM: regex heuristic | CPU anywhere |
| 7 | Training & eval (offline) | training/gated_retrain.py | No LLM at the inference-API level: data gen is template-based, eval is metric-based (no judge). The LLM here is the training target: QLoRA on google/gemma-4-31B-it / gemma-4-E4B-it | Modal A100/H100 |
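
The table implies a small decision over environment flags for which extraction backend a run uses. The sketch below is one plausible reading, not the logic in server/agent.py. The flag names `USE_STUB_EXTRACTOR`, `HERMES_TOOLS`, and `INFERENCE_BASE_URL` come from this document. The function `select_backend`, the returned labels, and the precedence order (stub first, then Hermes only when a remote server is configured, otherwise gemma-cal) are assumptions.

```python
import os


def select_backend(env=None):
    """Pick an extraction backend label from environment-style flags."""
    env = os.environ if env is None else env
    if env.get("USE_STUB_EXTRACTOR") == "1":
        return "stub"        # regex heuristic, no LLM (CI / free tier)
    if env.get("HERMES_TOOLS") == "1" and env.get("INFERENCE_BASE_URL"):
        return "hermes"      # tool-calling loop, remote llama-server only
    return "gemma-cal"       # default: fine-tuned extraction model
```

Passing a plain dict instead of reading `os.environ` keeps the function easy to test, for example `select_backend({"USE_STUB_EXTRACTOR": "1"})` for the CI case.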