OffGridSchedula / docs /on-device.md
ParetoOptimal's picture
Initial Commit
0366d65
|
Raw
History Blame Contribute Delete
2.28 kB

Running on a cell phone (on-device or thin-client)

"Runs on a cell phone" can mean two things; the app supports both via one env switch.

The inference switch

server/model.py reads INFERENCE_BASE_URL:

  • Unset (default): the GGUF is loaded in-process via llama-cpp-python (the Space / a laptop).
  • Set: generation is delegated to a remote OpenAI-compatible / llama.cpp server at that URL. Same agent code, different inference location.
export INFERENCE_BASE_URL="http://127.0.0.1:8080/v1"   # a llama-server on the phone
export INFERENCE_API_KEY="..."                          # optional
export INFERENCE_MODEL="gemma-e4b"                       # optional label

So "on the phone" = run a llama-server on the device and point the agent at 127.0.0.1.

On-device model profile (Gemma E4B edge)

A 31B Q4 GGUF (~18–20 GB) needs a GPU and will not run on a phone. Use the lightweight Gemma E4B edge variant (see PLAN.md and the README Accuracy upgrade section), with a small context window:

export MODEL_REPO="<your-or-community gemma E4B GGUF repo>"
export MODEL_FILE="<gemma-e4b-*-Q4_K_M.gguf>"
export N_CTX=4096           # keep the KV cache small on a phone
export N_GPU_LAYERS=0       # CPU; on a Mac use Metal layers instead

Android (Termux) — genuinely on-device

pkg install python git cmake clang
git clone <this repo> && cd imessage-calendar-agent
pip install -r requirements-ci.txt llama-cpp-python   # CPU build
# Option 1: run the whole app (UI + /agent) on the phone
USE_STUB_EXTRACTOR=0 python app.py                    # http://127.0.0.1:7860
# Option 2: run only a llama-server and point a client/app at it
#   llama-server -m <gemma-e4b.gguf> --port 8080
#   then set INFERENCE_BASE_URL=http://127.0.0.1:8080/v1

Expect multi-second latency per request on phone CPU — keep N_CTX small and threads short.

iOS — the honest limit

iOS does not allow background message access or a persistent background LLM server. You cannot run an autonomous on-device agent for iMessage on an iPhone. The supported iOS path is the foreground Shortcut in automations.md, optionally pointing at a remote INFERENCE_BASE_URL for the model.