OffGridSchedula

Sleeping

App Files Files Community

OffGridSchedula / docs /on-device.md

ParetoOptimal

Initial Commit

0366d65 18 days ago

preview code

Raw

History Blame Contribute Delete

2.28 kB

	# Running on a cell phone (on-device or thin-client)

	"Runs on a cell phone" can mean two things; the app supports both via one env switch.

	## The inference switch

	`server/model.py` reads `INFERENCE_BASE_URL`:

	- Unset (default): the GGUF is loaded in-process via `llama-cpp-python` (the Space / a laptop).
	- Set: generation is delegated to a remote OpenAI-compatible / llama.cpp server at that URL.
	Same agent code, different inference location.

	```bash
	export INFERENCE_BASE_URL="http://127.0.0.1:8080/v1" # a llama-server on the phone
	export INFERENCE_API_KEY="..." # optional
	export INFERENCE_MODEL="gemma-e4b" # optional label
	```

	So "on the phone" = run a `llama-server` on the device and point the agent at `127.0.0.1`.

	## On-device model profile (Gemma E4B edge)

	A 31B Q4 GGUF (~18–20 GB) needs a GPU and will not run on a phone. Use the lightweight Gemma E4B
	edge variant (see [PLAN.md](../PLAN.md) and the README Accuracy upgrade section), with a small
	context window:

	```bash
	export MODEL_REPO="<your-or-community gemma E4B GGUF repo>"
	export MODEL_FILE="<gemma-e4b-*-Q4_K_M.gguf>"
	export N_CTX=4096 # keep the KV cache small on a phone
	export N_GPU_LAYERS=0 # CPU; on a Mac use Metal layers instead
	```

	## Android (Termux) — genuinely on-device

	```bash
	pkg install python git cmake clang
	git clone <this repo> && cd imessage-calendar-agent
	pip install -r requirements-ci.txt llama-cpp-python # CPU build
	# Option 1: run the whole app (UI + /agent) on the phone
	USE_STUB_EXTRACTOR=0 python app.py # http://127.0.0.1:7860
	# Option 2: run only a llama-server and point a client/app at it
	# llama-server -m <gemma-e4b.gguf> --port 8080
	# then set INFERENCE_BASE_URL=http://127.0.0.1:8080/v1
	```

	Expect multi-second latency per request on phone CPU — keep `N_CTX` small and threads short.

	## iOS — the honest limit

	iOS does not allow background message access or a persistent background LLM server. You cannot
	run an autonomous on-device agent for iMessage on an iPhone. The supported iOS path is the
	foreground Shortcut in [automations.md](./automations.md), optionally pointing at a remote
	`INFERENCE_BASE_URL` for the model.