OffGridSchedula / docs /hermes.md
ParetoOptimal's picture
Initial Commit
0366d65
|
Raw
History Blame Contribute Delete
2.49 kB

The Hermes "grows-with-you" brain

The agent's reasoning is pluggable through INFERENCE_BASE_URL (see server/model.py). Point it at a NousResearch Hermes model served OpenAI-compatible and the whole pipeline uses it — no code change. Hermes is a tool-calling Llama/Qwen fine-tune, a good fit for the autonomous daemon.

Serve Hermes locally (llama.cpp → "Llama Champion")

# Hermes 3 Llama 3.1 8B (Q4_K_M) runs comfortably on a Mac with Metal.
llama-server -m ~/models/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf \
  --host 127.0.0.1 --port 8080 --ctx-size 8192 --jinja   # --jinja = tool-calling template

Point the backend at it:

export INFERENCE_BASE_URL="http://127.0.0.1:8080/v1"
export INFERENCE_MODEL="hermes"
export USE_STUB_EXTRACTOR=0
python app.py

server/model.py routes complete_json / stream_complete_json to the remote server when INFERENCE_BASE_URL is set (_remote_*), still grammar-constraining the output to the ActionPlan schema. (Ollama or vLLM also work — any OpenAI-compatible endpoint.)

"Grows with you" — the memory (server/memory.py)

Durable facts/preferences personalize every extraction; they're injected into the prompt via recall() (server/agent.py::build_messages) and shown/edited in the dashboard Memory tab.

  • Learns automatically: recurring event attendees become contact facts (observe_plan).
  • You can teach it: add facts in the Memory tab — "Dana is the soccer coach", "you decline Mondays", "default location is Lincoln Elementary".
  • Hermes can update it itself: set HERMES_TOOLS=1 and the remote path advertises a remember tool (server/tools.py); the model calls it mid-run to save durable facts, then returns the ActionPlan. The tool-call loop is in server/model.py::_remote_complete_json (round-trip logic + tests in server/tools.py / tests/test_tools.py). Requires a tool-calling server (--jinja).
  • Stored at MEMORY_PATH (set it to a real path like ~/.offgrid/agent_memory.json, not /tmp).

Over time the model resolves nicknames, applies your preferences to conflicts, and needs fewer clarifications — the "grows with you" behavior.

Where it runs

The Hermes brain + memory live wherever the autonomous backend runs — the Mac daemon (scripts/setup_mac.sh), an Android phone via Termux (INFERENCE_BASE_URL → a local llama-server), or a cloud box (if you capture from email/Slack/Telegram instead of iMessage).