---
summary: "How Moltbot memory works (workspace files + automatic memory flush)"
read_when:
  - You want the memory file layout and workflow
  - You want to tune the automatic pre-compaction memory flush
---
# Memory

Moltbot memory is **plain Markdown in the agent workspace**. The files are the
source of truth; the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.
## Memory files (Markdown)

The default workspace layout uses two memory layers:

- `memory/YYYY-MM-DD.md`
  - Daily log (append-only).
  - Read today + yesterday at session start.
- `MEMORY.md` (optional)
  - Curated long-term memory.
  - **Only load in the main, private session** (never in group contexts).

These files live under the workspace (`agents.defaults.workspace`, default
`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
## When to write memory

- Decisions, preferences, and durable facts go to `MEMORY.md`.
- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," write it down (do not keep it in RAM).
- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
- If you want something to stick, **ask the bot to write it** into memory.
## Automatic memory flush (pre-compaction ping)

When a session is **close to auto-compaction**, Moltbot triggers a **silent,
agentic turn** that reminds the model to write durable memory **before** the
context is compacted. The default prompts explicitly say the model *may reply*,
but usually `NO_REPLY` is the correct response, so the user never sees this turn.
This is controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
```
Details:

- **Soft threshold**: the flush triggers when the session token estimate crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens` (see the worked example after this list).
- **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
- **Two prompts**: a user prompt plus a system-prompt append deliver the reminder.
- **One flush per compaction cycle** (tracked in `sessions.json`).
- **Workspace must be writable**: if the session runs sandboxed with
  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
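As a worked example, take the config above and assume a 200k-token context window (the window size is illustrative; the other two values come from the example config):

```ts
// Hypothetical window size; reserve and threshold match the config above.
const contextWindow = 200_000;
const reserveTokensFloor = 20_000;
const softThresholdTokens = 4_000;

// The flush triggers once the session token estimate crosses this point:
const flushAt = contextWindow - reserveTokensFloor - softThresholdTokens;
console.log(flushAt); // 176000
```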
For the full compaction lifecycle, see
[Session management + compaction](/reference/session-management-compaction).
## Vector memory search

Moltbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
semantic queries can find related notes even when the wording differs.

Defaults:

- Enabled by default.
- Watches memory files for changes (debounced).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, Moltbot auto-selects (see the sketch after this list):
  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
  2. `openai` if an OpenAI key can be resolved.
  3. `gemini` if a Gemini key can be resolved.
  4. Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
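The auto-selection order reads as a simple cascade. A minimal sketch, with hypothetical names (`resolveEmbeddingProvider` and its fields are illustrative, not the actual Moltbot internals):

```ts
// Illustrative sketch of the auto-selection cascade; names are hypothetical.
type Provider = "local" | "openai" | "gemini" | null;

function resolveEmbeddingProvider(cfg: {
  localModelPath?: string;   // memorySearch.local.modelPath
  localModelExists: boolean; // does the GGUF file exist on disk?
  openaiKey?: string;        // resolved OpenAI API key, if any
  geminiKey?: string;        // resolved Gemini API key, if any
}): Provider {
  if (cfg.localModelPath && cfg.localModelExists) return "local";
  if (cfg.openaiKey) return "openai";
  if (cfg.geminiKey) return "gemini";
  return null; // memory search stays disabled until configured
}
```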
Remote embeddings **require** an API key for the embedding provider. Moltbot
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth only covers chat/completions and does **not** satisfy
embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
`models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
set `memorySearch.remote.apiKey` (and, optionally, `memorySearch.remote.headers`).
### Gemini embeddings (native)

Set the provider to `gemini` to use the Gemini embeddings API directly:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Notes:

- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
- `remote.headers` lets you add extra headers if needed.
- Default model: `gemini-embedding-001`.
If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
use the `remote` configuration with the OpenAI provider:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
```
If you don't want to set an API key, use `memorySearch.provider = "local"` or set
`memorySearch.fallback = "none"`.

Fallbacks:

- `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
- The fallback provider is only used when the primary embedding provider fails (see the sketch below).
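A minimal sketch of that behavior, assuming a hypothetical `Embedder` interface (not the actual Moltbot types):

```ts
// Hypothetical types; illustrates "fallback only on primary failure".
type Embedder = { embed(texts: string[]): Promise<number[][]> };

async function embedWithFallback(
  primary: Embedder,
  fallback: Embedder | null, // null when memorySearch.fallback = "none"
  texts: string[],
): Promise<number[][]> {
  try {
    return await primary.embed(texts);
  } catch (err) {
    if (fallback) return fallback.embed(texts); // used only when primary fails
    throw err;
  }
}
```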
Batch indexing (OpenAI + Gemini):

- Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
- The default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.

Why OpenAI batch is fast + cheap:

- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
  - https://platform.openai.com/docs/api-reference/batch
  - https://platform.openai.com/pricing
Config example:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
```
Tools:

- `memory_search` — returns snippets with file + line ranges.
- `memory_get` — reads memory file content by path.

Local mode:

- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
### How the memory tools work

- `memory_search` semantically searches Markdown chunks (~400-token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings (see the illustrative result shape after this list). No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves to true for the agent.
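For orientation, a `memory_search` hit carries roughly the fields below. The field names here are illustrative assumptions, not the tool's literal output schema:

```ts
// Illustrative shape only; field names are assumptions, not the real schema.
const exampleHit = {
  path: "memory/2026-01-19.md",    // workspace-relative file
  lines: { start: 12, end: 24 },   // line range of the matching chunk
  score: 0.82,                     // relevance score
  snippet: "Decided to pin the gateway to the Mac Studio …", // capped ~700 chars
  provider: "openai",              // embedding provider used
  model: "text-embedding-3-small", // embedding model used
  fellBack: false,                 // true if local → remote fallback occurred
};
```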
### What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, which supports an `{agentId}` token).
- Freshness: a watcher on `MEMORY.md` + `memory/` marks the index dirty (1.5s debounce). Sync is scheduled on session start, on search, or on an interval, and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Moltbot automatically resets and reindexes the entire store (see the sketch after this list).
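A sketch of that reindex check, assuming a hypothetical fingerprint helper (the real fingerprint format may differ):

```ts
import { createHash } from "node:crypto";

// Hypothetical helper: hash everything that invalidates stored embeddings.
function indexFingerprint(params: {
  provider: string;      // e.g. "openai"
  model: string;         // e.g. "text-embedding-3-small"
  endpoint: string;      // resolved base URL
  chunkTokens: number;   // chunking target
  overlapTokens: number; // chunk overlap
}): string {
  return createHash("sha256").update(JSON.stringify(params)).digest("hex");
}

// If the stored fingerprint differs from the current one,
// reset the store and reindex everything.
```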
### Hybrid search (BM25 + vector)

When enabled, Moltbot combines:

- **Vector similarity** (semantic match; wording can differ)
- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)

If full-text search is unavailable on your platform, Moltbot falls back to vector-only search.

#### Why hybrid?

Vector search is great at "this means the same thing":

- "Mac Studio gateway host" vs "the machine running the gateway"
- "debounce file updates" vs "avoid indexing on every write"

But it can be weak at exact, high-signal tokens:

- IDs (`a828e60`, `b3b9895a…`)
- code symbols (`memorySearch.query.hybrid`)
- error strings ("sqlite-vec unavailable")

BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
good results for both "natural language" queries and "needle in a haystack" queries.
#### How we merge results (the current design)

Implementation sketch:

1. Retrieve a candidate pool from both sides:
   - **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
   - **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
2. Convert the BM25 rank into a 0..1-ish score:
   - `textScore = 1 / (1 + max(0, bm25Rank))`
3. Union candidates by chunk id and compute a weighted score:
   - `finalScore = vectorWeight * vectorScore + textWeight * textScore`

Notes:

- `vectorWeight + textWeight` is normalized to 1.0 during config resolution, so the weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can't be created, we keep vector-only search (no hard failure).

This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
(min/max or z-score) before mixing.
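The merge itself fits in a few lines. A sketch under the assumptions above (candidates already unioned by chunk id; type and function names are illustrative):

```ts
// Illustrative merge: weighted mix of vector similarity and BM25 relevance.
interface Candidate {
  id: string;           // chunk id (candidates already unioned by id)
  vectorScore?: number; // cosine similarity, when embeddings are available
  bm25Rank?: number;    // FTS5 BM25 rank (lower is better), when FTS matched
}

function mergeHybrid(
  candidates: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
): { id: string; score: number }[] {
  // Normalize weights so they behave as percentages.
  const total = vectorWeight + textWeight;
  const vw = vectorWeight / total;
  const tw = textWeight / total;

  return candidates
    .map((c) => {
      // BM25 rank (lower is better) → 0..1-ish score.
      const textScore =
        c.bm25Rank === undefined ? 0 : 1 / (1 + Math.max(0, c.bm25Rank));
      return { id: c.id, score: vw * (c.vectorScore ?? 0) + tw * textScore };
    })
    .sort((a, b) => b.score - a.score);
}
```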
Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4
        }
      }
    }
  }
}
```
### Embedding cache

Moltbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
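The usual way to make such a cache correct is to key entries by the embedding configuration plus a content hash, so any change to the text or the model yields a different key. A sketch assuming that design (the actual cache schema is not documented here):

```ts
import { createHash } from "node:crypto";

// Hypothetical cache key: unchanged text under the same provider/model
// hits the cache; any change misses and triggers a fresh embedding call.
function cacheKey(provider: string, model: string, chunkText: string): string {
  return createHash("sha256")
    .update(`${provider}\n${model}\n${chunkText}`)
    .digest("hex");
}
```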
Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}
```
### Session memory search (experimental)

You can optionally index **session transcripts** and surface them via `memory_search`.
This is gated behind an experimental flag.

```json5
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
```
Notes:

- Session indexing is **opt-in** (off by default).
- Session updates are debounced and **indexed asynchronously** once they cross the delta thresholds (best-effort).
- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent's session logs are indexed).
- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
Delta thresholds (defaults shown):

```json5
agents: {
  defaults: {
    memorySearch: {
      sync: {
        sessions: {
          deltaBytes: 100000, // ~100 KB
          deltaMessages: 50   // JSONL lines
        }
      }
    }
  }
}
```
### SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, Moltbot stores embeddings in a
SQLite virtual table (`vec0`) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
```
Notes:

- `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over the stored embeddings (see the sketch after these notes).
- If the sqlite-vec extension is missing or fails to load, Moltbot logs the
  error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).
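The JS fallback boils down to plain cosine similarity over the stored vectors. A minimal sketch of the standard formula (not Moltbot's literal code):

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|); 0 when either norm is zero.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```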
### Local embedding auto-download

- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing, it **auto-downloads** the file to the cache (or to `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then run `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
### Custom OpenAI-compatible endpoint example

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
```

Notes:

- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with the OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.