---
summary: "How Moltbot memory works (workspace files + automatic memory flush)"
read_when:
  - You want the memory file layout and workflow
  - You want to tune the automatic pre-compaction memory flush
---
# Memory

Moltbot memory is **plain Markdown in the agent workspace**. The files are the
source of truth; the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.
## Memory files (Markdown)

The default workspace layout uses two memory layers:

- `memory/YYYY-MM-DD.md`
  - Daily log (append-only).
  - Read today + yesterday at session start.
- `MEMORY.md` (optional)
  - Curated long-term memory.
  - **Only load in the main, private session** (never in group contexts).

These files live under the workspace (`agents.defaults.workspace`, default
`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
## When to write memory

- Decisions, preferences, and durable facts go to `MEMORY.md`.
- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," write it down (do not keep it in RAM).
- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
- If you want something to stick, **ask the bot to write it** into memory.
## Automatic memory flush (pre-compaction ping)

When a session is **close to auto-compaction**, Moltbot triggers a **silent,
agentic turn** that reminds the model to write durable memory **before** the
context is compacted. The default prompts explicitly say the model *may reply*,
but usually `NO_REPLY` is the correct response, so the user never sees this turn.
This is controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
```
Details:

- **Soft threshold**: the flush triggers when the session token estimate crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens` (see the worked example after this list).
- **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
- **Two prompts**: a user prompt plus a system-prompt append deliver the reminder.
- **One flush per compaction cycle** (tracked in `sessions.json`).
- **Workspace must be writable**: if the session runs sandboxed with
  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
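As a worked example, take the config above and assume a 200k-token context window (the window size is illustrative; the other two values come from the example config):

```ts
// Hypothetical window size; reserve and threshold match the config above.
const contextWindow = 200_000;
const reserveTokensFloor = 20_000;
const softThresholdTokens = 4_000;

// The flush triggers once the session token estimate crosses this point:
const flushAt = contextWindow - reserveTokensFloor - softThresholdTokens;
console.log(flushAt); // 176000
```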
For the full compaction lifecycle, see
[Session management + compaction](/reference/session-management-compaction).
## Vector memory search

Moltbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
semantic queries can find related notes even when the wording differs.

Defaults:

- Enabled by default.
- Watches memory files for changes (debounced).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, Moltbot auto-selects (see the sketch after this list):
  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
  2. `openai` if an OpenAI key can be resolved.
  3. `gemini` if a Gemini key can be resolved.
  4. Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
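The auto-selection order reads as a simple cascade. A minimal sketch, with hypothetical names (`resolveEmbeddingProvider` and its fields are illustrative, not the actual Moltbot internals):

```ts
// Illustrative sketch of the auto-selection cascade; names are hypothetical.
type Provider = "local" | "openai" | "gemini" | null;

function resolveEmbeddingProvider(cfg: {
  localModelPath?: string;   // memorySearch.local.modelPath
  localModelExists: boolean; // does the GGUF file exist on disk?
  openaiKey?: string;        // resolved OpenAI API key, if any
  geminiKey?: string;        // resolved Gemini API key, if any
}): Provider {
  if (cfg.localModelPath && cfg.localModelExists) return "local";
  if (cfg.openaiKey) return "openai";
  if (cfg.geminiKey) return "gemini";
  return null; // memory search stays disabled until configured
}
```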
Remote embeddings **require** an API key for the embedding provider. Moltbot
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth only covers chat/completions and does **not** satisfy
embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
`models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
set `memorySearch.remote.apiKey` (and, optionally, `memorySearch.remote.headers`).
### Gemini embeddings (native)

Set the provider to `gemini` to use the Gemini embeddings API directly:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Notes:

- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
- `remote.headers` lets you add extra headers if needed.
- Default model: `gemini-embedding-001`.
If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
use the `remote` configuration with the OpenAI provider:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
```
If you don't want to set an API key, use `memorySearch.provider = "local"` or set
`memorySearch.fallback = "none"`.

Fallbacks:

- `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
- The fallback provider is only used when the primary embedding provider fails (see the sketch below).
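A minimal sketch of that behavior, assuming a hypothetical `Embedder` interface (not the actual Moltbot types):

```ts
// Hypothetical types; illustrates "fallback only on primary failure".
type Embedder = { embed(texts: string[]): Promise<number[][]> };

async function embedWithFallback(
  primary: Embedder,
  fallback: Embedder | null, // null when memorySearch.fallback = "none"
  texts: string[],
): Promise<number[][]> {
  try {
    return await primary.embed(texts);
  } catch (err) {
    if (fallback) return fallback.embed(texts); // used only when primary fails
    throw err;
  }
}
```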
Batch indexing (OpenAI + Gemini):

- Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
- The default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.

Why OpenAI batch is fast + cheap:

- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
  - https://platform.openai.com/docs/api-reference/batch
  - https://platform.openai.com/pricing
Config example:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
```
Tools:

- `memory_search` — returns snippets with file + line ranges.
- `memory_get` — reads memory file content by path.

Local mode:

- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
### How the memory tools work

- `memory_search` semantically searches Markdown chunks (~400-token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings (see the illustrative result shape after this list). No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves to true for the agent.
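For orientation, a `memory_search` hit carries roughly the fields below. The field names here are illustrative assumptions, not the tool's literal output schema:

```ts
// Illustrative shape only; field names are assumptions, not the real schema.
const exampleHit = {
  path: "memory/2026-01-19.md",    // workspace-relative file
  lines: { start: 12, end: 24 },   // line range of the matching chunk
  score: 0.82,                     // relevance score
  snippet: "Decided to pin the gateway to the Mac Studio …", // capped ~700 chars
  provider: "openai",              // embedding provider used
  model: "text-embedding-3-small", // embedding model used
  fellBack: false,                 // true if local → remote fallback occurred
};
```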
### What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, which supports an `{agentId}` token).
- Freshness: a watcher on `MEMORY.md` + `memory/` marks the index dirty (1.5s debounce). Sync is scheduled on session start, on search, or on an interval, and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Moltbot automatically resets and reindexes the entire store (see the sketch after this list).
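A sketch of that reindex check, assuming a hypothetical fingerprint helper (the real fingerprint format may differ):

```ts
import { createHash } from "node:crypto";

// Hypothetical helper: hash everything that invalidates stored embeddings.
function indexFingerprint(params: {
  provider: string;      // e.g. "openai"
  model: string;         // e.g. "text-embedding-3-small"
  endpoint: string;      // resolved base URL
  chunkTokens: number;   // chunking target
  overlapTokens: number; // chunk overlap
}): string {
  return createHash("sha256").update(JSON.stringify(params)).digest("hex");
}

// If the stored fingerprint differs from the current one,
// reset the store and reindex everything.
```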
### Hybrid search (BM25 + vector)

When enabled, Moltbot combines:

- **Vector similarity** (semantic match; wording can differ)
- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)

If full-text search is unavailable on your platform, Moltbot falls back to vector-only search.

#### Why hybrid?

Vector search is great at "this means the same thing":

- "Mac Studio gateway host" vs "the machine running the gateway"
- "debounce file updates" vs "avoid indexing on every write"

But it can be weak at exact, high-signal tokens:

- IDs (`a828e60`, `b3b9895a…`)
- code symbols (`memorySearch.query.hybrid`)
- error strings ("sqlite-vec unavailable")

BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
good results for both "natural language" queries and "needle in a haystack" queries.
#### How we merge results (the current design)

Implementation sketch:

1. Retrieve a candidate pool from both sides:
   - **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
   - **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
2. Convert the BM25 rank into a 0..1-ish score:
   - `textScore = 1 / (1 + max(0, bm25Rank))`
3. Union candidates by chunk id and compute a weighted score:
   - `finalScore = vectorWeight * vectorScore + textWeight * textScore`

Notes:

- `vectorWeight + textWeight` is normalized to 1.0 during config resolution, so the weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can't be created, we keep vector-only search (no hard failure).

This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
(min/max or z-score) before mixing.
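The merge itself fits in a few lines. A sketch under the assumptions above (candidates already unioned by chunk id; type and function names are illustrative):

```ts
// Illustrative merge: weighted mix of vector similarity and BM25 relevance.
interface Candidate {
  id: string;           // chunk id (candidates already unioned by id)
  vectorScore?: number; // cosine similarity, when embeddings are available
  bm25Rank?: number;    // FTS5 BM25 rank (lower is better), when FTS matched
}

function mergeHybrid(
  candidates: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
): { id: string; score: number }[] {
  // Normalize weights so they behave as percentages.
  const total = vectorWeight + textWeight;
  const vw = vectorWeight / total;
  const tw = textWeight / total;

  return candidates
    .map((c) => {
      // BM25 rank (lower is better) → 0..1-ish score.
      const textScore =
        c.bm25Rank === undefined ? 0 : 1 / (1 + Math.max(0, c.bm25Rank));
      return { id: c.id, score: vw * (c.vectorScore ?? 0) + tw * textScore };
    })
    .sort((a, b) => b.score - a.score);
}
```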
Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4
        }
      }
    }
  }
}
```
### Embedding cache

Moltbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
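The usual way to make such a cache correct is to key entries by the embedding configuration plus a content hash, so any change to the text or the model yields a different key. A sketch assuming that design (the actual cache schema is not documented here):

```ts
import { createHash } from "node:crypto";

// Hypothetical cache key: unchanged text under the same provider/model
// hits the cache; any change misses and triggers a fresh embedding call.
function cacheKey(provider: string, model: string, chunkText: string): string {
  return createHash("sha256")
    .update(`${provider}\n${model}\n${chunkText}`)
    .digest("hex");
}
```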
Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}
```
### Session memory search (experimental)

You can optionally index **session transcripts** and surface them via `memory_search`.
This is gated behind an experimental flag.

```json5
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
```
Notes:

- Session indexing is **opt-in** (off by default).
- Session updates are debounced and **indexed asynchronously** once they cross the delta thresholds (best-effort).
- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent's session logs are indexed).
- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
Delta thresholds (defaults shown):

```json5
agents: {
  defaults: {
    memorySearch: {
      sync: {
        sessions: {
          deltaBytes: 100000, // ~100 KB
          deltaMessages: 50   // JSONL lines
        }
      }
    }
  }
}
```
### SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, Moltbot stores embeddings in a
SQLite virtual table (`vec0`) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
```
Notes:

- `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over the stored embeddings (see the sketch after these notes).
- If the sqlite-vec extension is missing or fails to load, Moltbot logs the
  error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).
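The JS fallback boils down to plain cosine similarity over the stored vectors. A minimal sketch of the standard formula (not Moltbot's literal code):

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|); 0 when either norm is zero.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```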
### Local embedding auto-download

- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing, it **auto-downloads** the file to the cache (or to `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then run `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
### Custom OpenAI-compatible endpoint example

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
```

Notes:

- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with the OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.