# Conversation Memory System

## Purpose

Long-running chats can easily exceed the model's context window. MiniSearch addresses this by keeping a rolling, extractive summary of prior turns and feeding only the freshest messages into the model alongside that summary. All context handling happens locally in the browser to preserve privacy. @client/modules/textGeneration.ts#262-370

## Components

1. **Token Budgeting** – `generateChatResponse` measures the system prompt and a stub "Ok!" assistant reply, then caps the remaining user/assistant turns at 75% of the default 4096-token window (≈3072 tokens) to leave headroom for the response. A GPT tokenizer counts each message's tokens before it is included. @client/modules/textGeneration.ts#262-303 @client/modules/textGenerationUtilities.ts#13-74
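A minimal sketch of this budgeting pass, assuming a newest-first walk over the turns; `countTokens`, `budgetMessages`, and the ~4-characters-per-token heuristic are hypothetical stand-ins for the real GPT tokenizer and helpers:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical stand-in for the real GPT tokenizer; ~4 characters per token.
const countTokens = (text: string): number => Math.ceil(text.length / 4);

const DEFAULT_CONTEXT_SIZE = 4096;

// Keep the newest turns that fit within 75% of the context window,
// after reserving room for the system prompt and the stub "Ok!" reply.
function budgetMessages(
  systemPrompt: string,
  turns: ChatMessage[],
): { kept: ChatMessage[]; dropped: ChatMessage[] } {
  const reserved = countTokens(systemPrompt) + countTokens("Ok!");
  let budget = Math.floor(DEFAULT_CONTEXT_SIZE * 0.75) - reserved;
  const kept: ChatMessage[] = [];
  const dropped: ChatMessage[] = [];
  // Walk from newest to oldest so the freshest turns win the budget;
  // once a turn no longer fits, everything older is dropped with it.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = countTokens(turns[i].content);
    if (cost > budget) {
      dropped.push(...turns.slice(0, i + 1));
      break;
    }
    budget -= cost;
    kept.unshift(turns[i]);
  }
  return { kept, dropped };
}
```

The `dropped` half of the split is what feeds the summarization engine described next.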
2. **Rolling Summary Storage** – The latest summary plus a conversation identifier live in a lightweight pub/sub store so any component can read/write without prop drilling. @client/modules/pubSub.ts#249-268
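A minimal sketch of such a pub/sub cell; the `get`/`set`/`subscribe` shape and the cell names are assumptions, not the real module's API:

```typescript
// Minimal sketch of a pub/sub cell like the one that holds the rolling
// summary; the get/set/subscribe shape is an assumption, not the real API.
function createPubSub<T>(initial: T) {
  let value = initial;
  const listeners = new Set<(next: T) => void>();
  return {
    get: () => value,
    set: (next: T) => {
      value = next;
      listeners.forEach((listener) => listener(next));
    },
    subscribe: (listener: (next: T) => void) => {
      listeners.add(listener);
      return () => listeners.delete(listener); // unsubscribe handle
    },
  };
}

// One cell for the digest, one for the conversation it belongs to.
const conversationSummary = createPubSub("");
const conversationId = createPubSub<string | null>(null);
```

Any component can now read the latest digest with `conversationSummary.get()` or react to updates via `subscribe`, with no props threaded through the tree.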
3. **Summarization Engine** – When older turns must be dropped, `createLlmSummary` asks the configured inference backend (OpenAI, AI Horde, internal API, WebLLM, or Wllama) to condense the removed messages under an 800-token limit. If the LLM call fails, the system falls back to an extractive tokenizer-based summarizer to guarantee progress. @client/modules/textGeneration.ts#66-177
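The summarize-with-fallback contract can be sketched as follows. `callBackend` stands in for whichever inference backend is configured (the real call is asynchronous; it is synchronous here for brevity), and the fallback is simplified to plain truncation at roughly the 800-token cap:

```typescript
// Sketch of the summarize-with-fallback contract. `callBackend` stands in
// for whichever inference backend is configured; the real call is async,
// but the control flow is the same.
const APPROX_CHARS_PER_TOKEN = 4;

function createSummary(
  droppedTexts: string[],
  callBackend: (prompt: string) => string,
  maxTokens = 800,
): string {
  try {
    return callBackend(
      `Condense the following in under ${maxTokens} tokens:\n${droppedTexts.join("\n")}`,
    );
  } catch {
    // Extractive fallback: truncation guarantees the chat loop makes
    // progress even when the backend is unreachable.
    return droppedTexts.join(" ").slice(0, maxTokens * APPROX_CHARS_PER_TOKEN);
  }
}
```

Whatever the backend returns (or the fallback produces) becomes the new rolling summary.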
4. **Persistence Hooks** – After a search run completes, `saveLlmResponseForQuery` stores the assistant reply in IndexedDB so history restores can reload it. The conversation summary itself stays in-memory and resets whenever a new search run begins. @client/modules/history.ts#288-333 @client/modules/textGeneration.ts#179-247
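The split between what persists and what resets can be illustrated with an in-memory stand-in for the IndexedDB store (the `Map`, `startNewSearchRun`, and record shape here are all hypothetical; only `saveLlmResponseForQuery` is named in the source):

```typescript
// In-memory stand-in for the IndexedDB-backed history store, illustrating
// the persistence contract: assistant replies survive per query, while the
// conversation summary lives only in memory and resets with each new run.
interface RunRecord {
  query: string;
  llmResponse?: string;
}

const historyStore = new Map<string, RunRecord>(); // IndexedDB stand-in
let rollingSummary = ""; // never persisted

function startNewSearchRun(query: string): void {
  rollingSummary = ""; // summary resets whenever a new run begins
  historyStore.set(query, { query });
}

function saveLlmResponseForQuery(query: string, response: string): void {
  const record = historyStore.get(query) ?? { query };
  historyStore.set(query, { ...record, llmResponse: response });
}
```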
## Flow

1. User sends a chat message.
2. The system prompt is regenerated by `getSystemPrompt` and augmented with any stored summary (`Conversation context: ...`). @client/modules/textGeneration.ts#270-329
3. Recent turns are appended until the budget is exhausted; older ones become "dropped messages".
4. Dropped messages are summarized and the digest is saved back to the pub/sub store with the current conversation ID. @client/modules/textGeneration.ts#313-330
5. The final prompt sent to the model always starts with the refreshed system prompt, followed by the stub assistant reply and the kept turn list to encourage immediate streaming.
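The prompt-assembly step at the end of this flow can be sketched as below; `assemblePrompt` and `PromptMessage` are illustrative names, but the ordering (system prompt with folded-in summary, stub "Ok!" reply, kept turns) follows the steps above:

```typescript
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble the final prompt in the order described above: refreshed system
// prompt (with the summary folded in), the stub assistant reply, then the
// kept recent turns.
function assemblePrompt(
  systemPrompt: string,
  summary: string,
  keptTurns: PromptMessage[],
): PromptMessage[] {
  const system = summary
    ? `${systemPrompt}\n\nConversation context: ${summary}`
    : systemPrompt;
  return [
    { role: "system", content: system },
    { role: "assistant", content: "Ok!" },
    ...keptTurns,
  ];
}
```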
## Settings & Extensibility

- All inference types share the same summarization contract—no provider-specific logic beyond selecting the backend module at runtime. @client/modules/textGeneration.ts#95-135
- Changing the global context window (e.g., via OpenAI settings) automatically affects the available budget because the logic derives from the default context size exported by `textGenerationUtilities`. @client/modules/textGenerationUtilities.ts#13-74
- Future settings (e.g., toggling memory or adjusting the 75% ratio) should hook into the same budgeting helpers to keep behavior predictable.
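The key property here is that the budget is derived, not hard-coded, so rescaling the context window rescales the budget for free. A two-line sketch (names are illustrative):

```typescript
// The chat budget is derived rather than hard-coded, so changing the
// context window rescales it automatically. The 0.75 ratio is the only
// fixed constant in this sketch.
const CHAT_BUDGET_RATIO = 0.75;

const chatBudget = (contextSize: number): number =>
  Math.floor(contextSize * CHAT_BUDGET_RATIO);
```

With the default 4096-token window this yields the ≈3072-token budget mentioned above; doubling the window doubles the budget.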
## Failure Modes & Logging

- Every summarization attempt is wrapped in try/catch; failures emit `addLogEntry` notifications and fall back to extractive summaries so the chat loop never stalls. @client/modules/textGeneration.ts#97-138
- If generation is interrupted (user stop), a custom `ChatGenerationError` ensures the loop exits gracefully without corrupting the stored summary. @client/modules/textGeneration.ts#360-369 @client/modules/textGenerationUtilities.ts#19-26
## Reset Rules

- Starting a new top-level search clears the summary, chat history, and cached results to avoid context leakage across unrelated conversations. @client/modules/textGeneration.ts#179-207
- Restoring a run from history repopulates chat state from IndexedDB; the memory system will rebuild summaries on demand once the user resumes chatting. @client/modules/history.ts#335-365 @client/hooks/useHistoryRestore.ts#32-105

## Related Topics

- **AI Integration**: `docs/ai-integration.md` - Detailed inference options
- **Search History**: `docs/search-history.md` - History and persistence
- **Overview**: `docs/overview.md` - System architecture
- **Configuration**: `docs/configuration.md` - Settings for the context window