Conversation Memory System
Purpose
Long-running chats can easily exceed the model context window. MiniSearch addresses this by keeping a rolling, extractive summary of prior turns and only feeding the freshest messages into the model alongside that summary. All context handling happens locally in the browser to preserve privacy. @client/modules/textGeneration.ts#262-370
Components
- Token Budgeting – `generateChatResponse` measures the system prompt and a stub "Ok!" assistant reply, then caps the remaining user/assistant turns at 75% of the default 4096-token window (≈3072 tokens) to leave headroom for the response. A GPT tokenizer counts tokens per message before inclusion. @client/modules/textGeneration.ts#262-303 @client/modules/textGenerationUtilities.ts#13-74
- Rolling Summary Storage – The latest summary plus a conversation identifier live in a lightweight pub/sub store, so any component can read or write them without prop drilling. @client/modules/pubSub.ts#249-268
- Summarization Engine – When older turns must be dropped, `createLlmSummary` asks the configured inference backend (OpenAI, AI Horde, internal API, WebLLM, or Wllama) to condense the removed messages under an 800-token limit. If the LLM call fails, the system falls back to an extractive tokenizer-based summarizer to guarantee progress. @client/modules/textGeneration.ts#66-177
- Persistence Hooks – After a search run completes, `saveLlmResponseForQuery` stores the assistant reply in IndexedDB so history restores can reload it. The conversation summary itself stays in memory and resets whenever a new search run begins. @client/modules/history.ts#288-333 @client/modules/textGeneration.ts#179-247
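The budgeting step described above can be sketched as follows. This is an illustrative reconstruction, not the actual MiniSearch code: `splitByBudget` and `countTokens` are hypothetical names, and the whitespace token count stands in for the real GPT tokenizer.

```typescript
// Illustrative sketch of the token-budgeting step; names and the
// tokenizer heuristic are assumptions, not the real implementation.
const defaultContextSize = 4096;

// Stand-in for the GPT tokenizer: count whitespace-separated words.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Keep the newest turns until 75% of the window (minus tokens already
// reserved for the system prompt and the "Ok!" stub) is spent; every
// older turn becomes part of the "dropped" set sent to the summarizer.
function splitByBudget(
  turns: ChatMessage[],
  reservedTokens: number,
): { kept: ChatMessage[]; dropped: ChatMessage[] } {
  const budget = Math.floor(defaultContextSize * 0.75) - reservedTokens;
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = countTokens(turns[i].content);
    if (used + cost > budget) {
      return { kept, dropped: turns.slice(0, i + 1) };
    }
    used += cost;
    kept.unshift(turns[i]);
  }
  return { kept, dropped: [] };
}
```

Walking backwards from the newest turn ensures the freshest context is always preserved verbatim, while the oldest turns are the first to be summarized away.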
Flow
- User sends a chat message.
- The system prompt is regenerated by `getSystemPrompt` and augmented with any stored summary ("Conversation context: ..."). @client/modules/textGeneration.ts#270-329
- Recent turns are appended until the budget is exhausted; older ones become "dropped messages".
- Dropped messages are summarized and the digest is saved back to the pub/sub store with the current conversation ID. @client/modules/textGeneration.ts#313-330
- The final prompt sent to the model always starts with the refreshed system prompt, followed by the stub assistant reply and the kept turn list to encourage immediate streaming.
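The final prompt assembly described in the steps above can be sketched like this. The `Message` shape and `buildPrompt` helper are simplified stand-ins for illustration, not the actual module API.

```typescript
// Illustrative sketch of final prompt assembly; buildPrompt and the
// Message shape are assumptions, not the real MiniSearch API.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildPrompt(
  systemPrompt: string,
  summary: string | null,
  keptTurns: Message[],
): Message[] {
  // Any stored summary is appended to the refreshed system prompt,
  // mirroring the "Conversation context: ..." augmentation.
  const system = summary
    ? `${systemPrompt}\n\nConversation context: ${summary}`
    : systemPrompt;
  return [
    { role: "system", content: system },
    // Stub assistant reply that primes the model to stream immediately.
    { role: "assistant", content: "Ok!" },
    ...keptTurns,
  ];
}
```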
Settings & Extensibility
- All inference types share the same summarization contract—no provider-specific logic beyond selecting the backend module at runtime. @client/modules/textGeneration.ts#95-135
- Changing the global context window (e.g., via OpenAI settings) automatically adjusts the available budget, because the logic derives from the default context size exported by `textGenerationUtilities`. @client/modules/textGenerationUtilities.ts#13-74
- Future settings (e.g., toggling memory or adjusting the 75% ratio) should hook into the same budgeting helpers to keep behavior predictable.
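Because the budget is a pure function of the context size and the ratio, the scaling behavior is easy to see in isolation. `turnBudget` below is a hypothetical helper illustrating the derivation, not an exported function:

```typescript
// Hypothetical helper: the turn budget is derived from whatever
// context size is in effect, so raising the window in settings
// scales the budget automatically without provider-specific logic.
function turnBudget(contextSize: number, ratio = 0.75): number {
  return Math.floor(contextSize * ratio);
}
```

A future "memory aggressiveness" setting would only need to vary the `ratio` argument; nothing downstream has to change.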
Failure Modes & Logging
- Every summarization attempt is wrapped in try/catch; failures emit `addLogEntry` notifications and fall back to extractive summaries so the chat loop never stalls. @client/modules/textGeneration.ts#97-138
- If generation is interrupted (user stop), a custom `ChatGenerationError` ensures the loop exits gracefully without corrupting the stored summary. @client/modules/textGeneration.ts#360-369 @client/modules/textGenerationUtilities.ts#19-26
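The try/catch-with-fallback pattern can be sketched as below. `summarize` and its naive extractive fallback (keep the leading words of each dropped turn) are illustrative assumptions, not the behavior of `createLlmSummary`:

```typescript
// Sketch of the fallback pattern: try the LLM summary first, then
// fall back to a naive extractive digest so the chat loop always
// makes progress. summarizeWithLlm is a hypothetical stand-in for
// the real backend call.
async function summarize(
  droppedTurns: string[],
  summarizeWithLlm: (text: string) => Promise<string>,
): Promise<string> {
  try {
    return await summarizeWithLlm(droppedTurns.join("\n"));
  } catch {
    // Extractive fallback: keep the first few words of each dropped
    // turn, guaranteeing some digest even when the backend is down.
    return droppedTurns
      .map((turn) => turn.split(/\s+/).slice(0, 5).join(" "))
      .join(" ");
  }
}
```

Because the fallback is purely local, a backend outage degrades summary quality but never blocks the conversation.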
Reset Rules
- Starting a new top-level search clears the summary, chat history, and cached results to avoid context leakage across unrelated conversations. @client/modules/textGeneration.ts#179-207
- Restoring a run from history repopulates chat state from IndexedDB; the memory system will rebuild summaries on demand once the user resumes chatting. @client/modules/history.ts#335-365 @client/hooks/useHistoryRestore.ts#32-105
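The reset behavior hinges on the pub/sub store mentioned under Components. The minimal cell below is an illustrative sketch (not the actual `pubSub` module) showing how clearing the summary on a new search prevents context leakage:

```typescript
// Minimal pub/sub cell sketch; createCell, summaryCell, and
// startNewSearch are illustrative names, not the real pubSub API.
function createCell<T>(initial: T) {
  let value = initial;
  const listeners = new Set<(v: T) => void>();
  return {
    get: () => value,
    set: (next: T) => {
      value = next;
      listeners.forEach((fn) => fn(next));
    },
    subscribe: (fn: (v: T) => void) => {
      listeners.add(fn);
      return () => listeners.delete(fn);
    },
  };
}

// The rolling summary lives in memory only; a new top-level search
// clears it so unrelated conversations never share context.
const summaryCell = createCell<string | null>(null);

function startNewSearch(): void {
  summaryCell.set(null);
}
```

Subscribers (e.g., the prompt builder) observe the reset immediately, so the next chat turn starts from a clean system prompt.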
Related Topics
- AI Integration: `docs/ai-integration.md` – Detailed inference options
- Search History: `docs/search-history.md` – History and persistence
- Overview: `docs/overview.md` – System architecture
- Configuration: `docs/configuration.md` – Settings for the context window