	# Glossary

	Codebase-specific terms, jargon, and domain concepts used in MiniSearch.

	## Core System Concepts

	### Search Token & Hash

	A security mechanism used to authorize communication between the client and the internal search/AI endpoints.

	- Search Token: A string generated at build time (`VITE_SEARCH_TOKEN`). Used to verify that requests to the server originate from a trusted build.
	- Search Token Hash: To avoid exposing the raw token in all requests, the client generates a hash of the token. Managed via the `lastSearchTokenHashPubSub` channel.
	- Verification: The server verifies these tokens to prevent unauthorized access to the search API. Verified tokens are kept in an in-memory `Set<string>` in `server/verifiedTokens.ts`.
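
As a minimal sketch of the caching idea, an in-memory `Set<string>` lets the server run the costly check only once per value. The names `verifiedHashes`, `isVerified`, and the `check` callback are hypothetical and are not taken from `server/verifiedTokens.ts`; the real hashing and comparison logic is not shown.

```typescript
// Hypothetical cache of values that already passed verification.
const verifiedHashes = new Set<string>();

// `check` stands in for the real (possibly expensive) verification step.
function isVerified(hash: string, check: (hash: string) => boolean): boolean {
  if (verifiedHashes.has(hash)) return true; // fast path: verified earlier
  if (!check(hash)) return false; // failures are never cached
  verifiedHashes.add(hash);
  return true;
}

// Usage: count how often the underlying check actually runs.
let checkCalls = 0;
const demoCheck = (hash: string): boolean => {
  checkCalls += 1;
  return hash === "good-hash";
};

const firstAttempt = isVerified("good-hash", demoCheck);
const secondAttempt = isVerified("good-hash", demoCheck); // served from the Set
const badAttempt = isVerified("bad-hash", demoCheck);
```

Because the set lives in memory, a server restart clears it and every client has to be verified again.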

	### Inference Types

	MiniSearch supports multiple backends for Large Language Model (LLM) inference, configured via `inferenceType` in the application settings.

	| Type | Description | Implementation |
	|------|-------------|----------------|
	| `browser` | Local inference using WASM (Wllama) | Client-side, privacy-preserving |
	| `openai` | Connection to any OpenAI-compatible external API | Requires API key |
	| `horde` | Crowdsourced inference via the AI Horde network | Distributed, anonymous or authenticated |
	| `internal` | Server-side proxy using pre-configured credentials | API key hidden from client |

	### PubSub (State Management)

	Instead of a heavy state management library like Redux, MiniSearch uses a minimalist Publish-Subscribe pattern powered by the `create-pubsub` library.

	- Data Flow: Components subscribe to "channels" (e.g., `queryPubSub`, `responsePubSub`)
	- Tuple Pattern: Each channel is a 3-element tuple: `[update, subscribe, get]`
	- Persistence: Some channels use `createLocalStoragePubSub` to automatically sync state with `localStorage`
	- Throttling: UI-heavy updates like AI response streaming are throttled to ~12 updates/sec using `throttleit`
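
The tuple pattern above can be illustrated with a small stand-in. This is a sketch under assumptions, not the `create-pubsub` source: `createChannel`, the listener type, and the unsubscribe return value are all hypothetical names and choices made for the example.

```typescript
type Listener<T> = (value: T) => void;
type Channel<T> = [
  (value: T) => void, // update
  (listener: Listener<T>) => () => void, // subscribe, returns an unsubscribe function
  () => T, // get
];

// Hypothetical factory that mimics the [update, subscribe, get] shape.
function createChannel<T>(initial: T): Channel<T> {
  let current = initial;
  const listeners = new Set<Listener<T>>();

  const update = (value: T): void => {
    current = value;
    listeners.forEach((listener) => listener(value));
  };
  const subscribe = (listener: Listener<T>): (() => void) => {
    listeners.add(listener);
    return () => {
      listeners.delete(listener);
    };
  };
  const get = (): T => current;

  return [update, subscribe, get];
}

// Usage: destructure the tuple the same way a channel is consumed.
const [updateQuery, subscribeToQuery, getQuery] = createChannel<string>("");
const seenQueries: string[] = [];
const unsubscribe = subscribeToQuery((query) => {
  seenQueries.push(query);
});

updateQuery("llama.cpp");
unsubscribe();
updateQuery("searxng"); // stored, but no listener is notified
```

Destructuring by position keeps each channel to a single import, and `get` allows reading the current value outside of a subscription.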

	### Reranker

	A secondary search stage that takes initial results from SearXNG and re-orders them based on relevance to the query using a cross-encoder model (`jina-reranker-v1-tiny-en`) running on a local `llama-server` instance.

	- Implementation: Spawns `llama-server` child process with `--reranking` and `--pooling rank` flags
	- Health Check: Polls `/health` endpoint via `getRerankerStatus`
	- Scoring: Results filtered using standard deviation thresholds (`kStandardDeviationFactor = 0.3`)
	- Fallback: If reranker is unhealthy, returns unranked SearXNG results
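
The scoring step can be sketched as follows. Only the constant `kStandardDeviationFactor = 0.3` comes from the text above; the cutoff formula (mean minus factor times standard deviation), the `ScoredResult` shape, and the function name are assumptions chosen for illustration, and the actual rule in the codebase may differ.

```typescript
interface ScoredResult {
  title: string;
  score: number; // relevance score returned by the reranker
}

const kStandardDeviationFactor = 0.3;

// Assumed rule: keep results scoring at or above mean - factor * stddev.
function filterByStandardDeviation(
  results: ScoredResult[],
  factor: number = kStandardDeviationFactor,
): ScoredResult[] {
  if (results.length === 0) return [];
  const scores = results.map((result) => result.score);
  const mean = scores.reduce((sum, score) => sum + score, 0) / scores.length;
  const variance =
    scores.reduce((sum, score) => sum + (score - mean) * (score - mean), 0) /
    scores.length;
  const threshold = mean - factor * Math.sqrt(variance);
  return results
    .filter((result) => result.score >= threshold)
    .sort((a, b) => b.score - a.score); // best match first
}

// Usage: the clearly weaker result "C" falls below the cutoff.
const reranked = filterByStandardDeviation([
  { title: "B", score: 0.8 },
  { title: "C", score: 0.1 },
  { title: "A", score: 0.9 },
]);
```

A relative cutoff like this adapts to each query's score distribution instead of relying on a fixed absolute score.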

	### Wllama

	A WebAssembly (WASM) based integration of `llama.cpp` for running LLMs on the CPU in the browser.

	- Initialization: Loads models from HuggingFace using `initializeWllama`
	- Warmup: Includes a warmup phase with a single token completion using `n_threads: 1`
	- OPFS: Uses the Origin Private File System via Wllama's cache manager to store model shards locally
	- Models: GGUF format, Q4_K_S or UD-Q4_K_XL quantized, stored at `Felladrin/gguf-sharded-*` on HuggingFace

	### AI Horde

	A crowdsourced distributed cluster of workers providing AI inference. MiniSearch integrates with it using a polling strategy against the `/generate/text/status` endpoint.

	- Kudos: Virtual currency the Horde uses to prioritize requests. The default anonymous API key is `0000000000`
	- Polling: Requests sent to async API, status checked periodically until completion
	- Cancellation: Can abort generation via `DELETE` on the status endpoint

	### Conversation Memory & Rolling Summary

	A mechanism to handle long chats that exceed the LLM context window.

	- Summarization: When older messages are dropped, `createLlmSummary` asks the LLM to condense them under a limit of 800 tokens
	- Extractive Fallback: If LLM summarization fails, `summarizeDroppedMessages` uses a token-counting extractive approach
	- Token Budget: Computed based on `openAiContextLength` setting and current message count
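
The split between kept and dropped messages can be sketched like this. Everything here is hypothetical: the `ChatMessage` shape, the four-characters-per-token estimate, and the function name are assumptions for illustration, and the sketch does not reproduce how the budget is derived from `openAiContextLength`.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic assumed for this sketch: about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk backwards from the newest message and keep whatever fits the budget.
// Older messages that do not fit are returned as `dropped` for summarization.
function splitByTokenBudget(
  messages: ChatMessage[],
  budget: number,
): { kept: ChatMessage[]; dropped: ChatMessage[] } {
  let used = 0;
  let firstKept = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    used += cost;
    firstKept = i;
  }
  return {
    kept: messages.slice(firstKept),
    dropped: messages.slice(0, firstKept),
  };
}

// Usage: 10 + 10 + 5 estimated tokens against a budget of 16.
const { kept, dropped } = splitByTokenBudget(
  [
    { role: "user", content: "a".repeat(40) },
    { role: "assistant", content: "b".repeat(40) },
    { role: "user", content: "c".repeat(20) },
  ],
  16,
);
```

The `dropped` array is what a rolling summary would condense, so the newest turns stay verbatim while older context survives in compressed form.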

	## Technical Jargon & Abbreviations

	### SearXNG

	A privacy-respecting metasearch engine that aggregates results from multiple search engines without tracking. Runs locally on port 8888 within the Docker container.

	### GGUF

	GGML Universal File format. Binary format for storing LLM weights, optimized for fast loading and inference. Used by Wllama and llama-server.

	### Dexie

	A minimalist wrapper for IndexedDB used for client-side persistence. MiniSearch uses two Dexie databases:
	- SearchCacheDatabase: Temporary cache with TTL-based expiration
	- HistoryDatabase: Long-term search history with retention policies

	### Vite Server Hooks

	Middleware registered via Vite plugin hooks (`configureServer`, `configurePreviewServer`). All server-side logic in MiniSearch is implemented as hooks:

	| Hook | Purpose |
	|------|---------|
	| `compressionServerHook` | gzip/brotli compression |
	| `crossOriginServerHook` | COOP/COEP headers for SharedArrayBuffer |
	| `searchEndpointServerHook` | `/search/text` and `/search/images` endpoints |
	| `statusEndpointServerHook` | `/status` health check |
	| `cacheServerHook` | Cache-Control headers |
	| `validateAccessKeyServerHook` | Access key validation |
	| `internalApiEndpointServerHook` | `/inference` proxy |
	| `rerankerServiceHook` | llama-server lifecycle management |
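
A hook of this kind can be sketched as a function that registers middleware, reused for both the dev and preview servers. The names `exampleStatusHook` and `exampleStatusPlugin` are hypothetical, and the minimal types below stand in for Vite's own so the sketch runs without Vite installed; it is not the project's `statusEndpointServerHook`.

```typescript
// Minimal structural types, assumed for this sketch instead of importing Vite.
type Next = () => void;
interface RequestLike { url?: string }
interface ResponseLike { statusCode: number; end: (body: string) => void }
type Middleware = (req: RequestLike, res: ResponseLike, next: Next) => void;
interface ServerLike { middlewares: { use: (middleware: Middleware) => void } }

// Hypothetical hook: answers `/status` and passes every other request on.
function exampleStatusHook(server: ServerLike): void {
  server.middlewares.use((req, res, next) => {
    if (req.url !== "/status") {
      next();
      return;
    }
    res.statusCode = 200;
    res.end("OK");
  });
}

// Plugin object registering the same hook for both server modes.
const exampleStatusPlugin = {
  name: "example-status-plugin",
  configureServer: exampleStatusHook,
  configurePreviewServer: exampleStatusHook,
};

// Usage with a fake server that records registered middleware.
const registered: Middleware[] = [];
exampleStatusPlugin.configureServer({
  middlewares: { use: (middleware) => { registered.push(middleware); } },
});

const fakeResponse = {
  statusCode: 0,
  body: "",
  end(body: string) { this.body = body; },
};
let passThroughCount = 0;
registered[0]({ url: "/status" }, fakeResponse, () => { passThroughCount += 1; });
registered[0]({ url: "/other" }, fakeResponse, () => { passThroughCount += 1; });
```

Sharing one function between `configureServer` and `configurePreviewServer` keeps development and preview behavior identical.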

	### Circuit Breaker

	A resilience pattern used in `webSearchService.ts` to handle SearXNG service degradation. Opens after 5 consecutive failures, blocking requests for 60 seconds before attempting reset.
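
The behavior described above can be sketched as a small class. The thresholds (5 failures, 60 seconds) come from the text; the class name, method names, and injectable clock are hypothetical and do not mirror the code in `webSearchService.ts`.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number = 5,
    resetTimeoutMs: number = 60000,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Closed circuit: always allow. Open circuit: allow a trial after the cool-down.
  canRequest(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null; // close the circuit again
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAt = this.now(); // open, or re-open after a failed trial
    }
  }
}

// Usage with a fake clock so no real waiting is needed.
let fakeTime = 0;
const breaker = new CircuitBreaker(5, 60000, () => fakeTime);
for (let i = 0; i < 4; i++) breaker.recordFailure();
const allowedAfterFour = breaker.canRequest();
breaker.recordFailure(); // fifth consecutive failure opens the circuit
const allowedWhileOpen = breaker.canRequest();
fakeTime = 60000;
const allowedAfterCooldown = breaker.canRequest();
```

Failing fast while the circuit is open spares the degraded service from extra load and avoids making users wait for timeouts.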

	### LRU Pruning

	Least Recently Used cache eviction strategy. Every 10 writes, the search cache checks its size and prunes the oldest entries once `MAX_ENTRIES` (100) is reached.
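
A minimal sketch of periodic pruning, assuming entries carry a `timestamp` and the oldest ones are removed until the cache is back at the limit. `MAX_ENTRIES` and the interval of 10 writes come from the text; `PRUNE_INTERVAL`, `PrunedCache`, and the in-memory `Map` are hypothetical (the real cache is stored in IndexedDB via Dexie).

```typescript
interface CacheEntry {
  value: string;
  timestamp: number;
}

const MAX_ENTRIES = 100;
const PRUNE_INTERVAL = 10; // hypothetical name for "every 10 writes"

class PrunedCache {
  private entries = new Map<string, CacheEntry>();
  private writesSincePrune = 0;

  set(key: string, value: string, timestamp: number): void {
    this.entries.set(key, { value, timestamp });
    this.writesSincePrune += 1;
    if (this.writesSincePrune >= PRUNE_INTERVAL) {
      this.writesSincePrune = 0;
      this.prune();
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  size(): number {
    return this.entries.size;
  }

  // Remove the oldest entries until the cache is back at MAX_ENTRIES.
  private prune(): void {
    const excess = this.entries.size - MAX_ENTRIES;
    if (excess <= 0) return;
    const oldestFirst = Array.from(this.entries.entries()).sort(
      (a, b) => a[1].timestamp - b[1].timestamp,
    );
    for (const [key] of oldestFirst.slice(0, excess)) {
      this.entries.delete(key);
    }
  }
}

// Usage: 110 writes with increasing timestamps; the 110th write triggers pruning.
const cache = new PrunedCache();
for (let i = 0; i < 110; i++) {
  cache.set("q" + i, "results for query " + i, i);
}
```

Pruning in batches means the cache can briefly hold more than `MAX_ENTRIES` between checks, in exchange for not sorting on every write.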

	### Argon2id

	A password hashing algorithm used for access key validation. Client hashes the access key before transmission; server verifies against configured keys.

	## Data Structures

	### SearchCacheDatabase Schema

	| Store | Primary Key | Indexed Field | Entry Type |
	|-------|-------------|---------------|------------|
	| `textSearchHistory` | key (hash) | timestamp | TextSearchCache |
	| `imageSearchHistory` | key (hash) | timestamp | ImageSearchCache |

	### HistoryDatabase Schema

	| Table | Purpose |
	|-------|---------|
	| `searches` | Canonical log of each query with hydrated results payloads |
	| `llmResponses` | AI answers tied to their originating search run |
	| `chatHistory` | Chronological chat turns scoped by `conversationId` |
	\| `chatHistory` \| Chronological chat turns scoped by `conversationId` \|

	### PubSub Channel Types

	| Channel | Data Type | Persistence |
	|---------|-----------|-------------|
	| `queryPubSub` | `string` | Memory |
	| `responsePubSub` | `string` | Memory (throttled) |
	| `settingsPubSub` | `Settings` | localStorage |
	| `textSearchResultsPubSub` | `TextSearchResults` | Memory |
	| `textGenerationStatePubSub` | `TextGenerationState` | Memory |
	| `chatMessagesPubSub` | `ChatMessage[]` | Memory |
	| `conversationSummaryPubSub` | `{id, summary}` | Memory |

	## Related Topics

	- Overview: `docs/overview.md` - System architecture
	- Configuration: `docs/configuration.md` - Environment variables and settings
	- UI Components: `docs/ui-components.md` - Component architecture
	- Reranking: `docs/reranking.md` - Reranker subsystem
