---
title: Vaultwise Knowledge
emoji: "\U0001F4DA"
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
---

# Vaultwise -- Knowledge Management Platform

**Interactive demo for [Vaultwise](https://github.com/dbhavery/vaultwise), a knowledge management platform with document ingestion, vector search, AI-powered Q&A, training generation, and analytics.**

Vaultwise is a full-stack application (FastAPI + React) designed for teams that need to organize, search, and learn from their internal knowledge base. This demo showcases the core search and analytics capabilities using a built-in 30-article corpus for a fictional SaaS company.

## Demo Tabs

| Tab | What It Does |
|-----|--------------|
| **Knowledge Search** | TF-IDF vector search over 30 knowledge base articles. Enter a query and get ranked results with relevance scores and highlighted matching terms. |
| **AI Q&A** | Natural language question answering grounded in the knowledge base. Finds the best-matching article via TF-IDF, then generates an answer with a source citation and a relevant excerpt. |
| **Training Generator** | Select any article to auto-generate a training module: learning objectives, a structured content outline, and a 5-question multiple-choice quiz. |
| **Knowledge Gap Analytics** | Dashboard with article distribution by category, freshness scores, view counts, and search query frequency analysis. |

## Search Algorithm

The TF-IDF search engine is implemented from scratch using only Python and numpy -- no sklearn, no external NLP libraries.

### How It Works

**1. Tokenization**

Input text is lowercased, stripped of punctuation, and split into tokens. A stop word list filters out common English words that carry no semantic weight.

**2. Term Frequency (TF)**

The engine uses augmented term frequency to reduce bias toward longer documents:

```
TF(t, d) = 0.5 + 0.5 * (count(t, d) / max_count(d))
```

Here count(t, d) is the number of times term t occurs in document d, and max_count(d) is the count of the most frequent term in d.

**3. Inverse Document Frequency (IDF)**

IDF measures how rare a term is across the corpus. Terms appearing in fewer documents receive higher weight:

```
IDF(t) = log(N / (1 + df(t)))
```

Here N is the total number of documents and df(t) is the number of documents containing term t. The +1 smoothing prevents division by zero. With this formula, a term that appears in every document receives a slightly negative weight, because N / (1 + N) is less than 1.

**4. TF-IDF Weight**

The final weight for each term in each document:

```
W(t, d) = TF(t, d) * IDF(t)
```

**5. Cosine Similarity**

Queries are converted to TF-IDF vectors using the same vocabulary and IDF values as the documents. Ranking uses cosine similarity between the query vector and each document vector:

```
similarity(q, d) = (q . d) / (||q|| * ||d||)
```

This is the cosine of the angle between the two vectors, so the score depends on their direction and not on their magnitude, which makes it largely independent of document length.

### Architecture (Full Platform)

```
Frontend (React + Vite)
        |
        v
API Gateway (FastAPI)
        |
        +-- Document Ingestion Pipeline
        |       PDF, HTML, Markdown parsing
        |       Chunking and metadata extraction
        |
        +-- Search Engine
        |       TF-IDF vectorization
        |       Cosine similarity ranking
        |       Query expansion and filtering
        |
        +-- AI Q&A Module
        |       Context retrieval via search
        |       LLM-powered answer generation
        |       Source citation and grounding
        |
        +-- Training Generator
        |       Article analysis
        |       Outline and quiz generation
        |       Learning objective extraction
        |
        +-- Analytics Engine
                Usage tracking
                Freshness scoring
                Gap identification
```

## Running Locally

```bash
pip install gradio numpy matplotlib
python app.py
```

## Links

- **Source code:** [github.com/dbhavery/vaultwise](https://github.com/dbhavery/vaultwise)
- **Author:** [Don Havery](https://github.com/dbhavery)
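
## Appendix: TF-IDF Ranking Sketch

The five steps described under "Search Algorithm" can be illustrated with a minimal sketch. This is not the code from `app.py`: the function names (`tokenize`, `augmented_tf`, `build_index`, `cosine`, `search`), the abbreviated stop word list, and the four sample documents are all assumptions made for illustration. It also stores vectors as plain dictionaries rather than numpy arrays so that it runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop word list; the list used by the demo is not shown here.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "do", "each", "from", "how", "i",
    "in", "is", "my", "of", "on", "the", "to", "your",
}


def tokenize(text):
    """Step 1: lowercase, replace punctuation with spaces, split, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def augmented_tf(tokens):
    """Step 2: TF(t, d) = 0.5 + 0.5 * count(t, d) / max_count(d), for terms present in d."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * c / max_count for term, c in counts.items()}


def build_index(docs):
    """Steps 3 and 4: compute IDF per term and a TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # df(t): number of documents that contain term t at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / (1 + n)) for term, n in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in augmented_tf(toks).items()}
        for toks in tokenized
    ]
    return idf, vectors


def cosine(u, v):
    """Step 5: cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, docs, idf, vectors):
    """Return (score, document) pairs sorted by cosine similarity, best first."""
    q_tf = augmented_tf(tokenize(query))
    # Query terms outside the corpus vocabulary are ignored.
    q_vec = {term: tf * idf[term] for term, tf in q_tf.items() if term in idf}
    scored = [(cosine(q_vec, vec), doc) for doc, vec in zip(docs, vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample corpus, standing in for the demo's 30 articles.
docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each billing cycle.",
    "Enable two-factor authentication to protect your account.",
    "Export reports as CSV from the analytics dashboard.",
]
idf, vectors = build_index(docs)
for score, doc in search("How do I reset my password?", docs, idf, vectors):
    print(f"{score:.3f}  {doc}")
```

In this toy corpus only the first document shares terms with the query, so it ranks first and the remaining documents score zero. On a corpus this small the IDF values are coarse; a term that appeared in all four documents would receive a negative weight under the formula above.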