title: Vaultwise Knowledge
emoji: 📚
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
Vaultwise -- Knowledge Management Platform
Interactive demo for Vaultwise, a knowledge management platform with document ingestion, vector search, AI-powered Q&A, training generation, and analytics.
Vaultwise is a full-stack application (FastAPI + React) designed for teams that need to organize, search, and learn from their internal knowledge base. This demo showcases the core search and analytics capabilities using a built-in 30-article corpus for a fictional SaaS company.
Demo Tabs
| Tab | What It Does |
|---|---|
| Knowledge Search | TF-IDF vector search over 30 knowledge base articles. Enter a query, get ranked results with relevance scores and highlighted matching terms. |
| AI Q&A | Natural language question answering grounded in the knowledge base. Finds the best-matching article via TF-IDF, then generates an answer with source citation and relevant excerpt. |
| Training Generator | Select any article to auto-generate a training module: learning objectives, structured content outline, and a 5-question multiple-choice quiz. |
| Knowledge Gap Analytics | Dashboard with article distribution by category, freshness scores, view counts, and search query frequency analysis. |
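The term highlighting mentioned for the Knowledge Search tab could be done roughly as follows. This is a minimal sketch, not the demo's actual code: the `highlight` function name and the `**` markers are assumptions chosen for illustration.

```python
import re

def highlight(text: str, query_terms: set) -> str:
    """Wrap whole-word, case-insensitive matches of query terms in ** markers.

    Hypothetical helper: the demo's real highlighting markup is not shown in this README.
    """
    if not query_terms:
        return text
    # Escape each term so punctuation in a term cannot alter the regex.
    alternatives = "|".join(re.escape(term) for term in sorted(query_terms))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "**" + m.group(0) + "**", text)

print(highlight("Reset your password from the login page.", {"reset", "password"}))
# prints: **Reset** your **password** from the login page.
```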
Search Algorithm
The TF-IDF search engine is implemented from scratch using only Python and numpy -- no sklearn, no external NLP libraries.
How It Works
1. Tokenization
Input text is lowercased, punctuation-stripped, and split into tokens. A stop word list filters out common English words that carry no semantic weight.
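A minimal sketch of this step, assuming a regex-based punctuation strip and a short illustrative stop word list (the demo's actual list and the `tokenize` name are not shown in this README):

```python
import re

# Hypothetical, abbreviated stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in",
              "and", "or", "for", "how", "do", "i", "my"}

def tokenize(text: str) -> list:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    # Replace every character that is not a letter, digit, or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return [token for token in cleaned.split() if token not in STOP_WORDS]

print(tokenize("How do I reset my password?"))
# prints: ['reset', 'password']
```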
2. Term Frequency (TF)
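The engine uses augmented term frequency: each raw count is divided by the count of the most frequent term in the same document, max_count(d), which reduces bias toward longer documents: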
TF(t, d) = 0.5 + 0.5 * (count(t, d) / max_count(d))
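The formula can be sketched in a few lines of plain Python. The `augmented_tf` name is hypothetical, and the sketch assumes the formula is applied only to terms that actually occur in the document, with absent terms treated as weight 0.

```python
from collections import Counter

def augmented_tf(tokens: list) -> dict:
    """TF(t, d) = 0.5 + 0.5 * count(t, d) / max_count(d) for each term present in d.

    Assumption: terms absent from the document are simply omitted (weight 0).
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * n / max_count for term, n in counts.items()}

tf = augmented_tf(["billing", "invoice", "billing", "refund", "billing", "invoice"])
# The most frequent term ("billing", 3 occurrences) gets exactly 1.0;
# every other present term lands between 0.5 and 1.0.
for term, weight in sorted(tf.items()):
    print(term, round(weight, 3))
```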
3. Inverse Document Frequency (IDF)
Measures how rare a term is across the corpus. Terms appearing in fewer documents receive higher weight:
IDF(t) = log(N / (1 + df(t)))
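Here N is the total number of documents and df(t) is the number of documents containing term t. The +1 in the denominator keeps it nonzero even when df(t) = 0. As a side effect of this formula, a term that appears in every document gets a slightly negative weight, because N / (1 + N) < 1.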
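A small sketch of the IDF computation over pre-tokenized documents. It assumes the natural logarithm (the formula above does not state a base), and the function name and toy corpus are made up for illustration.

```python
import math

def inverse_document_frequency(docs: list) -> dict:
    """IDF(t) = log(N / (1 + df(t))) over a list of token lists.

    Assumption: natural log; the README's formula does not specify the base.
    """
    n_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        # Use a set so a term counts at most once per document.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / (1 + df)) for term, df in doc_freq.items()}

# Toy corpus (hypothetical): "billing" appears in all four documents.
docs = [
    ["billing", "refund"],
    ["billing", "invoice"],
    ["billing", "login"],
    ["billing", "sso", "login"],
]
idf = inverse_document_frequency(docs)
```

In this toy corpus the rare term "refund" gets the highest weight, "login" a smaller positive one, and "billing", which occurs in every document, comes out slightly negative.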
4. TF-IDF Weight
The final weight for each term in each document:
W(t, d) = TF(t, d) * IDF(t)
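As a worked example, take N = 30 (the size of the demo corpus) and assume, purely for illustration, a term that appears 3 times in a document whose most frequent term appears 6 times, and that occurs in 2 of the 30 documents. Assuming the natural logarithm:

```latex
\begin{aligned}
\mathrm{TF}(t, d) &= 0.5 + 0.5 \cdot \tfrac{3}{6} = 0.75 \\
\mathrm{IDF}(t) &= \ln\!\left(\tfrac{30}{1 + 2}\right) = \ln 10 \approx 2.303 \\
W(t, d) &= \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t) \approx 0.75 \times 2.303 \approx 1.727
\end{aligned}
```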
5. Cosine Similarity
Queries are converted to TF-IDF vectors using the same vocabulary and IDF values. Ranking uses cosine similarity between the query vector and each document vector:
similarity(q, d) = (q . d) / (||q|| * ||d||)
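The score is the cosine of the angle between the two vectors, so it depends only on their direction and not on their magnitude. Two documents with the same term proportions score the same regardless of length.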
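The ranking step can be sketched with sparse vectors stored as term-to-weight dictionaries. This is an illustrative sketch in plain Python rather than the numpy implementation the demo uses. The function names, document IDs, and weights are all hypothetical, and returning 0.0 for an all-zero vector is an assumption.

```python
import math

def cosine_similarity(q: dict, d: dict) -> float:
    """(q . d) / (||q|| * ||d||) for sparse vectors stored as term -> weight."""
    dot = sum(weight * d.get(term, 0.0) for term, weight in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_q * norm_d)

def rank(query_vec: dict, doc_vecs: dict) -> list:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical pre-weighted vectors; the second has the same direction as the
# first but three times the magnitude, standing in for a longer document.
doc_vecs = {
    "reset-password": {"reset": 1.0, "password": 1.0},
    "reset-password-long": {"reset": 3.0, "password": 3.0},
    "billing-faq": {"billing": 2.0, "refund": 1.0},
}
query_vec = {"password": 1.0, "reset": 1.0}
ranking = rank(query_vec, doc_vecs)
scores = dict(ranking)
```

Both password articles score approximately 1.0 despite the difference in magnitude, while the billing article shares no terms with the query and scores 0.0.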
Architecture (Full Platform)
Frontend (React + Vite)
        |
        v
API Gateway (FastAPI)
        |
        +-- Document Ingestion Pipeline
        |       PDF, HTML, Markdown parsing
        |       Chunking and metadata extraction
        |
        +-- Search Engine
        |       TF-IDF vectorization
        |       Cosine similarity ranking
        |       Query expansion and filtering
        |
        +-- AI Q&A Module
        |       Context retrieval via search
        |       LLM-powered answer generation
        |       Source citation and grounding
        |
        +-- Training Generator
        |       Article analysis
        |       Outline and quiz generation
        |       Learning objective extraction
        |
        +-- Analytics Engine
                Usage tracking
                Freshness scoring
                Gap identification
Running Locally
pip install gradio numpy matplotlib
python app.py
Links
- Source code: github.com/dbhavery/vaultwise
- Author: Don Havery