---
title: Vaultwise Knowledge
emoji: "\U0001F4DA"
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
---

# Vaultwise -- Knowledge Management Platform

**Interactive demo for [Vaultwise](https://github.com/dbhavery/vaultwise), a knowledge management platform with document ingestion, vector search, AI-powered Q&A, training generation, and analytics.**

Vaultwise is a full-stack application (FastAPI + React) designed for teams that need to organize, search, and learn from their internal knowledge base. This demo showcases the core search, Q&A, training, and analytics capabilities using a built-in corpus of 30 articles for a fictional SaaS company.

## Demo Tabs

| Tab | What It Does |
|-----|--------------|
| **Knowledge Search** | TF-IDF vector search over 30 knowledge base articles. Enter a query, get ranked results with relevance scores and highlighted matching terms. |
| **AI Q&A** | Natural-language question answering grounded in the knowledge base. Finds the best-matching article via TF-IDF, then generates an answer with a source citation and a relevant excerpt. |
| **Training Generator** | Select any article to auto-generate a training module: learning objectives, structured content outline, and a 5-question multiple-choice quiz. |
| **Knowledge Gap Analytics** | Dashboard with article distribution by category, freshness scores, view counts, and search query frequency analysis. |
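The "highlighted matching terms" in Knowledge Search could work along these lines. This is a minimal sketch, assuming highlights are rendered as Markdown bold and that a word matches only when its lowercase form equals a query term exactly (no stemming); `highlight` is a hypothetical name, not taken from `app.py`.

```python
import re

def highlight(text: str, query_terms: set[str]) -> str:
    """Wrap each word whose lowercase form is a query term in Markdown bold (hypothetical helper)."""
    def mark(match: re.Match) -> str:
        word = match.group(0)
        # Exact lowercase match only: "Refunds" does not match "refund".
        return f"**{word}**" if word.lower() in query_terms else word

    # Visit every run of word characters; punctuation and spacing are left untouched.
    return re.sub(r"\w+", mark, text)

print(highlight("Refunds follow the refund policy.", {"refund", "policy"}))
# Refunds follow the **refund** **policy**.
```

Because the query terms are expected to come from the same tokenizer as the search, the original casing of the article text is preserved while matching stays case-insensitive.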

## Search Algorithm

The TF-IDF search engine is implemented from scratch using only Python and numpy -- no sklearn, no external NLP libraries.

### How It Works

**1. Tokenization**

Input text is lowercased, punctuation-stripped, and split into tokens. A stop word list filters out common English words that carry no semantic weight.
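A minimal sketch of that tokenizer, assuming a short illustrative stop word list (the app's real list is not reproduced here) and that "punctuation-stripped" means deleting every character that is not a word character or whitespace. `tokenize` and `STOP_WORDS` are hypothetical names.

```python
import re

# Hypothetical, abbreviated stop word list for illustration only.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "how", "in", "is", "it", "of", "on", "or", "that", "the", "to",
    "what", "with",
})

def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, and drop stop words."""
    lowered = text.lower()
    # Delete anything that is not a word character or whitespace.
    cleaned = re.sub(r"[^\w\s]", "", lowered)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

print(tokenize("What is the refund policy for annual plans?"))
# ['refund', 'policy', 'annual', 'plans']
```

Deleting punctuation (rather than replacing it with a space) keeps contractions as one token ("don't" becomes "dont"), at the cost of also joining hyphenated words.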

**2. Term Frequency (TF)**

Uses augmented term frequency, which divides each raw count by the count of the document's most frequent term, to prevent bias toward longer documents:

```
TF(t, d) = 0.5 + 0.5 * (count(t, d) / max_count(d))
```
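A direct translation of the formula, as a sketch. It assumes TF is computed only for terms that occur in the document; absent terms are treated as 0 rather than receiving the 0.5 floor. `augmented_tf` is a hypothetical name.

```python
from collections import Counter

def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """TF(t, d) = 0.5 + 0.5 * count(t, d) / max_count(d) for each term present in d."""
    counts = Counter(tokens)
    if not counts:
        return {}  # empty document: no terms, no weights
    max_count = max(counts.values())
    # Only terms that occur get a TF value; absent terms are omitted (treated as 0),
    # otherwise every vocabulary term would receive a floor of 0.5.
    return {term: 0.5 + 0.5 * count / max_count for term, count in counts.items()}

tf = augmented_tf(["refund", "refund", "policy", "refund", "plan"])
print(tf)  # refund -> 1.0; policy and plan -> 0.5 + 0.5 / 3, about 0.667
```

The most frequent term always scores 1.0 and every other present term falls in [0.5, 1.0), so a document that repeats a word 50 times does not dwarf one that uses it 5 times.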

**3. Inverse Document Frequency (IDF)**

Measures how rare a term is across the corpus. Terms appearing in fewer documents receive higher weight:

```
IDF(t) = log(N / (1 + df(t)))
```

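Here N is the total number of documents and df(t) is the number of documents containing term t. The +1 in the denominator smooths the estimate and prevents division by zero when df(t) = 0. One side effect: a term that appears in every document gets a slightly negative weight, log(N / (N + 1)).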
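A sketch of the IDF computation over a tokenized corpus. It assumes the natural logarithm, since the formula writes only `log`; the base merely rescales every weight by the same constant, which leaves cosine rankings unchanged. `compute_idf` is a hypothetical name.

```python
import math

def compute_idf(documents: list[list[str]]) -> dict[str, float]:
    """IDF(t) = log(N / (1 + df(t))) for every term in a tokenized corpus."""
    n_docs = len(documents)
    df: dict[str, int] = {}
    for tokens in documents:
        for term in set(tokens):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    # Natural log assumed; the README does not state a base.
    return {term: math.log(n_docs / (1 + count)) for term, count in df.items()}

corpus = [["refund", "policy"], ["refund", "plan"], ["sso", "setup"], ["billing", "plan"]]
idf = compute_idf(corpus)
print(round(idf["sso"], 3), round(idf["refund"], 3))  # 0.693 0.288
```

In this four-document corpus, "sso" (one document) outweighs "refund" (two documents); a term present in all four would score log(4/5), which is below zero.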

**4. TF-IDF Weight**

The final weight for each term in each document:

```
W(t, d) = TF(t, d) * IDF(t)
```
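Combining the two factors, a minimal numpy sketch that stacks W(t, d) into a documents-by-vocabulary matrix. It takes per-document TF dictionaries and an IDF dictionary as plain inputs and derives the column order from the IDF keys, so a query's TF dictionary passed through the same function lands in the same columns. `tfidf_matrix` is a hypothetical name, and the TF/IDF numbers in the example are made up for illustration.

```python
import numpy as np

def tfidf_matrix(
    tf_per_doc: list[dict[str, float]], idf: dict[str, float]
) -> tuple[np.ndarray, list[str]]:
    """Return a (documents x vocabulary) array of W(t, d) = TF(t, d) * IDF(t) and its vocabulary."""
    vocab = sorted(idf)  # fixed column order shared by documents and queries
    col = {term: j for j, term in enumerate(vocab)}
    weights = np.zeros((len(tf_per_doc), len(vocab)))
    for i, tf in enumerate(tf_per_doc):
        for term, tf_value in tf.items():
            if term in col:  # terms outside the corpus vocabulary are ignored
                weights[i, col[term]] = tf_value * idf[term]
    return weights, vocab

W, vocab = tfidf_matrix(
    [{"refund": 1.0, "policy": 0.75}, {"plan": 1.0}],
    {"plan": 0.4, "policy": 0.7, "refund": 0.3},
)
print(vocab)  # ['plan', 'policy', 'refund']
print(W)
```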

**5. Cosine Similarity**

Queries are converted to TF-IDF vectors using the same vocabulary and IDF values. Ranking uses cosine similarity between the query vector and each document vector:

```
similarity(q, d) = (q . d) / (||q|| * ||d||)
```

Cosine similarity depends only on the angle between the two vectors, not on their magnitudes, so a document is not ranked higher simply because it is longer.
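A vectorized sketch of the ranking step with numpy. It assumes documents are rows of a TF-IDF matrix and the query is a vector over the same columns; a zero vector (for example, a query made only of stop words) gets a score of 0 instead of dividing by zero. `rank_by_cosine` is a hypothetical name.

```python
import numpy as np

def rank_by_cosine(
    query_vec: np.ndarray, doc_matrix: np.ndarray, top_k: int = 5
) -> list[tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best match first."""
    denom = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    # Compute q . d / (||q|| * ||d||) only where the denominator is non-zero; elsewhere keep 0.
    scores = np.divide(doc_matrix @ query_vec, denom, out=np.zeros_like(denom), where=denom > 0)
    # Stable sort so tied scores keep their corpus order.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(int(i), float(scores[i])) for i in order]

docs = np.array([
    [0.0, 0.525, 0.3],
    [0.4, 0.0, 0.0],
    [0.0, 1.05, 0.6],  # exactly twice the first row: "longer", same direction
])
query = np.array([0.0, 0.7, 0.4])
print(rank_by_cosine(query, docs, top_k=3))
```

The first and third rows point in the same direction as the query, so both score 1.0 even though the third has twice the magnitude, which is the length independence described above.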

## Architecture (Full Platform)

```
Frontend (React + Vite)
    |
    v
API Gateway (FastAPI)
    |
    +-- Document Ingestion Pipeline
    |       PDF, HTML, Markdown parsing
    |       Chunking and metadata extraction
    |
    +-- Search Engine
    |       TF-IDF vectorization
    |       Cosine similarity ranking
    |       Query expansion and filtering
    |
    +-- AI Q&A Module
    |       Context retrieval via search
    |       LLM-powered answer generation
    |       Source citation and grounding
    |
    +-- Training Generator
    |       Article analysis
    |       Outline and quiz generation
    |       Learning objective extraction
    |
    +-- Analytics Engine
            Usage tracking
            Freshness scoring
            Gap identification
```
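The "source citation and grounding" step of the AI Q&A module can be illustrated with a purely extractive stand-in. This sketch assumes the best-matching article has already been retrieved, picks the sentence sharing the most distinct query terms as the excerpt, and appends a citation; it does not reproduce the LLM-powered generation. `best_excerpt` and `grounded_answer` are hypothetical names.

```python
import re

def best_excerpt(article_text: str, query_terms: set[str]) -> str:
    """Return the sentence that shares the most distinct terms with the query."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article_text) if s.strip()]
    if not sentences:
        return ""

    def overlap(sentence: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # max() keeps the first sentence among ties, so earlier sentences win.
    return max(sentences, key=overlap)

def grounded_answer(title: str, article_text: str, query_terms: set[str]) -> str:
    """Format the excerpt with a citation to the source article."""
    return f'{best_excerpt(article_text, query_terms)}\n\nSource: "{title}"'

text = ("Refunds are issued within 14 days. Annual plans can be refunded pro rata. "
        "Contact support for help.")
print(grounded_answer("Refund Policy", text, {"annual", "plans", "refunded"}))
```

Keeping the excerpt verbatim from the cited article makes the answer easy to check against its source, whichever generator produces the surrounding prose.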

## Running Locally

```bash
pip install gradio numpy matplotlib
python app.py
```

## Links

- **Source code:** [github.com/dbhavery/vaultwise](https://github.com/dbhavery/vaultwise)
- **Author:** [Don Havery](https://github.com/dbhavery)