dbhavery committed on
Commit bc51393 · verified · 1 Parent(s): e101077

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +117 -10
  2. __pycache__/app.cpython-313.pyc +0 -0
  3. app.py +1575 -0
  4. requirements.txt +3 -0
README.md CHANGED
@@ -1,10 +1,117 @@
- ---
- title: Vaultwise Knowledge
- emoji: 📉
- colorFrom: pink
- colorTo: indigo
- sdk: static
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Vaultwise Knowledge
+ emoji: "\U0001F4DA"
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 5.29.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # Vaultwise -- Knowledge Management Platform
+
+ **Interactive demo for [Vaultwise](https://github.com/dbhavery/vaultwise), a knowledge management platform with document ingestion, vector search, AI-powered Q&A, training generation, and analytics.**
+
+ Vaultwise is a full-stack application (FastAPI + React) designed for teams that need to organize, search, and learn from their internal knowledge base. This demo showcases the core search and analytics capabilities using a built-in 30-article corpus for a fictional SaaS company.
+
+ ## Demo Tabs
+
+ | Tab | What It Does |
+ |-----|--------------|
+ | **Knowledge Search** | TF-IDF vector search over 30 knowledge base articles. Enter a query, get ranked results with relevance scores and highlighted matching terms. |
+ | **AI Q&A** | Natural language question answering grounded in the knowledge base. Finds the best-matching article via TF-IDF, then generates an answer with source citation and relevant excerpt. |
+ | **Training Generator** | Select any article to auto-generate a training module: learning objectives, structured content outline, and a 5-question multiple-choice quiz. |
+ | **Knowledge Gap Analytics** | Dashboard with article distribution by category, freshness scores, view counts, and search query frequency analysis. |
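The per-category figures on the **Knowledge Gap Analytics** tab could be computed along these lines. This is a minimal sketch, not the code in `app.py`: it assumes records shaped like the `KNOWLEDGE_BASE` entries there (`category`, `views`, and `freshness` keys), and `category_stats` and `top_queries` are hypothetical helper names. Reporting mean freshness per category and counting queries case-insensitively are assumptions about what the dashboard shows.

```python
from collections import Counter, defaultdict


def category_stats(articles: list[dict]) -> dict[str, dict[str, float]]:
    """Group articles by category; report count, total views, mean freshness."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for article in articles:
        grouped[article["category"]].append(article)
    return {
        category: {
            "articles": len(items),
            "total_views": sum(a["views"] for a in items),
            "mean_freshness": sum(a["freshness"] for a in items) / len(items),
        }
        for category, items in grouped.items()
    }


def top_queries(queries: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Most frequent search queries after trimming and lowercasing."""
    return Counter(q.strip().lower() for q in queries).most_common(n)


# Three records using view/freshness values from KB-005, KB-006, and KB-008.
sample = [
    {"category": "Billing", "views": 5102, "freshness": 0.97},
    {"category": "Billing", "views": 1876, "freshness": 0.85},
    {"category": "API", "views": 4210, "freshness": 0.93},
]
print(category_stats(sample)["Billing"]["total_views"])  # -> 6978
print(top_queries(["SSO setup", "reset password", "sso setup "]))
# -> [('sso setup', 2), ('reset password', 1)]
```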
+
+ ## Search Algorithm
+
+ The TF-IDF search engine is implemented from scratch using only Python and numpy -- no sklearn, no external NLP libraries.
+
+ ### How It Works
+
+ **1. Tokenization**
+
+ Input text is lowercased, punctuation-stripped, and split into tokens. A stop word list filters out common English words that carry no semantic weight.
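A minimal sketch of this tokenization step, assuming punctuation is deleted with `str.translate` and tokens are split on whitespace. `tokenize` is a hypothetical name, and the stop-word set here is a small illustrative subset of the much longer `STOP_WORDS` frozenset defined in `app.py`.

```python
import string

# Illustrative subset only; app.py defines a longer STOP_WORDS frozenset.
STOP_WORDS = frozenset({"a", "an", "the", "and", "to", "of", "is", "how", "do", "i"})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


print(tokenize("How do I reset the API key?"))  # -> ['reset', 'api', 'key']
```

Deleting punctuation, rather than replacing it with a space, merges hyphenated terms such as `Two-Factor` into `twofactor`; replacing with spaces is the other common choice and would keep `two` and `factor` as separate tokens.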
+
+ **2. Term Frequency (TF)**
+
+ Uses augmented term frequency, which divides each raw count by max_count(d), the count of the most frequent term in document d, to prevent bias toward longer documents:
+
+ ```
+ TF(t, d) = 0.5 + 0.5 * (count(t, d) / max_count(d))
+ ```
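The augmented TF formula can be sketched in a few lines of Python. This assumes a document is already a list of tokens; `augmented_tf` is a hypothetical name. Terms absent from the document are simply missing from the result (treated as weight 0, not the 0.5 baseline), which is an assumption about how the app handles them.

```python
from collections import Counter


def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """TF(t, d) = 0.5 + 0.5 * count(t, d) / max_count(d) for each term in d."""
    counts = Counter(tokens)
    if not counts:
        return {}  # empty document: nothing to weight
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * c / max_count for term, c in counts.items()}


print(augmented_tf(["api", "key", "api", "rotation"]))
# -> {'api': 1.0, 'key': 0.75, 'rotation': 0.75}
```

Because every present term starts at 0.5, weights stay between 0.5 and 1.0, so repeating a word many times raises its weight only modestly.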
+
+ **3. Inverse Document Frequency (IDF)**
+
+ Measures how rare a term is across the corpus. Terms appearing in fewer documents receive higher weight:
+
+ ```
+ IDF(t) = log(N / (1 + df(t)))
+ ```
+
+ Where N is the total number of documents and df(t) is the number of documents containing term t. The +1 smoothing prevents division by zero.
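A pure-Python sketch of the IDF formula, assuming a natural logarithm and pre-tokenized documents; `idf_weights` is a hypothetical name. It also makes one side effect of the +1 smoothing visible: a term in N - 1 documents gets exactly zero, and a term in every document gets log(N / (N + 1)), which is slightly negative. Some libraries avoid this, for example scikit-learn's smoothed variant log((1 + N) / (1 + df(t))) + 1; whether `app.py` clamps or adjusts the value is not shown in this diff.

```python
import math


def idf_weights(documents: list[list[str]]) -> dict[str, float]:
    """IDF(t) = log(N / (1 + df(t))) for every term seen in the corpus."""
    n_docs = len(documents)
    df: dict[str, int] = {}
    for tokens in documents:
        for term in set(tokens):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / (1 + count)) for term, count in df.items()}


corpus = [["api", "key"], ["api", "billing"], ["api", "sso"]]
weights = idf_weights(corpus)
print(round(weights["key"], 4))  # -> 0.4055  (log(3 / 2))
print(weights["api"] < 0)        # -> True    (log(3 / 4): "api" is in all 3 docs)
```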
+
+ **4. TF-IDF Weight**
+
+ The final weight for each term in each document:
+
+ ```
+ W(t, d) = TF(t, d) * IDF(t)
+ ```
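The TF and IDF steps combine naturally into one numpy matrix with a row per document and a column per vocabulary term. This vectorized sketch assumes pre-tokenized documents, a sorted vocabulary, and a natural logarithm; `tfidf_matrix` is a hypothetical helper, not a function confirmed to exist in `app.py`.

```python
import numpy as np


def tfidf_matrix(documents: list[list[str]]) -> tuple[list[str], np.ndarray]:
    """Return (vocab, W) where W[d, t] = TF(t, d) * IDF(t), computed with numpy."""
    vocab = sorted({term for doc in documents for term in doc})
    if not vocab:
        return vocab, np.zeros((len(documents), 0))
    index = {term: i for i, term in enumerate(vocab)}

    # Raw count matrix: counts[d, t] = count(t, d).
    counts = np.zeros((len(documents), len(vocab)))
    for row, doc in enumerate(documents):
        for term in doc:
            counts[row, index[term]] += 1

    # Augmented TF only where the term occurs; absent terms stay 0.
    # np.maximum(..., 1) avoids dividing by zero for empty documents.
    max_counts = counts.max(axis=1, keepdims=True)
    present = counts > 0
    tf = np.where(present, 0.5 + 0.5 * counts / np.maximum(max_counts, 1), 0.0)

    # Document frequency and IDF with the +1 smoothing described above.
    df = present.sum(axis=0)
    idf = np.log(len(documents) / (1 + df))

    return vocab, tf * idf  # broadcasting scales column j by idf[j]


docs = [["api", "key", "api"], ["billing", "invoice"], ["api", "billing"]]
vocab, W = tfidf_matrix(docs)
print(vocab)    # -> ['api', 'billing', 'invoice', 'key']
print(W.shape)  # -> (3, 4)
```

In this tiny corpus `api` and `billing` each occur in two of the three documents, so their IDF is log(3 / 3) = 0 and their columns are all zero: under this smoothing only terms found in fewer than N - 1 documents receive positive weight.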
+
+ **5. Cosine Similarity**
+
+ Queries are converted to TF-IDF vectors using the same vocabulary and IDF values. Ranking uses cosine similarity between the query vector and each document vector:
+
+ ```
+ similarity(q, d) = (q . d) / (||q|| * ||d||)
+ ```
+
+ This depends only on the angle between the two vectors, not on their lengths, which makes the score insensitive to document length.
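A minimal sketch of query weighting and cosine ranking, assuming document vectors are rows of a numpy matrix and that `vocab` and `idf` come from the indexed corpus. `query_vector` and `rank_by_cosine` are hypothetical names, and the default `top_k=5` only mirrors `TOP_K_RESULTS = 5` in `app.py`. Two behaviors are assumptions rather than documented: out-of-vocabulary query terms are dropped, and a zero vector (for example a query made entirely of stop words) scores 0 instead of causing a division by zero.

```python
from collections import Counter

import numpy as np


def query_vector(tokens: list[str], vocab: list[str], idf: np.ndarray) -> np.ndarray:
    """Weight a tokenized query with the corpus vocabulary and IDF values.

    Out-of-vocabulary terms are dropped because they have no IDF value.
    """
    index = {term: i for i, term in enumerate(vocab)}
    counts = Counter(t for t in tokens if t in index)
    vec = np.zeros(len(vocab))
    if counts:
        max_count = max(counts.values())
        for term, c in counts.items():
            vec[index[term]] = (0.5 + 0.5 * c / max_count) * idf[index[term]]
    return vec


def rank_by_cosine(
    query_vec: np.ndarray, doc_matrix: np.ndarray, top_k: int = 5
) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the top_k most similar rows."""
    denom = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    # Rows with a zero denominator (zero query or zero document) keep score 0.
    sims = np.divide(
        doc_matrix @ query_vec, denom, out=np.zeros(len(doc_matrix)), where=denom > 0
    )
    order = np.argsort(-sims, kind="stable")[:top_k]  # stable: ties keep corpus order
    return [(int(i), float(sims[i])) for i in order]


vocab = ["api", "billing", "key"]
idf = np.array([0.5, 1.0, 1.0])  # illustrative IDF values, not from the real corpus
docs = np.array([[0.5, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
q = query_vector(["api", "key", "webhook"], vocab, idf)
ranked = rank_by_cosine(q, docs, top_k=2)
print([(i, round(s, 3)) for i, s in ranked])  # -> [(0, 1.0), (1, 0.0)]
```

The query and document 0 point in the same direction, so their similarity is 1.0 regardless of magnitude; the unknown term `webhook` contributes nothing.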
+
+ ### Architecture (Full Platform)
+
+ ```
+ Frontend (React + Vite)
+     |
+     v
+ API Gateway (FastAPI)
+     |
+     +-- Document Ingestion Pipeline
+     |     PDF, HTML, Markdown parsing
+     |     Chunking and metadata extraction
+     |
+     +-- Search Engine
+     |     TF-IDF vectorization
+     |     Cosine similarity ranking
+     |     Query expansion and filtering
+     |
+     +-- AI Q&A Module
+     |     Context retrieval via search
+     |     LLM-powered answer generation
+     |     Source citation and grounding
+     |
+     +-- Training Generator
+     |     Article analysis
+     |     Outline and quiz generation
+     |     Learning objective extraction
+     |
+     +-- Analytics Engine
+           Usage tracking
+           Freshness scoring
+           Gap identification
+ ```
+
+ ## Running Locally
+
+ ```bash
+ pip install gradio numpy matplotlib
+ python app.py
+ ```
+
+ ## Links
+
+ - **Source code:** [github.com/dbhavery/vaultwise](https://github.com/dbhavery/vaultwise)
+ - **Author:** [Don Havery](https://github.com/dbhavery)
__pycache__/app.cpython-313.pyc ADDED
Binary file (67.2 kB).
 
app.py ADDED
@@ -0,0 +1,1575 @@
+ """
+ Vaultwise -- Knowledge Management Platform
+ Interactive demo showcasing TF-IDF search, AI Q&A, training generation, and analytics.
+
+ All search functionality is implemented from scratch using numpy.
+ No sklearn or external NLP libraries required.
+ """
+
+ import math
+ import re
+ import string
+ from collections import Counter
+ from typing import Optional
+
+ import gradio as gr
+ import matplotlib
+ import matplotlib.pyplot as plt
+ import numpy as np
+
+ matplotlib.use("Agg")
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+
+ APP_TITLE = "Vaultwise -- Knowledge Management Platform"
+ ACCENT_COLOR = "#3b82f6"
+ TOP_K_RESULTS = 5
+
+ CATEGORIES = [
+     "Onboarding",
+     "Billing",
+     "API",
+     "Security",
+     "Integrations",
+     "Infrastructure",
+     "Support",
+     "Compliance",
+ ]
+
+ STOP_WORDS = frozenset(
+     {
+         "a", "an", "the", "and", "or", "but", "in", "on", "at", "to", "for",
+         "of", "with", "by", "from", "is", "it", "as", "are", "was", "were",
+         "be", "been", "being", "have", "has", "had", "do", "does", "did",
+         "will", "would", "could", "should", "may", "might", "shall", "can",
+         "this", "that", "these", "those", "i", "you", "he", "she", "we",
+         "they", "me", "him", "her", "us", "them", "my", "your", "his",
+         "its", "our", "their", "what", "which", "who", "whom", "how",
+         "when", "where", "why", "not", "no", "all", "each", "every",
+         "both", "few", "more", "most", "other", "some", "such", "than",
+         "too", "very", "just", "about", "if", "so", "also", "up", "out",
+         "into", "over", "after", "before", "between", "under", "through",
+         "during", "above", "below", "any", "only", "own", "same", "then",
+         "there", "here", "once", "while", "now", "new", "get", "use",
+     }
+ )
+
+
+ # ---------------------------------------------------------------------------
+ # Knowledge Base -- 30 articles for a fictional SaaS company "NovaCRM"
+ # ---------------------------------------------------------------------------
+
+ KNOWLEDGE_BASE: list[dict[str, str]] = [
+     # --- Onboarding ---
+     {
+         "id": "KB-001",
+         "title": "Getting Started with NovaCRM",
+         "category": "Onboarding",
+         "content": (
+             "Welcome to NovaCRM. This guide walks new users through initial account "
+             "setup, workspace configuration, and first-time login. After signing up, "
+             "you will receive a verification email. Click the link to activate your "
+             "account. Once logged in, navigate to Settings > Workspace to configure "
+             "your company name, timezone, and default currency. Invite team members "
+             "from the Team Management page by entering their email addresses. Each "
+             "new member receives an onboarding checklist that tracks their setup "
+             "progress through profile completion, integration connections, and first "
+             "deal creation."
+         ),
+         "views": 4521,
+         "freshness": 0.95,
+     },
+     {
+         "id": "KB-002",
+         "title": "User Roles and Permissions Overview",
+         "category": "Onboarding",
+         "content": (
+             "NovaCRM supports four user roles: Admin, Manager, Agent, and Viewer. "
+             "Admins have full system access including billing, user management, and "
+             "API key generation. Managers can create teams, assign leads, and view "
+             "team analytics. Agents can manage their own contacts, deals, and tasks. "
+             "Viewers have read-only access to dashboards and reports. Custom roles "
+             "can be created under Settings > Roles with granular permission toggles "
+             "for each module. Role inheritance allows child roles to automatically "
+             "receive parent permissions. Audit logs track all permission changes."
+         ),
+         "views": 3187,
+         "freshness": 0.88,
+     },
+     {
+         "id": "KB-003",
+         "title": "Importing Contacts and Data Migration",
+         "category": "Onboarding",
+         "content": (
+             "NovaCRM supports CSV, Excel, and vCard imports for contact migration. "
+             "Navigate to Contacts > Import to upload your file. The mapping wizard "
+             "automatically detects common fields like name, email, phone, and company. "
+             "For custom fields, drag and drop column headers to match your schema. "
+             "Duplicate detection runs automatically using email address matching with "
+             "configurable merge rules. For large migrations over 50,000 records, use "
+             "the bulk import API endpoint which processes records asynchronously and "
+             "sends a completion webhook. Migration history is available under Settings "
+             "> Data > Import History with rollback capabilities for the last 30 days."
+         ),
+         "views": 2843,
+         "freshness": 0.82,
+     },
+     {
+         "id": "KB-004",
+         "title": "Setting Up Your Sales Pipeline",
+         "category": "Onboarding",
+         "content": (
+             "The sales pipeline in NovaCRM is fully customizable. Go to Pipeline > "
+             "Settings to create stages. Default stages include Lead, Qualified, "
+             "Proposal, Negotiation, and Closed Won or Closed Lost. Each stage has "
+             "configurable probability percentages for revenue forecasting. Drag deals "
+             "between stages on the Kanban board or update them in list view. "
+             "Automation rules can trigger actions when deals move between stages, "
+             "such as sending follow-up emails, creating tasks, or notifying managers. "
+             "Pipeline analytics show conversion rates between stages, average deal "
+             "velocity, and bottleneck identification."
+         ),
+         "views": 3654,
+         "freshness": 0.91,
+     },
+     # --- Billing ---
+     {
+         "id": "KB-005",
+         "title": "Subscription Plans and Pricing",
+         "category": "Billing",
+         "content": (
+             "NovaCRM offers three subscription tiers: Starter at 29 dollars per user "
+             "per month, Professional at 79 dollars per user per month, and Enterprise "
+             "with custom pricing. Starter includes contact management, basic pipeline, "
+             "email integration, and 5 GB storage. Professional adds workflow automation, "
+             "advanced analytics, API access, and 50 GB storage. Enterprise includes "
+             "custom integrations, dedicated support, SSO, audit logs, and unlimited "
+             "storage. All plans include a 14-day free trial with no credit card "
+             "required. Annual billing provides a 20 percent discount."
+         ),
+         "views": 5102,
+         "freshness": 0.97,
+     },
+     {
+         "id": "KB-006",
+         "title": "Managing Invoices and Payment Methods",
+         "category": "Billing",
+         "content": (
+             "Access your billing dashboard at Settings > Billing > Invoices. NovaCRM "
+             "accepts credit cards via Stripe and bank transfers for Enterprise plans. "
+             "Invoices are generated on the first of each month and sent to the billing "
+             "email address. Download invoices as PDF from the billing history page. "
+             "To update payment methods, navigate to Settings > Billing > Payment "
+             "Methods and add a new card or bank account. Failed payments trigger "
+             "automatic retry on days 3, 7, and 14. After three failures, the account "
+             "enters a 7-day grace period before suspension. Tax ID and VAT numbers "
+             "can be configured for proper invoice formatting."
+         ),
+         "views": 1876,
+         "freshness": 0.85,
+     },
+     {
+         "id": "KB-007",
+         "title": "Upgrading and Downgrading Your Plan",
+         "category": "Billing",
+         "content": (
+             "Plan changes take effect immediately. When upgrading, you are charged a "
+             "prorated amount for the remaining billing cycle. When downgrading, the "
+             "new rate applies at the next billing cycle and a credit is issued for "
+             "the difference. Navigate to Settings > Billing > Change Plan to see "
+             "available options. Feature access adjusts automatically upon plan change. "
+             "Data retention is maintained during downgrades, but access to premium "
+             "features is restricted. If your current usage exceeds the new plan limits, "
+             "you will receive a warning with 30 days to reduce usage before enforcement."
+         ),
+         "views": 1342,
+         "freshness": 0.79,
+     },
+     # --- API ---
+     {
+         "id": "KB-008",
+         "title": "REST API Authentication and Rate Limits",
+         "category": "API",
+         "content": (
+             "The NovaCRM REST API uses Bearer token authentication. Generate API keys "
+             "at Settings > API > Keys. Each key has configurable scopes: read, write, "
+             "delete, and admin. Include the token in the Authorization header as "
+             "Bearer followed by the token value. Rate limits depend on your plan: "
+             "Starter allows 100 requests per minute, Professional allows 1000, and "
+             "Enterprise allows 10000. Rate limit headers X-RateLimit-Remaining and "
+             "X-RateLimit-Reset are included in every response. Exceeding the limit "
+             "returns HTTP 429 with a Retry-After header. API keys can be rotated "
+             "without downtime using the key rotation endpoint."
+         ),
+         "views": 4210,
+         "freshness": 0.93,
+     },
+     {
+         "id": "KB-009",
+         "title": "API Endpoints for Contact Management",
+         "category": "API",
+         "content": (
+             "Contact CRUD operations are available at the /api/v2/contacts endpoint. "
+             "GET returns a paginated list with default page size of 50. Use query "
+             "parameters for filtering: status, created_after, tags, and owner_id. "
+             "POST creates a new contact with required fields email and name. PUT "
+             "updates an existing contact by ID. PATCH allows partial updates. DELETE "
+             "moves a contact to trash with 30-day recovery. Bulk operations are "
+             "supported via /api/v2/contacts/bulk with a maximum batch size of 1000. "
+             "Response format follows JSON API specification with included relationships "
+             "for deals, activities, and notes."
+         ),
+         "views": 3890,
+         "freshness": 0.90,
+     },
+     {
+         "id": "KB-010",
+         "title": "Webhooks and Event Subscriptions",
+         "category": "API",
+         "content": (
+             "NovaCRM supports webhooks for real-time event notifications. Configure "
+             "webhook endpoints at Settings > API > Webhooks. Available events include "
+             "contact.created, contact.updated, deal.stage_changed, deal.won, "
+             "deal.lost, task.completed, and email.opened. Each webhook delivery "
+             "includes an HMAC-SHA256 signature in the X-Webhook-Signature header "
+             "for payload verification. Failed deliveries are retried with exponential "
+             "backoff up to 5 times over 24 hours. Webhook logs show delivery status, "
+             "response codes, and payload details for the last 30 days. Test endpoints "
+             "can be configured to receive sample payloads during development."
+         ),
+         "views": 2156,
+         "freshness": 0.87,
+     },
+     {
+         "id": "KB-011",
+         "title": "GraphQL API for Advanced Queries",
+         "category": "API",
+         "content": (
+             "NovaCRM provides a GraphQL endpoint at /api/graphql for complex data "
+             "queries. The schema supports contacts, deals, activities, teams, and "
+             "reports. Introspection is enabled for development environments. Query "
+             "depth is limited to 10 levels to prevent abuse. Mutations support "
+             "creating, updating, and deleting records with input validation. "
+             "Subscriptions are available for real-time updates via WebSocket "
+             "connections. The GraphQL playground is accessible at /api/graphql/explore "
+             "with auto-complete and documentation. Batch queries are supported with "
+             "a maximum of 5 operations per request to maintain performance."
+         ),
+         "views": 1567,
+         "freshness": 0.84,
+     },
+     # --- Security ---
+     {
+         "id": "KB-012",
+         "title": "Single Sign-On (SSO) Configuration",
+         "category": "Security",
+         "content": (
+             "Enterprise plans support SAML 2.0 and OpenID Connect for SSO. Navigate "
+             "to Settings > Security > SSO to configure your identity provider. "
+             "Supported providers include Okta, Azure AD, Google Workspace, and "
+             "OneLogin. Upload your IdP metadata XML or enter the SSO URL, entity "
+             "ID, and X.509 certificate manually. User provisioning can be automated "
+             "via SCIM 2.0 for user lifecycle management. JIT (Just-In-Time) "
+             "provisioning creates user accounts on first login. SSO enforcement "
+             "can be toggled to require all users to authenticate through the IdP, "
+             "with bypass codes available for emergency admin access."
+         ),
+         "views": 1890,
+         "freshness": 0.86,
+     },
+     {
+         "id": "KB-013",
+         "title": "Two-Factor Authentication Setup",
+         "category": "Security",
+         "content": (
+             "NovaCRM supports TOTP-based two-factor authentication via authenticator "
+             "apps such as Google Authenticator, Authy, and Microsoft Authenticator. "
+             "Enable 2FA at Profile > Security > Two-Factor Authentication. Scan the "
+             "QR code with your authenticator app and enter the verification code to "
+             "complete setup. Backup codes are generated during setup for account "
+             "recovery. Admins can enforce mandatory 2FA for all users or specific "
+             "roles under Settings > Security > Authentication Policies. SMS-based "
+             "2FA is available as a fallback option. Hardware security keys following "
+             "the FIDO2 WebAuthn standard are supported on Professional and Enterprise "
+             "plans."
+         ),
+         "views": 2567,
+         "freshness": 0.92,
+     },
+     {
+         "id": "KB-014",
+         "title": "Data Encryption and Privacy Policies",
+         "category": "Security",
+         "content": (
+             "All data in NovaCRM is encrypted at rest using AES-256 and in transit "
+             "using TLS 1.3. Database backups are encrypted with separate keys stored "
+             "in AWS KMS. Personal data fields support field-level encryption for "
+             "enhanced privacy compliance. Data residency options allow choosing "
+             "between US, EU, and APAC regions for primary storage. NovaCRM is SOC 2 "
+             "Type II certified and GDPR compliant. Data Processing Agreements are "
+             "available for Enterprise customers. Right to erasure requests are "
+             "processed within 72 hours. Automated data retention policies can be "
+             "configured per data type with minimum 30-day and maximum 7-year ranges."
+         ),
+         "views": 3210,
+         "freshness": 0.94,
+     },
+     {
+         "id": "KB-015",
+         "title": "Audit Logging and Compliance Reporting",
+         "category": "Security",
+         "content": (
+             "NovaCRM maintains comprehensive audit logs of all user actions, API "
+             "calls, and system events. Access audit logs at Settings > Security > "
+             "Audit Log. Logs include timestamp, user identity, action performed, "
+             "affected resource, IP address, and user agent. Logs are retained for "
+             "one year on Professional plans and seven years on Enterprise. Export "
+             "audit logs as CSV or JSON for external SIEM integration. Compliance "
+             "reports for SOC 2, HIPAA, and GDPR can be generated on demand. "
+             "Scheduled compliance reports can be configured to run weekly or monthly "
+             "with automatic delivery to designated compliance officers."
+         ),
+         "views": 1456,
+         "freshness": 0.81,
+     },
+     # --- Integrations ---
+     {
+         "id": "KB-016",
+         "title": "Slack Integration for Team Notifications",
+         "category": "Integrations",
+         "content": (
+             "Connect NovaCRM to Slack for real-time deal and activity notifications. "
+             "Navigate to Settings > Integrations > Slack and click Connect. "
+             "Authorize NovaCRM to access your Slack workspace. Configure notification "
+             "channels for different event types: deal updates, new leads, task "
+             "assignments, and system alerts. Use slash commands to query CRM data "
+             "directly from Slack: /novacrm search retrieves contacts, /novacrm deal "
+             "shows deal details, and /novacrm report generates quick summaries. "
+             "Interactive buttons in notifications allow agents to update deal stages, "
+             "add notes, and schedule follow-ups without leaving Slack."
+         ),
+         "views": 2987,
+         "freshness": 0.89,
+     },
+     {
+         "id": "KB-017",
+         "title": "Email Integration with Gmail and Outlook",
+         "category": "Integrations",
+         "content": (
+             "NovaCRM syncs with Gmail and Outlook for bidirectional email tracking. "
+             "Go to Settings > Integrations > Email to connect your account via OAuth. "
+             "Incoming emails from known contacts are automatically linked to their "
+             "CRM records. Email templates with merge fields can be created and shared "
+             "across the team. Tracking pixels detect email opens and link clicks with "
+             "timestamps. Scheduled sending allows queuing emails for optimal delivery "
+             "times. Email sequences enable automated multi-step outreach campaigns "
+             "with configurable delays and exit conditions. Unsubscribe handling "
+             "complies with CAN-SPAM and GDPR requirements automatically."
+         ),
+         "views": 4102,
+         "freshness": 0.91,
+     },
+     {
+         "id": "KB-018",
+         "title": "Zapier and Make Integration Hub",
+         "category": "Integrations",
+         "content": (
+             "NovaCRM integrates with over 3000 applications through Zapier and Make "
+             "connectors. Common automation recipes include syncing new contacts to "
+             "email marketing platforms, creating support tickets from deal notes, "
+             "and updating accounting software when deals close. The NovaCRM Zapier "
+             "app supports triggers for contact events, deal changes, and form "
+             "submissions. Actions include creating contacts, updating deals, and "
+             "adding notes. Multi-step Zaps enable complex workflows spanning "
+             "multiple applications. Make scenarios support parallel branches for "
+             "simultaneous actions across different services."
+         ),
+         "views": 1789,
+         "freshness": 0.83,
+     },
+     {
+         "id": "KB-019",
+         "title": "Calendar Sync with Google and Microsoft",
+         "category": "Integrations",
+         "content": (
+             "Synchronize your calendar with NovaCRM for seamless meeting management. "
+             "Connect Google Calendar or Microsoft Outlook Calendar at Settings > "
+             "Integrations > Calendar. Two-way sync ensures meetings created in either "
+             "platform appear in both. Meeting links from Zoom, Teams, and Google "
+             "Meet are automatically detected and added to CRM activities. The booking "
+             "page feature generates shareable scheduling links with configurable "
+             "availability windows, buffer times, and round-robin assignment for teams. "
+             "Meeting outcomes can be logged directly from calendar events with "
+             "predefined disposition codes and next-step actions."
+         ),
+         "views": 2345,
+         "freshness": 0.88,
+     },
+     # --- Infrastructure ---
+     {
+         "id": "KB-020",
+         "title": "System Architecture and Performance",
+         "category": "Infrastructure",
+         "content": (
+             "NovaCRM runs on a microservices architecture deployed on AWS. The API "
+             "layer uses load-balanced application servers behind CloudFront CDN. "
+             "PostgreSQL with read replicas handles primary data storage. Redis "
+             "provides caching and session management. Elasticsearch powers full-text "
+             "search across contacts, deals, and communications. Background job "
+             "processing uses a distributed task queue for email sending, report "
+             "generation, and data imports. The platform maintains 99.9 percent uptime "
+             "SLA with automated failover across availability zones. Response times "
+             "average under 200 milliseconds for API calls."
+         ),
+         "views": 987,
+         "freshness": 0.76,
+     },
+     {
+         "id": "KB-021",
+         "title": "Backup and Disaster Recovery Procedures",
+         "category": "Infrastructure",
+         "content": (
+             "NovaCRM performs automated database backups every 6 hours with point-in-time "
+             "recovery capability for the last 35 days. Backups are stored in a separate "
+             "AWS region from production data. Full disaster recovery tests are conducted "
+             "quarterly with documented recovery time objectives of 4 hours and recovery "
+             "point objectives of 1 hour. Customer data exports can be scheduled daily "
+             "or weekly via Settings > Data > Automated Exports in CSV or JSON format. "
+             "Enterprise customers can configure custom backup schedules and retention "
+             "policies. Backup verification runs automated integrity checks after each "
+             "snapshot to ensure recoverability."
+         ),
+         "views": 654,
+         "freshness": 0.72,
+     },
+     {
+         "id": "KB-022",
+         "title": "Status Page and Incident Response",
+         "category": "Infrastructure",
+         "content": (
+             "Monitor NovaCRM system status at status.novacrm.com. The status page "
+             "shows real-time availability for all services: API, web application, "
+             "email delivery, webhook processing, and integrations. Subscribe to "
+             "status updates via email, SMS, or RSS. During incidents, updates are "
+             "posted every 15 minutes until resolution. Post-incident reports are "
+             "published within 48 hours with root cause analysis and preventive "
+             "measures. Scheduled maintenance windows are announced 72 hours in "
+             "advance. Enterprise customers receive priority notification through "
+             "a dedicated Slack channel and direct account manager communication."
+         ),
+         "views": 1123,
+         "freshness": 0.80,
+     },
+     # --- Support ---
+     {
+         "id": "KB-023",
+         "title": "Contacting Support and SLA Details",
+         "category": "Support",
+         "content": (
+             "NovaCRM support is available through multiple channels. Starter plans "
+             "include email support with 24-hour response time during business hours. "
+             "Professional plans add live chat with 4-hour response time and phone "
+             "support during extended hours. Enterprise plans include a dedicated "
+             "account manager, 1-hour response time for critical issues, and 24/7 "
+             "phone support. Submit tickets at support.novacrm.com or via the in-app "
+             "help widget. Priority levels range from P1 for system-wide outages to "
+             "P4 for feature requests. Escalation procedures are documented in the "
+             "support portal with clear timelines for each priority level."
+         ),
+         "views": 3456,
+         "freshness": 0.90,
+     },
+     {
+         "id": "KB-024",
+         "title": "Troubleshooting Common Login Issues",
+         "category": "Support",
+         "content": (
+             "Common login problems include forgotten passwords, expired sessions, "
+             "and browser compatibility issues. To reset your password, click Forgot "
+             "Password on the login page and enter your registered email. Reset links "
+             "expire after 24 hours. If your account is locked after 5 failed attempts, "
+             "wait 30 minutes or contact support for immediate unlock. Clear browser "
+             "cache and cookies if you experience persistent session errors. NovaCRM "
+             "supports Chrome, Firefox, Safari, and Edge in their last two major "
+             "versions. Disable browser extensions if you encounter rendering issues. "
+             "For SSO login problems, verify your IdP configuration and check the "
+             "SSO debug log at Settings > Security > SSO > Debug."
+         ),
+         "views": 5678,
+         "freshness": 0.93,
+     },
+     {
+         "id": "KB-025",
+         "title": "Feature Request and Feedback Process",
+         "category": "Support",
+         "content": (
+             "Submit feature requests through the NovaCRM feedback portal at "
+             "feedback.novacrm.com. Each request can be voted on by other users to "
+             "help prioritize development. The product team reviews submissions "
+             "monthly and updates status to Under Review, Planned, In Development, "
+             "or Released. Public roadmap visibility is available for Professional "
+             "and Enterprise plans. Beta features can be enabled per account at "
+             "Settings > Labs. Beta participants provide feedback through in-app "
+             "surveys and dedicated Slack channels. Feature request history and "
+             "status tracking are available in your account dashboard."
+         ),
+         "views": 890,
+         "freshness": 0.77,
+     },
+     # --- Compliance ---
+     {
+         "id": "KB-026",
+         "title": "GDPR Compliance and Data Subject Requests",
+         "category": "Compliance",
+         "content": (
+             "NovaCRM provides built-in tools for GDPR compliance. The Data Subject "
+             "Request portal at Settings > Compliance > DSR handles right of access, "
+             "right to rectification, right to erasure, and data portability requests. "
+             "Automated workflows process erasure requests within 72 hours, removing "
+             "personal data from all systems including backups within 30 days. Consent "
+             "management tracks legal basis for data processing per contact. Data "
+             "processing records are maintained automatically. Cookie consent banners "
+             "are configurable for customer-facing forms. Annual GDPR compliance "
+             "assessments are available for Enterprise customers with documentation "
+             "support for supervisory authority inquiries."
+         ),
+         "views": 2134,
+         "freshness": 0.89,
+     },
+     {
+         "id": "KB-027",
+         "title": "HIPAA Compliance for Healthcare Customers",
+         "category": "Compliance",
+         "content": (
+             "NovaCRM Enterprise supports HIPAA compliance for healthcare organizations. "
+             "A Business Associate Agreement is available upon request. HIPAA-compliant "
+             "configurations include field-level encryption for Protected Health "
549
+ "Information, access controls with minimum necessary permissions, and "
550
+ "enhanced audit logging for PHI access. Automatic session timeout after "
551
+ "15 minutes of inactivity is enforced. PHI data is stored in dedicated "
552
+ "encrypted partitions with separate key management. Employee training "
553
+ "records for HIPAA awareness are trackable within the compliance module. "
554
+ "Breach notification workflows automate the required 60-day reporting "
555
+ "timeline with documentation templates."
556
+ ),
557
+ "views": 876,
558
+ "freshness": 0.75,
559
+ },
560
+ # --- Mixed / Advanced ---
561
+ {
562
+ "id": "KB-028",
563
+ "title": "Workflow Automation Rules Engine",
564
+ "category": "Integrations",
565
+ "content": (
566
+ "The NovaCRM rules engine enables no-code workflow automation. Create "
567
+ "rules at Automations > Rules with trigger-condition-action logic. "
568
+ "Triggers include record creation, field changes, time-based schedules, "
569
+ "and webhook events. Conditions support AND/OR logic with field "
570
+ "comparisons, formula evaluation, and related record checks. Actions "
571
+ "include sending emails, creating tasks, updating fields, sending "
572
+ "notifications, and calling external webhooks. Rules execute in real-time "
573
+ "with a maximum chain depth of 5 to prevent infinite loops. Execution "
574
+ "logs track every rule firing with input data, conditions evaluated, "
575
+ "and actions performed for debugging and audit purposes."
576
+ ),
577
+ "views": 3890,
578
+ "freshness": 0.92,
579
+ },
580
+ {
581
+ "id": "KB-029",
582
+ "title": "Custom Reporting and Dashboard Builder",
583
+ "category": "Infrastructure",
584
+ "content": (
585
+ "Build custom reports and dashboards with the NovaCRM report builder. "
586
+ "Navigate to Analytics > Reports > New Report to start. Choose from "
587
+ "report types: tabular, summary, matrix, and chart. Data sources include "
588
+ "contacts, deals, activities, emails, and custom objects. Apply filters, "
589
+ "groupings, and calculated fields using formula syntax. Schedule reports "
590
+ "for automatic delivery via email in PDF or Excel format. Dashboards "
591
+ "support drag-and-drop widget placement with resizable components. "
592
+ "Available widgets include metric cards, bar charts, line graphs, pie "
593
+ "charts, funnels, and data tables. Share dashboards with teams or "
594
+ "specific users with view or edit permissions."
595
+ ),
596
+ "views": 2678,
597
+ "freshness": 0.87,
598
+ },
599
+ {
600
+ "id": "KB-030",
601
+ "title": "Mobile App Features and Offline Mode",
602
+ "category": "Support",
603
+ "content": (
604
+ "The NovaCRM mobile app is available for iOS and Android. Download from "
605
+ "the App Store or Google Play Store. The mobile app supports contact "
606
+ "management, deal updates, task management, and activity logging. Push "
607
+ "notifications alert you to new leads, deal changes, and task deadlines. "
608
+ "Offline mode caches your most recent 500 contacts and 100 deals for "
609
+ "access without internet connectivity. Changes made offline sync "
610
+ "automatically when connection is restored with conflict resolution for "
611
+ "simultaneous edits. Business card scanning uses OCR to create contacts "
612
+ "from photos. Voice notes can be attached to any record and are "
613
+ "automatically transcribed using speech recognition."
614
+ ),
615
+ "views": 3210,
616
+ "freshness": 0.85,
617
+ },
618
+ ]
619
+
620
+
621
+ # ---------------------------------------------------------------------------
622
+ # TF-IDF Engine -- implemented from scratch
623
+ # ---------------------------------------------------------------------------
624
+
625
+
626
+ def tokenize(text: str) -> list[str]:
627
+ """Lowercase, strip punctuation, split into tokens, remove stop words."""
628
+ text = text.lower()
629
+ text = text.translate(str.maketrans("", "", string.punctuation))
630
+ tokens = text.split()
631
+ return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
632
+
633
+
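`tokenize` deletes every character in `string.punctuation` before splitting, so hyphenated and dotted terms are fused rather than split. The sketch below reproduces the function to show the effect. The real `STOP_WORDS` set is defined elsewhere in `app.py` and is not shown here, so the small set below is a hypothetical stand-in.

```python
import string

# Hypothetical stand-in; the app's actual STOP_WORDS set is not part of this hunk.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "or", "for", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, split, and drop stop words and 1-char tokens."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOP_WORDS and len(t) > 1]


tokens = tokenize("Real-time sync: 24-hour SLA for the TF-IDF engine, v2.0 and a UI.")
print(tokens)  # ['realtime', 'sync', '24hour', 'sla', 'tfidf', 'engine', 'v20', 'ui']
```

A query for "real time" yields `real` and `time`, and neither equals the fused `realtime` token. Hyphenated article terms are therefore only found when the query spells them the same way.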
634
+ def compute_term_frequency(tokens: list[str]) -> dict[str, float]:
635
+ """Compute augmented term frequency: 0.5 + 0.5 * (count / max_count).
636
+
637
+ Normalizing by the document's most frequent term dampens the bias toward longer documents.
638
+ """
639
+ counts = Counter(tokens)
640
+ if not counts:
641
+ return {}
642
+ max_count = max(counts.values())
643
+ return {
644
+ term: 0.5 + 0.5 * (count / max_count)
645
+ for term, count in counts.items()
646
+ }
647
+
648
+
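Here is a worked example of the augmented term frequency 0.5 + 0.5 * (count / max_count). Take a token list in which `api` occurs 4 times, `key` twice and `rotate` once. The most frequent term scores 1.0, and every other present term lands between 0.5 and 1.0. The snippet mirrors the formula in `compute_term_frequency` so the numbers can be checked:

```python
from collections import Counter


def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """Augmented TF: 0.5 + 0.5 * (count / max_count), as in compute_term_frequency."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * (count / max_count) for term, count in counts.items()}


tf = augmented_tf(["api"] * 4 + ["key"] * 2 + ["rotate"])
print(tf)  # {'api': 1.0, 'key': 0.75, 'rotate': 0.625}
```

The scores depend only on each count's ratio to the document's own maximum. Doubling every count therefore leaves the weights unchanged.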
649
+ def compute_idf(corpus_tokens: list[list[str]], vocabulary: list[str]) -> dict[str, float]:
650
+ """Compute inverse document frequency: log(N / (1 + df)).
651
+
652
+ Uses smoothed IDF to avoid division by zero for terms not in any document.
653
+ """
654
+ num_documents = len(corpus_tokens)
655
+ idf_values: dict[str, float] = {}
656
+ for term in vocabulary:
657
+ document_frequency = sum(
658
+ 1 for doc_tokens in corpus_tokens if term in set(doc_tokens)
659
+ )
660
+ idf_values[term] = math.log(num_documents / (1 + document_frequency))
661
+ return idf_values
662
+
663
+
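With N = 30 articles, log(N / (1 + df)) is exactly 0 for a term found in 29 articles and slightly negative (about -0.033) for one found in all 30. The same IDF multiplies both the query weight and the document weight, so the sign cancels in the dot product: each shared term contributes tf_q · tf_d · idf². Rankings are therefore not inverted. The effective weight idf² is no longer monotonic, though, because a term in all 30 articles counts slightly more than one in 29.

The sketch compares this form with the smoothed form log((1 + N) / (1 + df)) + 1. That smoothed form is the default in scikit-learn's `TfidfVectorizer` (`smooth_idf=True`) and never drops below 1. It is shown as an alternative, not as what the app does.

```python
import math


def idf_unsmoothed(num_documents: int, document_frequency: int) -> float:
    """log(N / (1 + df)): zero at df = N - 1, negative at df = N."""
    return math.log(num_documents / (1 + document_frequency))


def idf_smoothed(num_documents: int, document_frequency: int) -> float:
    """log((1 + N) / (1 + df)) + 1: at least 1 for any 0 <= df <= N."""
    return math.log((1 + num_documents) / (1 + document_frequency)) + 1.0


N = 30
for df in (1, 10, 29, 30):
    print(df, round(idf_unsmoothed(N, df), 3), round(idf_smoothed(N, df), 3))
```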
664
+ def build_tfidf_matrix(
665
+ corpus_tokens: list[list[str]],
666
+ vocabulary: list[str],
667
+ idf_values: dict[str, float],
668
+ ) -> np.ndarray:
669
+ """Build a TF-IDF matrix of shape (num_documents, vocab_size)."""
670
+ vocab_index = {term: idx for idx, term in enumerate(vocabulary)}
671
+ matrix = np.zeros((len(corpus_tokens), len(vocabulary)), dtype=np.float64)
672
+
673
+ for doc_idx, tokens in enumerate(corpus_tokens):
674
+ tf_values = compute_term_frequency(tokens)
675
+ for term, tf_score in tf_values.items():
676
+ if term in vocab_index:
677
+ col_idx = vocab_index[term]
678
+ matrix[doc_idx, col_idx] = tf_score * idf_values[term]
679
+
680
+ return matrix
681
+
682
+
683
+ def cosine_similarity_vector(matrix: np.ndarray, query_vector: np.ndarray) -> np.ndarray:
684
+ """Compute cosine similarity between each row of matrix and query_vector."""
685
+ dot_products = matrix @ query_vector
686
+ matrix_norms = np.linalg.norm(matrix, axis=1)
687
+ query_norm = np.linalg.norm(query_vector)
688
+
689
+ denominator = matrix_norms * query_norm
690
+ # Avoid division by zero for zero-norm vectors
691
+ denominator = np.where(denominator == 0, 1.0, denominator)
692
+ return dot_products / denominator
693
+
694
+
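The zero-norm guard matters for a document row that contains none of the vocabulary terms. Its norm is 0, and replacing the zero denominator with 1.0 gives it a score of 0 instead of NaN. The sketch below is a pure-Python version of the same computation (the app itself uses NumPy), run on hypothetical toy vectors:

```python
import math


def cosine_scores(matrix: list[list[float]], query: list[float]) -> list[float]:
    """Cosine similarity of each row with the query; zero-norm rows score 0.0."""
    query_norm = math.sqrt(sum(q * q for q in query))
    scores = []
    for row in matrix:
        dot = sum(r * q for r, q in zip(row, query))
        denominator = math.sqrt(sum(r * r for r in row)) * query_norm
        # Same guard as the app: a zero denominator is replaced by 1.0.
        scores.append(dot / (denominator if denominator != 0 else 1.0))
    return scores


rows = [
    [3.0, 4.0, 0.0],  # partially aligned with the query
    [0.0, 0.0, 2.0],  # orthogonal to the query
    [0.0, 0.0, 0.0],  # empty document (zero norm)
]
print(cosine_scores(rows, [1.0, 0.0, 0.0]))  # [0.6, 0.0, 0.0]
```

Scaling either vector does not change the score, so `[6.0, 8.0, 0.0]` against `[2.0, 0.0, 0.0]` also gives 0.6.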
695
+ class TFIDFSearchEngine:
696
+ """TF-IDF search engine with cosine similarity ranking."""
697
+
698
+ def __init__(self, articles: list[dict[str, str]]) -> None:
699
+ self.articles = articles
700
+ self._corpus_tokens: list[list[str]] = []
701
+ self._vocabulary: list[str] = []
702
+ self._idf: dict[str, float] = {}
703
+ self._tfidf_matrix: np.ndarray = np.array([])
704
+ self._vocab_index: dict[str, int] = {}
705
+ self._build_index()
706
+
707
+ def _build_index(self) -> None:
708
+ """Tokenize all articles and precompute the TF-IDF matrix."""
709
+ self._corpus_tokens = [
710
+ tokenize(article["title"] + " " + article["content"])
711
+ for article in self.articles
712
+ ]
713
+
714
+ vocab_set: set[str] = set()
715
+ for tokens in self._corpus_tokens:
716
+ vocab_set.update(tokens)
717
+ self._vocabulary = sorted(vocab_set)
718
+ self._vocab_index = {term: idx for idx, term in enumerate(self._vocabulary)}
719
+
720
+ self._idf = compute_idf(self._corpus_tokens, self._vocabulary)
721
+ self._tfidf_matrix = build_tfidf_matrix(
722
+ self._corpus_tokens, self._vocabulary, self._idf
723
+ )
724
+
725
+ def search(self, query: str, top_k: int = TOP_K_RESULTS) -> list[dict]:
726
+ """Search the corpus and return top_k results with scores and matched terms."""
727
+ query_tokens = tokenize(query)
728
+ if not query_tokens:
729
+ return []
730
+
731
+ query_tf = compute_term_frequency(query_tokens)
732
+ query_vector = np.zeros(len(self._vocabulary), dtype=np.float64)
733
+ for term, tf_score in query_tf.items():
734
+ if term in self._vocab_index:
735
+ col_idx = self._vocab_index[term]
736
+ query_vector[col_idx] = tf_score * self._idf.get(term, 0.0)
737
+
738
+ if np.linalg.norm(query_vector) == 0:
739
+ return []
740
+
741
+ similarities = cosine_similarity_vector(self._tfidf_matrix, query_vector)
742
+ top_indices = np.argsort(similarities)[::-1][:top_k]
743
+
744
+ results = []
745
+ query_term_set = set(query_tokens)
746
+ for idx in top_indices:
747
+ score = float(similarities[idx])
748
+ if score <= 0:
749
+ continue
750
+ article = self.articles[idx]
751
+ doc_term_set = set(self._corpus_tokens[idx])
752
+ matched_terms = sorted(query_term_set & doc_term_set)
753
+ results.append({
754
+ "article": article,
755
+ "score": score,
756
+ "matched_terms": matched_terms,
757
+ })
758
+
759
+ return results
760
+
761
+ def get_best_match(self, query: str) -> Optional[dict]:
762
+ """Return the single best matching article, or None."""
763
+ results = self.search(query, top_k=1)
764
+ return results[0] if results else None
765
+
766
+
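`TFIDFSearchEngine` stores a dense (num_documents × vocab_size) matrix. That is fine for 30 articles, but the matrix is mostly zeros. As a design comparison, the same scoring can be written with sparse per-document dicts: augmented TF, log(N / (1 + df)), and cosine ranking that drops non-positive scores. The sketch below is a minimal version that runs on hypothetical, already-tokenized toy documents. All function names are illustrative.

```python
import math
from collections import Counter


def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """0.5 + 0.5 * (count / max_count), as in compute_term_frequency."""
    counts = Counter(tokens)
    max_count = max(counts.values(), default=1)
    return {t: 0.5 + 0.5 * (c / max_count) for t, c in counts.items()}


def norm(vector: dict[str, float]) -> float:
    return math.sqrt(sum(w * w for w in vector.values()))


def build_index(docs: list[list[str]]) -> tuple[dict[str, float], list[dict[str, float]]]:
    """IDF table plus one sparse TF-IDF dict per document, using log(N / (1 + df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / (1 + d)) for term, d in df.items()}
    vectors = [{t: w * idf[t] for t, w in augmented_tf(doc).items()} for doc in docs]
    return idf, vectors


def search(
    query_tokens: list[str],
    idf: dict[str, float],
    vectors: list[dict[str, float]],
    top_k: int = 3,
) -> list[tuple[float, int]]:
    """Return (score, doc_index) pairs ranked by cosine similarity, positive scores only."""
    # Out-of-vocabulary query terms get no weight, as in TFIDFSearchEngine.search.
    query = {t: w * idf[t] for t, w in augmented_tf(query_tokens).items() if t in idf}
    query_norm = norm(query)
    if query_norm == 0:
        return []
    ranked = []
    for index, vector in enumerate(vectors):
        dot = sum(w * vector.get(t, 0.0) for t, w in query.items())
        doc_norm = norm(vector)
        score = dot / (doc_norm * query_norm) if doc_norm else 0.0
        if score > 0:
            ranked.append((score, index))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


docs = [
    ["password", "reset", "link", "email"],
    ["invoice", "billing", "email"],
    ["webhook", "api", "retry"],
    ["sso", "okta", "login"],
]
idf, vectors = build_index(docs)
print(search(["password", "reset", "email"], idf, vectors))
```

With this layout, memory scales with the number of distinct terms in each document rather than with the full vocabulary for every row.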
767
+ # ---------------------------------------------------------------------------
768
+ # Initialize the search engine (module-level singleton)
769
+ # ---------------------------------------------------------------------------
770
+
771
+ search_engine = TFIDFSearchEngine(KNOWLEDGE_BASE)
772
+
773
+
774
+ # ---------------------------------------------------------------------------
775
+ # Tab 1: Knowledge Search
776
+ # ---------------------------------------------------------------------------
777
+
778
+
779
+ def _highlight_terms(text: str, terms: list[str]) -> str:
780
+ """Bold whole-word, case-insensitive matches of each term, keeping the original casing."""
781
+ highlighted = text
782
+ for term in terms:
783
+ pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
784
+ highlighted = pattern.sub(lambda match: f"**{match.group(0)}**", highlighted)
785
+ return highlighted
786
+
787
+
788
+ def perform_search(query: str) -> str:
789
+ """Execute TF-IDF search and format results as markdown."""
790
+ if not query or not query.strip():
791
+ return "*Enter a search query to find relevant knowledge base articles.*"
792
+
793
+ results = search_engine.search(query.strip(), top_k=TOP_K_RESULTS)
794
+
795
+ if not results:
796
+ return (
797
+ f"**No results found for:** \"{query}\"\n\n"
798
+ "No articles in the knowledge base matched your query terms. "
799
+ "Try using different keywords or broader terms."
800
+ )
801
+
802
+ output_parts = [
803
+ f"### Search Results for: \"{query}\"\n",
804
+ f"Found **{len(results)}** relevant article(s).\n",
805
+ "---\n",
806
+ ]
807
+
808
+ for rank, result in enumerate(results, start=1):
809
+ article = result["article"]
810
+ score = result["score"]
811
+ matched = result["matched_terms"]
812
+ score_bar = _render_score_bar(score)
813
+ highlighted_content = _highlight_terms(article["content"], matched)
814
+ matched_display = ", ".join(f"`{t}`" for t in matched) if matched else "N/A"
815
+
816
+ output_parts.append(
817
+ f"**#{rank} [{article['id']}] {article['title']}**\n"
818
+ f"Category: {article['category']} | "
819
+ f"Relevance: {score:.4f} {score_bar}\n"
820
+ f"Matched terms: {matched_display}\n\n"
821
+ f"{highlighted_content}\n\n"
822
+ "---\n"
823
+ )
824
+
825
+ return "\n".join(output_parts)
826
+
827
+
828
+ def _render_score_bar(score: float) -> str:
829
+ """Render a 20-slot text relevance bar filled with '=' characters."""
830
+ filled = int(round(score * 20))
831
+ filled = min(filled, 20)
832
+ return "[" + "=" * filled + " " * (20 - filled) + "]"
833
+
834
+
835
+ # ---------------------------------------------------------------------------
836
+ # Tab 2: AI Q&A
837
+ # ---------------------------------------------------------------------------
838
+
839
+
840
+ def answer_question(question: str) -> str:
841
+ """Find the most relevant article and generate a template-based answer with citation."""
842
+ if not question or not question.strip():
843
+ return "*Ask a question about NovaCRM to get an answer with source citation.*"
844
+
845
+ result = search_engine.get_best_match(question.strip())
846
+
847
+ if result is None:
848
+ return (
849
+ f"**Question:** {question}\n\n"
850
+ "I could not find a relevant article in the knowledge base to answer "
851
+ "your question. Try rephrasing with more specific terms related to "
852
+ "NovaCRM features, billing, API, security, or integrations."
853
+ )
854
+
855
+ article = result["article"]
856
+ score = result["score"]
857
+ matched = result["matched_terms"]
858
+
859
+ # Extract the most relevant sentence(s) from the article as the excerpt
860
+ excerpt = _extract_relevant_excerpt(article["content"], matched)
861
+ highlighted_excerpt = _highlight_terms(excerpt, matched)
862
+
863
+ answer_text = _generate_template_answer(question, article, matched)
864
+
865
+ output_parts = [
866
+ f"**Question:** {question}\n\n",
867
+ "---\n\n",
868
+ f"### Answer\n\n{answer_text}\n\n",
869
+ "---\n\n",
870
+ f"### Source\n\n",
871
+ f"**Article:** [{article['id']}] {article['title']}\n\n",
872
+ f"**Category:** {article['category']}\n\n",
873
+ f"**Confidence (cosine similarity):** {score:.4f}\n\n",
874
+ f"**Relevant Excerpt:**\n\n> {highlighted_excerpt}\n",
875
+ ]
876
+
877
+ return "".join(output_parts)
878
+
879
+
880
+ def _extract_relevant_excerpt(content: str, matched_terms: list[str]) -> str:
881
+ """Extract the most relevant 1-2 sentences from the article content."""
882
+ sentences = re.split(r"(?<=[.!?])\s+", content)
883
+ if not sentences:
884
+ return content[:300]
885
+
886
+ if not matched_terms:
887
+ return sentences[0]
888
+
889
+ scored_sentences: list[tuple[int, str]] = []
890
+ for sentence in sentences:
891
+ sentence_lower = sentence.lower()
892
+ match_count = sum(1 for term in matched_terms if term in sentence_lower)
893
+ scored_sentences.append((match_count, sentence))
894
+
895
+ scored_sentences.sort(key=lambda pair: pair[0], reverse=True)
896
+
897
+ # Take the top 2 sentences by match count
898
+ top_sentences = scored_sentences[:2]
899
+ # Re-order by original position in the article
900
+ top_sentences_text = [s[1] for s in top_sentences]
901
+ ordered = [s for s in sentences if s in top_sentences_text]
902
+
903
+ return " ".join(ordered) if ordered else sentences[0]
904
+
905
+
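`_extract_relevant_excerpt` puts the top two sentences back in article order by checking whether each sentence's text appears in the top-two list. An article that repeats a sentence three times therefore returns all three copies. Also, `re.split` always returns at least one element, so the `if not sentences` guard never fires.

The sketch below carries each sentence's index through the ranking instead. The function name is hypothetical, and matching stays substring-based as in the app.

```python
import re


def extract_excerpt(content: str, matched_terms: list[str], max_sentences: int = 2) -> str:
    """Return the sentences with the most matched-term hits, in article order."""
    # Drop empty fragments: re.split returns [""] for an empty string.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", content) if s]
    if not sentences:
        return content[:300]
    if not matched_terms:
        return sentences[0]
    # (hit count, original index) per sentence; substring matching, as in the app.
    hits = [
        (sum(term in sentence.lower() for term in matched_terms), index)
        for index, sentence in enumerate(sentences)
    ]
    # Most hits first; ties go to the earlier sentence.
    best = sorted(hits, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Restore article order from the carried index, not from text membership.
    return " ".join(sentences[index] for _, index in sorted(best, key=lambda pair: pair[1]))


text = ("Reset links expire after 24 hours. Clear your cache. "
        "Password reset emails come from support. Contact support.")
print(extract_excerpt(text, ["reset", "support"]))
```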
906
+ def _generate_template_answer(
907
+ question: str, article: dict[str, str], matched_terms: list[str]
908
+ ) -> str:
909
+ """Compose an extractive answer from the matched article.
910
+
911
+ Prefixes a category-specific lead-in to up to four article sentences that
912
+ contain a matched term, falling back to the first three sentences.
913
+ """
914
+ category = article["category"]
915
+ title = article["title"]
916
+ content = article["content"]
917
+
918
+ # Extract key sentences that address the question
919
+ sentences = re.split(r"(?<=[.!?])\s+", content)
920
+ relevant_sentences = []
921
+ for sentence in sentences:
922
+ sentence_lower = sentence.lower()
923
+ if any(term in sentence_lower for term in matched_terms):
924
+ relevant_sentences.append(sentence)
925
+
926
+ if not relevant_sentences:
927
+ relevant_sentences = sentences[:3]
928
+
929
+ # Construct the answer
930
+ answer_body = " ".join(relevant_sentences[:4])
931
+
932
+ intro_templates = {
933
+ "Onboarding": f"Based on the {title} documentation",
934
+ "Billing": f"According to the billing documentation on {title}",
935
+ "API": f"The API documentation ({title}) explains",
936
+ "Security": f"Per the security documentation in {title}",
937
+ "Integrations": f"The integration guide for {title} states",
938
+ "Infrastructure": f"According to the infrastructure documentation ({title})",
939
+ "Support": f"The support documentation ({title}) addresses this",
940
+ "Compliance": f"Per the compliance documentation in {title}",
941
+ }
942
+
943
+ intro = intro_templates.get(category, f"According to {title}")
944
+
945
+ return f"{intro}: {answer_body}"
946
+
947
+
948
+ # ---------------------------------------------------------------------------
949
+ # Tab 3: Training Generator
950
+ # ---------------------------------------------------------------------------
951
+
952
+ # Quiz cache keyed by article ID; generate_training() fills it on first use
953
+ QUIZ_DATA: dict[str, list[dict]] = {}
954
+
955
+
956
+ def _build_quiz_for_article(article: dict[str, str]) -> list[dict]:
957
+ """Generate 5 multiple-choice questions from article content.
958
+
959
+ Uses content extraction to create questions that reference actual
960
+ article details rather than generic filler.
961
+ """
962
+ content = article["content"]
963
+ title = article["title"]
964
+ sentences = re.split(r"(?<=[.!?])\s+", content)
965
+
966
+ questions: list[dict] = []
967
+
968
+ # Strategy: pull factual statements and create questions about them
969
+ for i, sentence in enumerate(sentences):
970
+ if len(questions) >= 5:
971
+ break
972
+ # Skip very short sentences
973
+ if len(sentence) < 30:
974
+ continue
975
+ question_entry = _sentence_to_question(sentence, title, i)
976
+ if question_entry:
977
+ questions.append(question_entry)
978
+
979
+ # Pad with generic questions if content was too sparse
980
+ while len(questions) < 5:
981
+ questions.append({
982
+ "question": f"What is the primary purpose of {title}?",
983
+ "options": [
984
+ f"To manage {article['category'].lower()} features",
985
+ "To provide general system information",
986
+ "To configure external services",
987
+ "To handle user authentication only",
988
+ ],
989
+ "correct": 0,
990
+ })
991
+
992
+ return questions[:5]
993
+
994
+
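Every generated question stores `"correct": 0`, so the right answer is always option A. In addition, `_sentence_to_question` receives a `seed` argument that it never uses. One way to put a seed to work is to shuffle the options with `random.Random(seed)` and then recompute the correct index.

This sketch assumes that deterministic per-question shuffling is the intent. `shuffle_options` is a hypothetical helper, and `options.index` assumes the option texts are distinct.

```python
import random


def shuffle_options(question: dict, seed: int) -> dict:
    """Return a copy of `question` with options shuffled deterministically by `seed`."""
    options = list(question["options"])
    correct_text = options[question["correct"]]
    random.Random(seed).shuffle(options)
    # Assumes option texts are unique, so index() finds the original answer.
    return {**question, "options": options, "correct": options.index(correct_text)}


q = {"question": "How long do password reset links stay valid?",
     "options": ["24 hours", "1 hour", "7 days", "Never"], "correct": 0}
shuffled = shuffle_options(q, seed=3)
print(shuffled["options"][shuffled["correct"]])  # 24 hours
```

Using a local `random.Random` instance keeps the quiz reproducible for a given seed without touching the global random state.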
995
+ def _sentence_to_question(sentence: str, title: str, seed: int) -> Optional[dict]:
996
+ """Convert a factual sentence into a multiple-choice question."""
997
+ # Look for a number followed by a unit word (hours, days, minutes, percent, GB, requests)
998
+ number_match = re.search(r"(\d+[\s\w-]*?(?:hours?|days?|minutes?|percent|GB|requests?)\b)", sentence)
999
+ if number_match:
1000
+ fact = number_match.group(1)
1001
+ return {
1002
+ "question": f"According to \"{title}\", what is the specification for: {fact.strip()}?",
1003
+ "options": [
1004
+ f"The value is {fact.strip()}",
1005
+ "The value is double the standard amount",
1006
+ "This is not specified in the documentation",
1007
+ "This depends on the subscription tier selected",
1008
+ ],
1009
+ "correct": 0,
1010
+ }
1011
+
1012
+ # Look for feature mentions
1013
+ feature_patterns = [
1014
+ (r"supports?\s+(.+?)(?:\.|,|$)", "support"),
1015
+ (r"includes?\s+(.+?)(?:\.|,|$)", "include"),
1016
+ (r"provides?\s+(.+?)(?:\.|,|$)", "provide"),
1017
+ (r"enables?\s+(.+?)(?:\.|,|$)", "enable"),
1018
+ ]
1019
+
1020
+ for pattern, verb in feature_patterns:
1021
+ match = re.search(pattern, sentence, re.IGNORECASE)
1022
+ if match:
1023
+ feature = match.group(1).strip()
1024
+ if len(feature) > 15 and len(feature) < 120:
1025
+ return {
1026
+ "question": f"What does the system {verb} according to \"{title}\"?",
1027
+ "options": [
1028
+ feature[:100],
1029
+ "Only basic text-based functionality",
1030
+ "This feature is not available",
1031
+ "Requires third-party configuration",
1032
+ ],
1033
+ "correct": 0,
1034
+ }
1035
+
1036
+ return None
1037
+
1038
+
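The character class between the number and the unit word decides how much of the sentence becomes the "fact". A greedy `[\s\w-]*` extends to the last unit word in the sentence. A lazy `*?` plus a trailing `\b` stops at the first whole unit word. The comparison below runs on a sentence from the support SLA article above.

```python
import re

UNITS = r"(?:hours?|days?|minutes?|percent|GB|requests?)"
GREEDY = re.compile(r"(\d+[\s\w-]*" + UNITS + r")")
LAZY = re.compile(r"(\d+[\s\w-]*?" + UNITS + r"\b)")

sentence = ("Starter plans include email support with 24-hour response time "
            "during business hours")
print(GREEDY.search(sentence).group(1))  # 24-hour response time during business hours
print(LAZY.search(sentence).group(1))    # 24-hour
```

The trailing `\b` also stops a unit from matching inside a longer word. `"Runs 5 hourly backups"` has no lazy match, while the greedy pattern without `\b` returns `"5 hour"`.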
1039
+ def generate_training(topic_title: str) -> str:
1040
+ """Generate a training article outline and quiz for the selected topic."""
1041
+ if not topic_title:
1042
+ return "*Select a topic to generate training material.*"
1043
+
1044
+ article = None
1045
+ for kb_article in KNOWLEDGE_BASE:
1046
+ if kb_article["title"] == topic_title:
1047
+ article = kb_article
1048
+ break
1049
+
1050
+ if article is None:
1051
+ return f"Article not found: {topic_title}"
1052
+
1053
+ # Cache quiz data
1054
+ if article["id"] not in QUIZ_DATA:
1055
+ QUIZ_DATA[article["id"]] = _build_quiz_for_article(article)
1056
+
1057
+ quiz_questions = QUIZ_DATA[article["id"]]
1058
+ sentences = re.split(r"(?<=[.!?])\s+", article["content"])
1059
+
1060
+ # Build training article outline
1061
+ output_parts = [
1062
+ f"## Training Module: {article['title']}\n",
1063
+ f"**Category:** {article['category']} | "
1064
+ f"**Article ID:** {article['id']}\n\n",
1065
+ "---\n\n",
1066
+ "### Learning Objectives\n\n",
1067
+ f"After completing this module, you will be able to:\n\n",
1068
+ ]
1069
+
1070
+ # Generate 3 learning objectives from article content
1071
+ objectives = _extract_learning_objectives(sentences)
1072
+ for obj in objectives:
1073
+ output_parts.append(f"- {obj}\n")
1074
+
1075
+ output_parts.append("\n### Module Outline\n\n")
1076
+
1077
+ # Split content into sections
1078
+ section_size = max(1, len(sentences) // 3)
1079
+ section_titles = ["Introduction and Overview", "Core Concepts", "Implementation Details"]
1080
+ for section_idx, section_title in enumerate(section_titles):
1081
+ start = section_idx * section_size
1082
+ end = start + section_size if section_idx < 2 else len(sentences)
1083
+ section_content = " ".join(sentences[start:end])
1084
+ if section_content.strip():
1085
+ output_parts.append(f"**{section_idx + 1}. {section_title}**\n\n")
1086
+ output_parts.append(f"{section_content}\n\n")
1087
+
1088
+ output_parts.append("---\n\n### Knowledge Check (5 Questions)\n\n")
1089
+
1090
+ for q_idx, q_data in enumerate(quiz_questions, start=1):
1091
+ output_parts.append(f"**Q{q_idx}.** {q_data['question']}\n\n")
1092
+ labels = ["A", "B", "C", "D"]
1093
+ for opt_idx, option in enumerate(q_data["options"]):
1094
+ marker = " (correct)" if opt_idx == q_data["correct"] else ""
1095
+ output_parts.append(f" {labels[opt_idx]}. {option}{marker}\n")
1096
+ output_parts.append("\n")
1097
+
1098
+ return "".join(output_parts)
1099
+
1100
+
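The module outline gives `len(sentences) // 3` sentences to each of the first two sections and the remainder to the third. A 7-sentence article splits 2/2/3, and an 8-sentence article splits 2/2/4, so the last section absorbs up to two extra sentences.

The sketch below spreads the remainder across the first sections with `divmod` instead, so section sizes differ by at most one. The helper name is hypothetical.

```python
def split_into_sections(items: list[str], parts: int = 3) -> list[list[str]]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, remainder = divmod(len(items), parts)
    sections, start = [], 0
    for index in range(parts):
        # The first `remainder` sections each take one extra item.
        size = base + (1 if index < remainder else 0)
        sections.append(items[start:start + size])
        start += size
    return sections


print([len(s) for s in split_into_sections([f"s{i}" for i in range(8)])])  # [3, 3, 2]
```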
1101
+ def _extract_learning_objectives(sentences: list[str]) -> list[str]:
1102
+ """Extract or generate 3 learning objectives from article sentences."""
1103
+ objectives: list[str] = []
1104
+
1105
+ action_verbs = [
1106
+ "Understand how to", "Explain the process of", "Configure and manage",
1107
+ "Identify the key aspects of", "Apply knowledge about",
1108
+ ]
1109
+
1110
+ for sentence in sentences:
1111
+ if len(objectives) >= 3:
1112
+ break
1113
+ # Look for sentences describing capabilities or processes
1114
+ if any(kw in sentence.lower() for kw in ["navigate", "configure", "create", "enable", "support"]):
1115
+ # Rephrase as objective
1116
+ clean = sentence.rstrip(".")
1117
+ verb = action_verbs[len(objectives) % len(action_verbs)]
1118
+ objective = f"{verb} {clean[0].lower()}{clean[1:]}"
1119
+ if len(objective) < 200:
1120
+ objectives.append(objective)
1121
+
1122
+ # Pad if needed
1123
+ while len(objectives) < 3:
1124
+ objectives.append(
1125
+ f"{action_verbs[len(objectives) % len(action_verbs)]} the features described in this module"
1126
+ )
1127
+
1128
+ return objectives[:3]
1129
+
1130
+
1131
+ # ---------------------------------------------------------------------------
1132
+ # Tab 4: Knowledge Gap Analytics
1133
+ # ---------------------------------------------------------------------------
1134
+
1135
+ # Mock analytics data
1136
+ UNANSWERED_QUERIES = [
1137
+ "How do I integrate with Salesforce?",
1138
+ "What is the data export format for compliance audits?",
1139
+ "Can I use NovaCRM with a self-hosted email server?",
1140
+ "How to configure IP allowlisting?",
1141
+ "What are the API rate limits for the GraphQL endpoint specifically?",
1142
+ "Does NovaCRM support multi-currency deals?",
1143
+ "How to set up automated lead scoring?",
1144
+ "Can I restrict API access by IP address?",
1145
+ "What is the maximum file attachment size?",
1146
+ "How to configure custom email domains?",
1147
+ ]
1148
+
1149
+ SEARCH_QUERIES_LOG = [
1150
+ ("api authentication", 342),
1151
+ ("billing invoice", 287),
1152
+ ("sso setup okta", 198),
1153
+ ("import contacts csv", 176),
1154
+ ("webhook configuration", 154),
1155
+ ("slack integration", 143),
1156
+ ("password reset", 312),
1157
+ ("pipeline stages", 131),
1158
+ ("gdpr data deletion", 119),
1159
+ ("mobile app offline", 108),
1160
+ ("two factor authentication", 205),
1161
+ ("email tracking", 167),
1162
+ ("custom reports", 145),
1163
+ ("zapier automation", 98),
1164
+ ("backup schedule", 87),
1165
+ ]
1166
+
1167
+
1168
+ def generate_analytics() -> tuple:
1169
+ """Generate all analytics charts and summary text.
1170
+
1171
+ Returns a tuple of (summary_markdown, articles_by_category_fig,
1172
+ freshness_fig, views_fig, gaps_fig).
1173
+ """
1174
+ summary = _build_analytics_summary()
1175
+ category_fig = _plot_articles_by_category()
1176
+ freshness_fig = _plot_freshness_scores()
1177
+ views_fig = _plot_article_views()
1178
+ gaps_fig = _plot_search_gaps()
1179
+
1180
+ return summary, category_fig, freshness_fig, views_fig, gaps_fig
1181
+
1182
+
1183
+ def _build_analytics_summary() -> str:
1184
+ """Build the text summary of knowledge base health."""
1185
+ total_articles = len(KNOWLEDGE_BASE)
1186
+ total_views = sum(a["views"] for a in KNOWLEDGE_BASE)
1187
+ avg_freshness = sum(a["freshness"] for a in KNOWLEDGE_BASE) / total_articles
1188
+ stale_articles = [a for a in KNOWLEDGE_BASE if a["freshness"] < 0.80]
1189
+ categories_covered = len(set(a["category"] for a in KNOWLEDGE_BASE))
1190
+
1191
+ # Most and least viewed
1192
+ sorted_by_views = sorted(KNOWLEDGE_BASE, key=lambda a: a["views"], reverse=True)
1193
+ most_viewed = sorted_by_views[0]
1194
+ least_viewed = sorted_by_views[-1]
1195
+
1196
+ return (
1197
+ "### Knowledge Base Health Summary\n\n"
1198
+ f"| Metric | Value |\n"
1199
+ f"|--------|-------|\n"
1200
+ f"| Total articles | {total_articles} |\n"
1201
+ f"| Categories covered | {categories_covered} |\n"
1202
+ f"| Total page views | {total_views:,} |\n"
1203
+ f"| Average freshness score | {avg_freshness:.2f} |\n"
1204
+ f"| Articles needing update (freshness < 0.80) | {len(stale_articles)} |\n"
1205
+ f"| Unanswered search queries | {len(UNANSWERED_QUERIES)} |\n\n"
1206
+ f"**Most viewed:** [{most_viewed['id']}] {most_viewed['title']} "
1207
+ f"({most_viewed['views']:,} views)\n\n"
1208
+ f"**Least viewed:** [{least_viewed['id']}] {least_viewed['title']} "
1209
+ f"({least_viewed['views']:,} views)\n\n"
1210
+ "**Stale articles requiring review:**\n\n"
1211
+ + "\n".join(
1212
+ f"- [{a['id']}] {a['title']} (freshness: {a['freshness']:.2f})"
1213
+ for a in stale_articles
1214
+ )
1215
+ )
1216
+
1217
+
1218
+ def _apply_dark_style(fig: plt.Figure, ax: plt.Axes) -> None:
1219
+ """Apply consistent dark theme styling to matplotlib figures."""
1220
+ bg_color = "#1a1a2e"
1221
+ text_color = "#e0e0e0"
1222
+ grid_color = "#2a2a4a"
1223
+
1224
+ fig.patch.set_facecolor(bg_color)
1225
+ ax.set_facecolor(bg_color)
1226
+ ax.tick_params(colors=text_color, which="both")
1227
+ ax.xaxis.label.set_color(text_color)
1228
+ ax.yaxis.label.set_color(text_color)
1229
+ ax.title.set_color(text_color)
1230
+
1231
+ for spine in ax.spines.values():
1232
+ spine.set_color(grid_color)
1233
+
1234
+ ax.grid(True, alpha=0.2, color=grid_color)
1235
+
1236
+
1237
+ def _plot_articles_by_category() -> plt.Figure:
1238
+ """Bar chart of article count per category."""
1239
+ category_counts: dict[str, int] = {}
1240
+ for article in KNOWLEDGE_BASE:
1241
+ cat = article["category"]
1242
+ category_counts[cat] = category_counts.get(cat, 0) + 1
1243
+
1244
+ categories = sorted(category_counts.keys())
1245
+ counts = [category_counts[c] for c in categories]
1246
+
1247
+ fig, ax = plt.subplots(figsize=(8, 4))
1248
+ _apply_dark_style(fig, ax)
1249
+
1250
+ bar_colors = ["#3b82f6", "#6366f1", "#8b5cf6", "#a78bfa",
1251
+ "#60a5fa", "#818cf8", "#7c3aed", "#4f46e5"]
1252
+ bars = ax.barh(categories, counts, color=bar_colors[:len(categories)], height=0.6)
1253
+ ax.set_xlabel("Number of Articles")
1254
+ ax.set_title("Articles by Category")
1255
+
1256
+ for bar_item, count in zip(bars, counts):
1257
+ ax.text(
1258
+ bar_item.get_width() + 0.1, bar_item.get_y() + bar_item.get_height() / 2,
1259
+ str(count), va="center", color="#e0e0e0", fontweight="bold",
1260
+ )
1261
+
1262
+ fig.tight_layout()
1263
+ return fig
1264
+
1265
+
1266
+ def _plot_freshness_scores() -> plt.Figure:
1267
+ """Horizontal bar chart of article freshness scores, color-coded."""
1268
+ sorted_articles = sorted(KNOWLEDGE_BASE, key=lambda a: a["freshness"])
1269
+ titles = [f"[{a['id']}]" for a in sorted_articles]
1270
+ scores = [a["freshness"] for a in sorted_articles]
1271
+
1272
+ fig, ax = plt.subplots(figsize=(8, 7))
1273
+ _apply_dark_style(fig, ax)
1274
+
1275
+ colors = []
1276
+ for score in scores:
1277
+ if score >= 0.90:
1278
+ colors.append("#22c55e") # green -- fresh
1279
+ elif score >= 0.80:
1280
+ colors.append("#eab308") # yellow -- aging
1281
+ else:
1282
+ colors.append("#ef4444") # red -- stale
1283
+
1284
+ ax.barh(titles, scores, color=colors, height=0.6)
1285
+ ax.set_xlabel("Freshness Score")
1286
+ ax.set_title("Article Freshness Scores")
1287
+ ax.set_xlim(0, 1.0)
1288
+ ax.axvline(x=0.80, color="#ef4444", linestyle="--", alpha=0.5, label="Stale threshold")
1289
+ ax.legend(loc="lower right", facecolor="#1a1a2e", edgecolor="#2a2a4a", labelcolor="#e0e0e0")
1290
+
1291
+ fig.tight_layout()
1292
+ return fig
1293
+
1294
+
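The freshness chart colors bars with two thresholds, 0.90 and 0.80, and the summary table uses 0.80 as the stale cutoff. When a textual breakdown is wanted, the same buckets can be counted directly. This sketch uses freshness values from six of the articles listed above; `freshness_bucket` is a hypothetical helper.

```python
from collections import Counter


def freshness_bucket(score: float) -> str:
    """Same thresholds as the chart: >= 0.90 fresh, >= 0.80 aging, else stale."""
    if score >= 0.90:
        return "fresh"
    if score >= 0.80:
        return "aging"
    return "stale"


articles = [
    {"id": "KB-024", "freshness": 0.93}, {"id": "KB-028", "freshness": 0.92},
    {"id": "KB-026", "freshness": 0.89}, {"id": "KB-030", "freshness": 0.85},
    {"id": "KB-025", "freshness": 0.77}, {"id": "KB-027", "freshness": 0.75},
]
counts = Counter(freshness_bucket(a["freshness"]) for a in articles)
print(counts)
```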
1295
+def _plot_article_views() -> plt.Figure:
+    """Horizontal bar chart of the 10 most-viewed articles."""
+    sorted_articles = sorted(KNOWLEDGE_BASE, key=lambda a: a["views"], reverse=True)[:10]
+    titles = [f"[{a['id']}]" for a in sorted_articles]
+    views = [a["views"] for a in sorted_articles]
+
+    fig, ax = plt.subplots(figsize=(8, 5))
+    _apply_dark_style(fig, ax)
+
+    gradient_colors = plt.cm.Blues(np.linspace(0.9, 0.4, len(titles)))
+    ax.barh(titles, views, color=gradient_colors, height=0.6)
+    ax.set_xlabel("Page Views")
+    ax.set_title("Top 10 Most Viewed Articles")
+    ax.invert_yaxis()  # most-viewed article at the top
+
+    # Annotate each bar with its view count, just past the bar's end.
+    for idx, view_count in enumerate(views):
+        ax.text(
+            view_count + 50, idx, f"{view_count:,}",
+            va="center", color="#e0e0e0", fontsize=9,
+        )
+
+    fig.tight_layout()
+    return fig
+
+
+def _plot_search_gaps() -> plt.Figure:
+    """Bar chart of top search queries that returned no results or low relevance."""
+    queries = [q for q, _ in SEARCH_QUERIES_LOG[:10]]
+    counts = [c for _, c in SEARCH_QUERIES_LOG[:10]]
+
+    fig, ax = plt.subplots(figsize=(8, 5))
+    _apply_dark_style(fig, ax)
+
+    ax.barh(queries, counts, color="#6366f1", height=0.6)
+    ax.set_xlabel("Search Frequency")
+    ax.set_title("Most Frequent Search Queries")
+    ax.invert_yaxis()
+
+    for idx, count in enumerate(counts):
+        ax.text(
+            count + 3, idx, str(count),
+            va="center", color="#e0e0e0", fontsize=9,
+        )
+
+    fig.tight_layout()
+    return fig
+
+
+# ---------------------------------------------------------------------------
+# Gradio Application
+# ---------------------------------------------------------------------------
+
+CUSTOM_CSS = """
+.gradio-container {
+    max-width: 1200px !important;
+    margin: 0 auto !important;
+}
+
+.header-text {
+    text-align: center;
+    margin-bottom: 8px;
+}
+
+.header-text h1 {
+    font-size: 2em;
+    margin-bottom: 4px;
+}
+
+.header-text p {
+    opacity: 0.8;
+    font-size: 1.05em;
+}
+
+footer {
+    text-align: center;
+    opacity: 0.6;
+    margin-top: 20px;
+}
+"""
+
+HEADER_HTML = """
+<div class="header-text">
+    <h1>Vaultwise</h1>
+    <p>Knowledge Management Platform -- Document Ingestion, TF-IDF Search, AI Q&A, Training Generation, Analytics</p>
+    <p style="font-size: 0.9em; opacity: 0.6;">
+        This interactive demo is self-contained and runs on a built-in 30-article knowledge base.
+        All search is powered by a from-scratch TF-IDF implementation, with no sklearn or external NLP libraries.
+    </p>
+</div>
+"""
+
+FOOTER_HTML = """
+<footer>
+    <p>
+        <a href="https://github.com/dbhavery/vaultwise" target="_blank">GitHub</a>
+        &nbsp;|&nbsp; Built by Don Havery
+    </p>
+</footer>
+"""
+
+
+def build_app() -> gr.Blocks:
+    """Construct and return the Gradio Blocks application."""
+    topic_choices = [article["title"] for article in KNOWLEDGE_BASE]
+
+    with gr.Blocks(
+        title=APP_TITLE,
+        theme=gr.themes.Base(
+            primary_hue=gr.themes.colors.blue,
+            secondary_hue=gr.themes.colors.indigo,
+            neutral_hue=gr.themes.colors.gray,
+            font=gr.themes.GoogleFont("Inter"),
+        ).set(
+            body_background_fill="#0f0f1a",
+            body_background_fill_dark="#0f0f1a",
+            block_background_fill="#1a1a2e",
+            block_background_fill_dark="#1a1a2e",
+            block_border_color="#2a2a4a",
+            block_border_color_dark="#2a2a4a",
+            block_title_text_color="#e0e0e0",
+            block_title_text_color_dark="#e0e0e0",
+            body_text_color="#d0d0d0",
+            body_text_color_dark="#d0d0d0",
+            input_background_fill="#16162a",
+            input_background_fill_dark="#16162a",
+            input_border_color="#2a2a4a",
+            input_border_color_dark="#2a2a4a",
+            button_primary_background_fill="#3b82f6",
+            button_primary_background_fill_dark="#3b82f6",
+            button_primary_text_color="#ffffff",
+            button_primary_text_color_dark="#ffffff",
+        ),
+        css=CUSTOM_CSS,
+    ) as app:
+        gr.HTML(HEADER_HTML)
+
+        with gr.Tabs():
+            # --- Tab 1: Knowledge Search ---
+            with gr.Tab("Knowledge Search"):
+                gr.Markdown(
+                    "### TF-IDF Vector Search\n"
+                    "Search the NovaCRM knowledge base using term frequency-inverse document "
+                    "frequency scoring with cosine similarity ranking. The engine tokenizes "
+                    "your query, computes TF-IDF weights against all 30 articles, and returns "
+                    "the top 5 matches."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        search_input = gr.Textbox(
+                            label="Search Query",
+                            placeholder="e.g., API rate limits authentication, SSO configuration, billing invoice...",
+                            lines=1,
+                        )
+                    with gr.Column(scale=1):
+                        search_btn = gr.Button("Search", variant="primary")
+
+                search_output = gr.Markdown(
+                    value="*Enter a search query to find relevant knowledge base articles.*",
+                    label="Results",
+                )
+
+                gr.Examples(
+                    examples=[
+                        ["API authentication rate limits"],
+                        ["how to import contacts from CSV"],
+                        ["SSO single sign-on SAML configuration"],
+                        ["billing subscription pricing plans"],
+                        ["webhook event notifications"],
+                        ["GDPR data erasure compliance"],
+                        ["mobile app offline mode"],
+                        ["workflow automation rules engine"],
+                    ],
+                    inputs=search_input,
+                    label="Example Queries",
+                )
+
+                search_btn.click(fn=perform_search, inputs=search_input, outputs=search_output)
+                search_input.submit(fn=perform_search, inputs=search_input, outputs=search_output)
+
+            # --- Tab 2: AI Q&A ---
+            with gr.Tab("AI Q&A"):
+                gr.Markdown(
+                    "### Knowledge-Grounded Question Answering\n"
+                    "Ask a natural language question about NovaCRM. The system finds "
+                    "the most relevant article via TF-IDF search, then generates an "
+                    "answer grounded in the source material with full citation."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        qa_input = gr.Textbox(
+                            label="Your Question",
+                            placeholder="e.g., How do I set up two-factor authentication?",
+                            lines=1,
+                        )
+                    with gr.Column(scale=1):
+                        qa_btn = gr.Button("Ask", variant="primary")
+
+                qa_output = gr.Markdown(
+                    value="*Ask a question about NovaCRM to get an answer with source citation.*",
+                    label="Answer",
+                )
+
+                gr.Examples(
+                    examples=[
+                        ["How do I reset my password if my account is locked?"],
+                        ["What encryption does NovaCRM use for data at rest?"],
+                        ["How can I connect my Gmail to NovaCRM?"],
+                        ["What are the different subscription plans and pricing?"],
+                        ["How do I configure webhooks for deal updates?"],
+                        ["What compliance certifications does NovaCRM have?"],
+                    ],
+                    inputs=qa_input,
+                    label="Example Questions",
+                )
+
+                qa_btn.click(fn=answer_question, inputs=qa_input, outputs=qa_output)
+                qa_input.submit(fn=answer_question, inputs=qa_input, outputs=qa_output)
+
+            # --- Tab 3: Training Generator ---
+            with gr.Tab("Training Generator"):
+                gr.Markdown(
+                    "### Auto-Generated Training Material\n"
+                    "Select a knowledge base article to generate a structured training module "
+                    "with learning objectives, content outline, and a 5-question multiple-choice quiz."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        training_dropdown = gr.Dropdown(
+                            choices=topic_choices,
+                            label="Select Article Topic",
+                            value=None,
+                        )
+                    with gr.Column(scale=1):
+                        training_btn = gr.Button("Generate", variant="primary")
+
+                training_output = gr.Markdown(
+                    value="*Select a topic to generate training material.*",
+                    label="Training Material",
+                )
+
+                training_btn.click(
+                    fn=generate_training, inputs=training_dropdown, outputs=training_output
+                )
+
+            # --- Tab 4: Knowledge Gap Analytics ---
+            with gr.Tab("Knowledge Gap Analytics"):
+                gr.Markdown(
+                    "### Knowledge Base Analytics Dashboard\n"
+                    "Health metrics, content freshness, usage patterns, and gap analysis "
+                    "for the knowledge base."
+                )
+                analytics_btn = gr.Button("Generate Analytics Report", variant="primary")
+
+                analytics_summary = gr.Markdown(label="Summary")
+
+                with gr.Row():
+                    category_chart = gr.Plot(label="Articles by Category")
+                    views_chart = gr.Plot(label="Most Viewed Articles")
+
+                with gr.Row():
+                    freshness_chart = gr.Plot(label="Freshness Scores")
+                    gaps_chart = gr.Plot(label="Search Query Frequency")
+
+                analytics_btn.click(
+                    fn=generate_analytics,
+                    inputs=[],
+                    # Gradio maps return values to outputs by position, so this order
+                    # must match the tuple returned by generate_analytics.
+                    outputs=[analytics_summary, category_chart, freshness_chart, views_chart, gaps_chart],
+                )
+
+        gr.HTML(FOOTER_HTML)
+
+    return app
+
+
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    application = build_app()
+    application.launch()
requirements.txt ADDED
@@ -0,0 +1,3 @@
+gradio==5.29.0
+numpy>=1.26.0
+matplotlib>=3.8.0
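`_plot_freshness_scores` in app.py encodes its staleness policy only as bar colors. Scores at or above 0.90 are green and scores at or above 0.80 are yellow. Anything lower is red, and a dashed line marks 0.80 as the stale threshold. If the same bands are needed outside the chart, for example to list articles due for review, a small helper can mirror them. This is a sketch: the helper names and sample records are hypothetical, and only the two cutoffs and the `id`/`freshness` keys come from app.py.

```python
# Cutoffs copied from _plot_freshness_scores in app.py; helper names are hypothetical.
FRESH_CUTOFF = 0.90
STALE_CUTOFF = 0.80


def freshness_bucket(score: float) -> str:
    """Map a freshness score to the same three bands the chart colors."""
    if score >= FRESH_CUTOFF:
        return "fresh"  # green bar
    if score >= STALE_CUTOFF:
        return "aging"  # yellow bar
    return "stale"  # red bar, left of the dashed threshold line


def bucket_counts(articles: list[dict]) -> dict[str, int]:
    """Count articles per freshness band."""
    counts = {"fresh": 0, "aging": 0, "stale": 0}
    for article in articles:
        counts[freshness_bucket(article["freshness"])] += 1
    return counts


def stale_article_ids(articles: list[dict]) -> list[str]:
    """IDs of stale articles, least fresh first (the chart's ascending sort order)."""
    stale = [a for a in articles if freshness_bucket(a["freshness"]) == "stale"]
    return [a["id"] for a in sorted(stale, key=lambda a: a["freshness"])]


# Hypothetical records using the "id" and "freshness" keys the chart reads.
sample_articles = [
    {"id": "KB-01", "freshness": 0.95},
    {"id": "KB-02", "freshness": 0.82},
    {"id": "KB-03", "freshness": 0.71},
    {"id": "KB-04", "freshness": 0.79},
]
print(bucket_counts(sample_articles))
print(stale_article_ids(sample_articles))
```

Defining the cutoffs as named constants would also let the chart and any review tooling share a single source of truth, rather than repeating 0.90 and 0.80 as literals.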