Spaces: dbhavery/vaultwise-knowledge

dbhavery committed on Mar 8

Commit bc51393 (verified) · 1 Parent(s): e101077

Upload folder using huggingface_hub


Files changed (4)

README.md +117 -10
__pycache__/app.cpython-313.pyc +0 -0
app.py +1575 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,10 +1,117 @@
----
-title: Vaultwise Knowledge
-emoji: 📉
-colorFrom: pink
-colorTo: indigo
-sdk: static
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Vaultwise Knowledge
+emoji: "\U0001F4DA"
+colorFrom: indigo
+colorTo: blue
+sdk: gradio
+sdk_version: 5.29.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# Vaultwise -- Knowledge Management Platform
+**Interactive demo for [Vaultwise](https://github.com/dbhavery/vaultwise), a knowledge management platform with document ingestion, vector search, AI-powered Q&A, training generation, and analytics.**
+Vaultwise is a full-stack application (FastAPI + React) designed for teams that need to organize, search, and learn from their internal knowledge base. This demo showcases the core search and analytics capabilities using a built-in 30-article corpus for a fictional SaaS company.
+## Demo Tabs
+| Tab | What It Does |
+|-----|--------------|
+| **Knowledge Search** | TF-IDF vector search over 30 knowledge base articles. Enter a query, get ranked results with relevance scores and highlighted matching terms. |
+| **AI Q&A** | Natural language question answering grounded in the knowledge base. Finds the best-matching article via TF-IDF, then generates an answer with source citation and relevant excerpt. |
+| **Training Generator** | Select any article to auto-generate a training module: learning objectives, structured content outline, and a 5-question multiple-choice quiz. |
+| **Knowledge Gap Analytics** | Dashboard with article distribution by category, freshness scores, view counts, and search query frequency analysis. |
+## Search Algorithm
+The TF-IDF search engine is implemented from scratch using only Python and numpy -- no sklearn, no external NLP libraries.
+### How It Works
+**1. Tokenization**
+Input text is lowercased, punctuation-stripped, and split into tokens. A stop word list filters out common English words that carry no semantic weight.
+**2. Term Frequency (TF)**
+Uses augmented term frequency to prevent bias toward longer documents:
+```
+TF(t, d) = 0.5 + 0.5 * (count(t, d) / max_count(d))
+```
+**3. Inverse Document Frequency (IDF)**
+Measures how rare a term is across the corpus. Terms appearing in fewer documents receive higher weight:
+```
+IDF(t) = log(N / (1 + df(t)))
+```
+Where N is the total number of documents and df(t) is the number of documents containing term t. The +1 smoothing prevents division by zero.
+**4. TF-IDF Weight**
+The final weight for each term in each document:
+```
+W(t, d) = TF(t, d) * IDF(t)
+```
+**5. Cosine Similarity**
+Queries are converted to TF-IDF vectors using the same vocabulary and IDF values. Ranking uses cosine similarity between the query vector and each document vector:
+```
+similarity(q, d) = (q . d) / (||q|| * ||d||)
+```
+This measures the angle between vectors, making it independent of document length.
+### Architecture (Full Platform)
+```
+Frontend (React + Vite)
+    |
+    v
+API Gateway (FastAPI)
+    |
+    +-- Document Ingestion Pipeline
+    |       PDF, HTML, Markdown parsing
+    |       Chunking and metadata extraction
+    |
+    +-- Search Engine
+    |       TF-IDF vectorization
+    |       Cosine similarity ranking
+    |       Query expansion and filtering
+    |
+    +-- AI Q&A Module
+    |       Context retrieval via search
+    |       LLM-powered answer generation
+    |       Source citation and grounding
+    |
+    +-- Training Generator
+    |       Article analysis
+    |       Outline and quiz generation
+    |       Learning objective extraction
+    |
+    +-- Analytics Engine
+            Usage tracking
+            Freshness scoring
+            Gap identification
+```
+## Running Locally
+```bash
+pip install gradio numpy matplotlib
+python app.py
+```
+## Links
+- **Source code:** [github.com/dbhavery/vaultwise](https://github.com/dbhavery/vaultwise)
+- **Author:** [Don Havery](https://github.com/dbhavery)
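
The five steps in the README's Search Algorithm section can be sketched as follows. This is a minimal standard-library illustration of the stated formulas, not the code in app.py (which the README says uses numpy). The function names, the shortened stop word list, and the toy corpus are all assumptions made for this example. The augmented TF is applied only to terms that occur in a document, which is also an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical, shortened stop word list (app.py defines a longer STOP_WORDS set).
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "to", "of", "for",
                        "in", "is", "are", "on", "at"})


def tokenize(text: str) -> list[str]:
    """Step 1: lowercase, strip punctuation, split, drop stop words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """Step 2: TF(t, d) = 0.5 + 0.5 * count(t, d) / max_count(d)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {t: 0.5 + 0.5 * c / max_count for t, c in counts.items()}


def build_idf(docs_tokens: list[list[str]]) -> dict[str, float]:
    """Step 3: IDF(t) = log(N / (1 + df(t))) over the corpus vocabulary."""
    n_docs = len(docs_tokens)
    df = Counter(t for tokens in docs_tokens for t in set(tokens))
    return {t: math.log(n_docs / (1 + d)) for t, d in df.items()}


def weight_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Step 4: W(t, d) = TF(t, d) * IDF(t); out-of-vocabulary terms are dropped."""
    return {t: tf * idf[t] for t, tf in augmented_tf(tokens).items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Step 5: (u . v) / (||u|| * ||v||), or 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query: str, docs: list[str], top_k: int = 5) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity to the query; returns (index, score)."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [weight_vector(tokens, idf) for tokens in docs_tokens]
    q_vec = weight_vector(tokenize(query), idf)
    scored = [(i, cosine(q_vec, dv)) for i, dv in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus, invented for this example.
DOCS = [
    "Reset your password from the login page",
    "Invoices are generated each month",
    "API keys use bearer token authentication",
    "Webhooks retry failed deliveries",
]
results = search("password reset", DOCS)
# Document 0 ranks first; the other documents share no terms with the query.
```

One property of the IDF formula as written is worth keeping in mind when reading scores: a term that appears in every document gets log(N / (N + 1)), which is slightly negative, and a term that appears in N - 1 documents gets exactly zero weight.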

__pycache__/app.cpython-313.pyc ADDED

Binary file (67.2 kB).

app.py ADDED

@@ -0,0 +1,1575 @@

+"""
+Vaultwise -- Knowledge Management Platform
+Interactive demo showcasing TF-IDF search, AI Q&A, training generation, and analytics.
+All search functionality is implemented from scratch using numpy.
+No sklearn or external NLP libraries required.
+"""
+import math
+import re
+import string
+from collections import Counter
+from typing import Optional
+import gradio as gr
+import matplotlib
+import matplotlib.pyplot as plt
+import numpy as np
+matplotlib.use("Agg")
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+APP_TITLE = "Vaultwise -- Knowledge Management Platform"
+ACCENT_COLOR = "#3b82f6"
+TOP_K_RESULTS = 5
+CATEGORIES = [
+    "Onboarding",
+    "Billing",
+    "API",
+    "Security",
+    "Integrations",
+    "Infrastructure",
+    "Support",
+    "Compliance",
+]
+STOP_WORDS = frozenset(
+    {
+        "a", "an", "the", "and", "or", "but", "in", "on", "at", "to", "for",
+        "of", "with", "by", "from", "is", "it", "as", "are", "was", "were",
+        "be", "been", "being", "have", "has", "had", "do", "does", "did",
+        "will", "would", "could", "should", "may", "might", "shall", "can",
+        "this", "that", "these", "those", "i", "you", "he", "she", "we",
+        "they", "me", "him", "her", "us", "them", "my", "your", "his",
+        "its", "our", "their", "what", "which", "who", "whom", "how",
+        "when", "where", "why", "not", "no", "all", "each", "every",
+        "both", "few", "more", "most", "other", "some", "such", "than",
+        "too", "very", "just", "about", "if", "so", "also", "up", "out",
+        "into", "over", "after", "before", "between", "under", "through",
+        "during", "above", "below", "any", "only", "own", "same", "then",
+        "there", "here", "once", "while", "now", "new", "get", "use",
+    }
+)
+# ---------------------------------------------------------------------------
+# Knowledge Base -- 30 articles for a fictional SaaS company "NovaCRM"
+# ---------------------------------------------------------------------------
+KNOWLEDGE_BASE: list[dict[str, str | int | float]] = [
+    # --- Onboarding ---
+    {
+        "id": "KB-001",
+        "title": "Getting Started with NovaCRM",
+        "category": "Onboarding",
+        "content": (
+            "Welcome to NovaCRM. This guide walks new users through initial account "
+            "setup, workspace configuration, and first-time login. After signing up, "
+            "you will receive a verification email. Click the link to activate your "
+            "account. Once logged in, navigate to Settings > Workspace to configure "
+            "your company name, timezone, and default currency. Invite team members "
+            "from the Team Management page by entering their email addresses. Each "
+            "new member receives an onboarding checklist that tracks their setup "
+            "progress through profile completion, integration connections, and first "
+            "deal creation."
+        ),
+        "views": 4521,
+        "freshness": 0.95,
+    },
+    {
+        "id": "KB-002",
+        "title": "User Roles and Permissions Overview",
+        "category": "Onboarding",
+        "content": (
+            "NovaCRM supports four user roles: Admin, Manager, Agent, and Viewer. "
+            "Admins have full system access including billing, user management, and "
+            "API key generation. Managers can create teams, assign leads, and view "
+            "team analytics. Agents can manage their own contacts, deals, and tasks. "
+            "Viewers have read-only access to dashboards and reports. Custom roles "
+            "can be created under Settings > Roles with granular permission toggles "
+            "for each module. Role inheritance allows child roles to automatically "
+            "receive parent permissions. Audit logs track all permission changes."
+        ),
+        "views": 3187,
+        "freshness": 0.88,
+    },
+    {
+        "id": "KB-003",
+        "title": "Importing Contacts and Data Migration",
+        "category": "Onboarding",
+        "content": (
+            "NovaCRM supports CSV, Excel, and vCard imports for contact migration. "
+            "Navigate to Contacts > Import to upload your file. The mapping wizard "
+            "automatically detects common fields like name, email, phone, and company. "
+            "For custom fields, drag and drop column headers to match your schema. "
+            "Duplicate detection runs automatically using email address matching with "
+            "configurable merge rules. For large migrations over 50,000 records, use "
+            "the bulk import API endpoint which processes records asynchronously and "
+            "sends a completion webhook. Migration history is available under Settings "
+            "> Data > Import History with rollback capabilities for the last 30 days."
+        ),
+        "views": 2843,
+        "freshness": 0.82,
+    },
+    {
+        "id": "KB-004",
+        "title": "Setting Up Your Sales Pipeline",
+        "category": "Onboarding",
+        "content": (
+            "The sales pipeline in NovaCRM is fully customizable. Go to Pipeline > "
+            "Settings to create stages. Default stages include Lead, Qualified, "
+            "Proposal, Negotiation, and Closed Won or Closed Lost. Each stage has "
+            "configurable probability percentages for revenue forecasting. Drag deals "
+            "between stages on the Kanban board or update them in list view. "
+            "Automation rules can trigger actions when deals move between stages, "
+            "such as sending follow-up emails, creating tasks, or notifying managers. "
+            "Pipeline analytics show conversion rates between stages, average deal "
+            "velocity, and bottleneck identification."
+        ),
+        "views": 3654,
+        "freshness": 0.91,
+    },
+    # --- Billing ---
+    {
+        "id": "KB-005",
+        "title": "Subscription Plans and Pricing",
+        "category": "Billing",
+        "content": (
+            "NovaCRM offers three subscription tiers: Starter at 29 dollars per user "
+            "per month, Professional at 79 dollars per user per month, and Enterprise "
+            "with custom pricing. Starter includes contact management, basic pipeline, "
+            "email integration, and 5 GB storage. Professional adds workflow automation, "
+            "advanced analytics, API access, and 50 GB storage. Enterprise includes "
+            "custom integrations, dedicated support, SSO, audit logs, and unlimited "
+            "storage. All plans include a 14-day free trial with no credit card "
+            "required. Annual billing provides a 20 percent discount."
+        ),
+        "views": 5102,
+        "freshness": 0.97,
+    },
+    {
+        "id": "KB-006",
+        "title": "Managing Invoices and Payment Methods",
+        "category": "Billing",
+        "content": (
+            "Access your billing dashboard at Settings > Billing > Invoices. NovaCRM "
+            "accepts credit cards via Stripe and bank transfers for Enterprise plans. "
+            "Invoices are generated on the first of each month and sent to the billing "
+            "email address. Download invoices as PDF from the billing history page. "
+            "To update payment methods, navigate to Settings > Billing > Payment "
+            "Methods and add a new card or bank account. Failed payments trigger "
+            "automatic retry on days 3, 7, and 14. After three failures, the account "
+            "enters a 7-day grace period before suspension. Tax ID and VAT numbers "
+            "can be configured for proper invoice formatting."
+        ),
+        "views": 1876,
+        "freshness": 0.85,
+    },
+    {
+        "id": "KB-007",
+        "title": "Upgrading and Downgrading Your Plan",
+        "category": "Billing",
+        "content": (
+            "Plan changes take effect immediately. When upgrading, you are charged a "
+            "prorated amount for the remaining billing cycle. When downgrading, the "
+            "new rate applies at the next billing cycle and a credit is issued for "
+            "the difference. Navigate to Settings > Billing > Change Plan to see "
+            "available options. Feature access adjusts automatically upon plan change. "
+            "Data retention is maintained during downgrades, but access to premium "
+            "features is restricted. If your current usage exceeds the new plan limits, "
+            "you will receive a warning with 30 days to reduce usage before enforcement."
+        ),
+        "views": 1342,
+        "freshness": 0.79,
+    },
+    # --- API ---
+    {
+        "id": "KB-008",
+        "title": "REST API Authentication and Rate Limits",
+        "category": "API",
+        "content": (
+            "The NovaCRM REST API uses Bearer token authentication. Generate API keys "
+            "at Settings > API > Keys. Each key has configurable scopes: read, write, "
+            "delete, and admin. Include the token in the Authorization header as "
+            "Bearer followed by the token value. Rate limits depend on your plan: "
+            "Starter allows 100 requests per minute, Professional allows 1000, and "
+            "Enterprise allows 10000. Rate limit headers X-RateLimit-Remaining and "
+            "X-RateLimit-Reset are included in every response. Exceeding the limit "
+            "returns HTTP 429 with a Retry-After header. API keys can be rotated "
+            "without downtime using the key rotation endpoint."
+        ),
+        "views": 4210,
+        "freshness": 0.93,
+    },
+    {
+        "id": "KB-009",
+        "title": "API Endpoints for Contact Management",
+        "category": "API",
+        "content": (
+            "Contact CRUD operations are available at the /api/v2/contacts endpoint. "
+            "GET returns a paginated list with default page size of 50. Use query "
+            "parameters for filtering: status, created_after, tags, and owner_id. "
+            "POST creates a new contact with required fields email and name. PUT "
+            "updates an existing contact by ID. PATCH allows partial updates. DELETE "
+            "moves a contact to trash with 30-day recovery. Bulk operations are "
+            "supported via /api/v2/contacts/bulk with a maximum batch size of 1000. "
+            "Response format follows JSON API specification with included relationships "
+            "for deals, activities, and notes."
+        ),
+        "views": 3890,
+        "freshness": 0.90,
+    },
+    {
+        "id": "KB-010",
+        "title": "Webhooks and Event Subscriptions",
+        "category": "API",
+        "content": (
+            "NovaCRM supports webhooks for real-time event notifications. Configure "
+            "webhook endpoints at Settings > API > Webhooks. Available events include "
+            "contact.created, contact.updated, deal.stage_changed, deal.won, "
+            "deal.lost, task.completed, and email.opened. Each webhook delivery "
+            "includes an HMAC-SHA256 signature in the X-Webhook-Signature header "
+            "for payload verification. Failed deliveries are retried with exponential "
+            "backoff up to 5 times over 24 hours. Webhook logs show delivery status, "
+            "response codes, and payload details for the last 30 days. Test endpoints "
+            "can be configured to receive sample payloads during development."
+        ),
+        "views": 2156,
+        "freshness": 0.87,
+    },
+    {
+        "id": "KB-011",
+        "title": "GraphQL API for Advanced Queries",
+        "category": "API",
+        "content": (
+            "NovaCRM provides a GraphQL endpoint at /api/graphql for complex data "
+            "queries. The schema supports contacts, deals, activities, teams, and "
+            "reports. Introspection is enabled for development environments. Query "
+            "depth is limited to 10 levels to prevent abuse. Mutations support "
+            "creating, updating, and deleting records with input validation. "
+            "Subscriptions are available for real-time updates via WebSocket "
+            "connections. The GraphQL playground is accessible at /api/graphql/explore "
+            "with auto-complete and documentation. Batch queries are supported with "
+            "a maximum of 5 operations per request to maintain performance."
+        ),
+        "views": 1567,
+        "freshness": 0.84,
+    },
+    # --- Security ---
+    {
+        "id": "KB-012",
+        "title": "Single Sign-On (SSO) Configuration",
+        "category": "Security",
+        "content": (
+            "Enterprise plans support SAML 2.0 and OpenID Connect for SSO. Navigate "
+            "to Settings > Security > SSO to configure your identity provider. "
+            "Supported providers include Okta, Azure AD, Google Workspace, and "
+            "OneLogin. Upload your IdP metadata XML or enter the SSO URL, entity "
+            "ID, and X.509 certificate manually. User provisioning can be automated "
+            "via SCIM 2.0 for user lifecycle management. JIT (Just-In-Time) "
+            "provisioning creates user accounts on first login. SSO enforcement "
+            "can be toggled to require all users to authenticate through the IdP, "
+            "with bypass codes available for emergency admin access."
+        ),
+        "views": 1890,
+        "freshness": 0.86,
+    },
+    {
+        "id": "KB-013",
+        "title": "Two-Factor Authentication Setup",
+        "category": "Security",
+        "content": (
+            "NovaCRM supports TOTP-based two-factor authentication via authenticator "
+            "apps such as Google Authenticator, Authy, and Microsoft Authenticator. "
+            "Enable 2FA at Profile > Security > Two-Factor Authentication. Scan the "
+            "QR code with your authenticator app and enter the verification code to "
+            "complete setup. Backup codes are generated during setup for account "
+            "recovery. Admins can enforce mandatory 2FA for all users or specific "
+            "roles under Settings > Security > Authentication Policies. SMS-based "
+            "2FA is available as a fallback option. Hardware security keys following "
+            "the FIDO2 WebAuthn standard are supported on Professional and Enterprise "
+            "plans."
+        ),
+        "views": 2567,
+        "freshness": 0.92,
+    },
+    {
+        "id": "KB-014",
+        "title": "Data Encryption and Privacy Policies",
+        "category": "Security",
+        "content": (
+            "All data in NovaCRM is encrypted at rest using AES-256 and in transit "
+            "using TLS 1.3. Database backups are encrypted with separate keys stored "
+            "in AWS KMS. Personal data fields support field-level encryption for "
+            "enhanced privacy compliance. Data residency options allow choosing "
+            "between US, EU, and APAC regions for primary storage. NovaCRM is SOC 2 "
+            "Type II certified and GDPR compliant. Data Processing Agreements are "
+            "available for Enterprise customers. Right to erasure requests are "
+            "processed within 72 hours. Automated data retention policies can be "
+            "configured per data type with minimum 30-day and maximum 7-year ranges."
+        ),
+        "views": 3210,
+        "freshness": 0.94,
+    },
+    {
+        "id": "KB-015",
+        "title": "Audit Logging and Compliance Reporting",
+        "category": "Security",
+        "content": (
+            "NovaCRM maintains comprehensive audit logs of all user actions, API "
+            "calls, and system events. Access audit logs at Settings > Security > "
+            "Audit Log. Logs include timestamp, user identity, action performed, "
+            "affected resource, IP address, and user agent. Logs are retained for "
+            "one year on Professional plans and seven years on Enterprise. Export "
+            "audit logs as CSV or JSON for external SIEM integration. Compliance "
+            "reports for SOC 2, HIPAA, and GDPR can be generated on demand. "
+            "Scheduled compliance reports can be configured to run weekly or monthly "
+            "with automatic delivery to designated compliance officers."
+        ),
+        "views": 1456,
+        "freshness": 0.81,
+    },
+    # --- Integrations ---
+    {
+        "id": "KB-016",
+        "title": "Slack Integration for Team Notifications",
+        "category": "Integrations",
+        "content": (
+            "Connect NovaCRM to Slack for real-time deal and activity notifications. "
+            "Navigate to Settings > Integrations > Slack and click Connect. "
+            "Authorize NovaCRM to access your Slack workspace. Configure notification "
+            "channels for different event types: deal updates, new leads, task "
+            "assignments, and system alerts. Use slash commands to query CRM data "
+            "directly from Slack: /novacrm search retrieves contacts, /novacrm deal "
+            "shows deal details, and /novacrm report generates quick summaries. "
+            "Interactive buttons in notifications allow agents to update deal stages, "
+            "add notes, and schedule follow-ups without leaving Slack."
+        ),
+        "views": 2987,
+        "freshness": 0.89,
+    },
+    {
+        "id": "KB-017",
+        "title": "Email Integration with Gmail and Outlook",
+        "category": "Integrations",
+        "content": (
+            "NovaCRM syncs with Gmail and Outlook for bidirectional email tracking. "
+            "Go to Settings > Integrations > Email to connect your account via OAuth. "
+            "Incoming emails from known contacts are automatically linked to their "
+            "CRM records. Email templates with merge fields can be created and shared "
+            "across the team. Tracking pixels detect email opens and link clicks with "
+            "timestamps. Scheduled sending allows queuing emails for optimal delivery "
+            "times. Email sequences enable automated multi-step outreach campaigns "
+            "with configurable delays and exit conditions. Unsubscribe handling "
+            "complies with CAN-SPAM and GDPR requirements automatically."
+        ),
+        "views": 4102,
+        "freshness": 0.91,
+    },
+    {
+        "id": "KB-018",
+        "title": "Zapier and Make Integration Hub",
+        "category": "Integrations",
+        "content": (
+            "NovaCRM integrates with over 3000 applications through Zapier and Make "
+            "connectors. Common automation recipes include syncing new contacts to "
+            "email marketing platforms, creating support tickets from deal notes, "
+            "and updating accounting software when deals close. The NovaCRM Zapier "
+            "app supports triggers for contact events, deal changes, and form "
+            "submissions. Actions include creating contacts, updating deals, and "
+            "adding notes. Multi-step Zaps enable complex workflows spanning "
+            "multiple applications. Make scenarios support parallel branches for "
+            "simultaneous actions across different services."
+        ),
+        "views": 1789,
+        "freshness": 0.83,
+    },
+    {
+        "id": "KB-019",
+        "title": "Calendar Sync with Google and Microsoft",
+        "category": "Integrations",
+        "content": (
+            "Synchronize your calendar with NovaCRM for seamless meeting management. "
+            "Connect Google Calendar or Microsoft Outlook Calendar at Settings > "
+            "Integrations > Calendar. Two-way sync ensures meetings created in either "
+            "platform appear in both. Meeting links from Zoom, Teams, and Google "
+            "Meet are automatically detected and added to CRM activities. The booking "
+            "page feature generates shareable scheduling links with configurable "
+            "availability windows, buffer times, and round-robin assignment for teams. "
+            "Meeting outcomes can be logged directly from calendar events with "
+            "predefined disposition codes and next-step actions."
+        ),
+        "views": 2345,
+        "freshness": 0.88,
+    },
+    # --- Infrastructure ---
+    {
+        "id": "KB-020",
+        "title": "System Architecture and Performance",
+        "category": "Infrastructure",
+        "content": (
+            "NovaCRM runs on a microservices architecture deployed on AWS. The API "
+            "layer uses load-balanced application servers behind CloudFront CDN. "
+            "PostgreSQL with read replicas handles primary data storage. Redis "
+            "provides caching and session management. Elasticsearch powers full-text "
+            "search across contacts, deals, and communications. Background job "
+            "processing uses a distributed task queue for email sending, report "
+            "generation, and data imports. The platform maintains 99.9 percent uptime "
+            "SLA with automated failover across availability zones. Response times "
+            "average under 200 milliseconds for API calls."
+        ),
+        "views": 987,
+        "freshness": 0.76,
+    },
+    {
+        "id": "KB-021",
+        "title": "Backup and Disaster Recovery Procedures",
+        "category": "Infrastructure",
+        "content": (
+            "NovaCRM performs automated database backups every 6 hours with point-in-time "
+            "recovery capability for the last 35 days. Backups are stored in a separate "
+            "AWS region from production data. Full disaster recovery tests are conducted "
+            "quarterly with documented recovery time objectives of 4 hours and recovery "
+            "point objectives of 1 hour. Customer data exports can be scheduled daily "
+            "or weekly via Settings > Data > Automated Exports in CSV or JSON format. "
+            "Enterprise customers can configure custom backup schedules and retention "
+            "policies. Backup verification runs automated integrity checks after each "
+            "snapshot to ensure recoverability."
+        ),
+        "views": 654,
+        "freshness": 0.72,
+    },
+    {
+        "id": "KB-022",
+        "title": "Status Page and Incident Response",
+        "category": "Infrastructure",
+        "content": (
+            "Monitor NovaCRM system status at status.novacrm.com. The status page "
+            "shows real-time availability for all services: API, web application, "
+            "email delivery, webhook processing, and integrations. Subscribe to "
+            "status updates via email, SMS, or RSS. During incidents, updates are "
+            "posted every 15 minutes until resolution. Post-incident reports are "
+            "published within 48 hours with root cause analysis and preventive "
+            "measures. Scheduled maintenance windows are announced 72 hours in "
+            "advance. Enterprise customers receive priority notification through "
+            "a dedicated Slack channel and direct account manager communication."
+        ),
+        "views": 1123,
+        "freshness": 0.80,
+    },
+    # --- Support ---
+    {
+        "id": "KB-023",
+        "title": "Contacting Support and SLA Details",
+        "category": "Support",
+        "content": (
+            "NovaCRM support is available through multiple channels. Starter plans "
+            "include email support with 24-hour response time during business hours. "
+            "Professional plans add live chat with 4-hour response time and phone "
+            "support during extended hours. Enterprise plans include a dedicated "
+            "account manager, 1-hour response time for critical issues, and 24/7 "
+            "phone support. Submit tickets at support.novacrm.com or via the in-app "
+            "help widget. Priority levels range from P1 for system-wide outages to "
+            "P4 for feature requests. Escalation procedures are documented in the "
+            "support portal with clear timelines for each priority level."
+        ),
+        "views": 3456,
+        "freshness": 0.90,
+    },
+    {
+        "id": "KB-024",
+        "title": "Troubleshooting Common Login Issues",
+        "category": "Support",
+        "content": (
+            "Common login problems include forgotten passwords, expired sessions, "
+            "and browser compatibility issues. To reset your password, click Forgot "
+            "Password on the login page and enter your registered email. Reset links "
+            "expire after 24 hours. If your account is locked after 5 failed attempts, "
+            "wait 30 minutes or contact support for immediate unlock. Clear browser "
+            "cache and cookies if you experience persistent session errors. NovaCRM "
+            "supports Chrome, Firefox, Safari, and Edge in their last two major "
+            "versions. Disable browser extensions if you encounter rendering issues. "
+            "For SSO login problems, verify your IdP configuration and check the "
+            "SSO debug log at Settings > Security > SSO > Debug."
+        ),
+        "views": 5678,
+        "freshness": 0.93,
+    },
+    {
+        "id": "KB-025",
+        "title": "Feature Request and Feedback Process",
+        "category": "Support",
+        "content": (
+            "Submit feature requests through the NovaCRM feedback portal at "
+            "feedback.novacrm.com. Each request can be voted on by other users to "
+            "help prioritize development. The product team reviews submissions "
+            "monthly and updates status to Under Review, Planned, In Development, "
+            "or Released. Public roadmap visibility is available for Professional "
+            "and Enterprise plans. Beta features can be enabled per account at "
+            "Settings > Labs. Beta participants provide feedback through in-app "
+            "surveys and dedicated Slack channels. Feature request history and "
+            "status tracking are available in your account dashboard."
+        ),
+        "views": 890,
+        "freshness": 0.77,
+    },
+    # --- Compliance ---
+    {
+        "id": "KB-026",
+        "title": "GDPR Compliance and Data Subject Requests",
+        "category": "Compliance",
+        "content": (
+            "NovaCRM provides built-in tools for GDPR compliance. The Data Subject "
+            "Request portal at Settings > Compliance > DSR handles right of access, "
+            "right to rectification, right to erasure, and data portability requests. "
+            "Automated workflows process erasure requests within 72 hours, removing "
+            "personal data from all systems including backups within 30 days. Consent "
+            "management tracks legal basis for data processing per contact. Data "
+            "processing records are maintained automatically. Cookie consent banners "
+            "are configurable for customer-facing forms. Annual GDPR compliance "
+            "assessments are available for Enterprise customers with documentation "
+            "support for supervisory authority inquiries."
+        ),
+        "views": 2134,
+        "freshness": 0.89,
+    },
+    {
+        "id": "KB-027",
+        "title": "HIPAA Compliance for Healthcare Customers",
+        "category": "Compliance",
+        "content": (
+            "NovaCRM Enterprise supports HIPAA compliance for healthcare organizations. "
+            "A Business Associate Agreement is available upon request. HIPAA-compliant "
+            "configurations include field-level encryption for Protected Health "
+            "Information, access controls with minimum necessary permissions, and "
+            "enhanced audit logging for PHI access. Automatic session timeout after "
+            "15 minutes of inactivity is enforced. PHI data is stored in dedicated "
+            "encrypted partitions with separate key management. Employee training "
+            "records for HIPAA awareness are trackable within the compliance module. "
+            "Breach notification workflows automate the required 60-day reporting "
+            "timeline with documentation templates."
+        ),
+        "views": 876,
+        "freshness": 0.75,
+    },
+    # --- Mixed / Advanced ---
+    {
+        "id": "KB-028",
+        "title": "Workflow Automation Rules Engine",
+        "category": "Integrations",
+        "content": (
+            "The NovaCRM rules engine enables no-code workflow automation. Create "
+            "rules at Automations > Rules with trigger-condition-action logic. "
+            "Triggers include record creation, field changes, time-based schedules, "
+            "and webhook events. Conditions support AND/OR logic with field "
+            "comparisons, formula evaluation, and related record checks. Actions "
+            "include sending emails, creating tasks, updating fields, sending "
+            "notifications, and calling external webhooks. Rules execute in real-time "
+            "with a maximum chain depth of 5 to prevent infinite loops. Execution "
+            "logs track every rule firing with input data, conditions evaluated, "
+            "and actions performed for debugging and audit purposes."
+        ),
+        "views": 3890,
+        "freshness": 0.92,
+    },
+    {
+        "id": "KB-029",
+        "title": "Custom Reporting and Dashboard Builder",
+        "category": "Infrastructure",
+        "content": (
+            "Build custom reports and dashboards with the NovaCRM report builder. "
+            "Navigate to Analytics > Reports > New Report to start. Choose from "
+            "report types: tabular, summary, matrix, and chart. Data sources include "
+            "contacts, deals, activities, emails, and custom objects. Apply filters, "
+            "groupings, and calculated fields using formula syntax. Schedule reports "
+            "for automatic delivery via email in PDF or Excel format. Dashboards "
+            "support drag-and-drop widget placement with resizable components. "
+            "Available widgets include metric cards, bar charts, line graphs, pie "
+            "charts, funnels, and data tables. Share dashboards with teams or "
+            "specific users with view or edit permissions."
+        ),
+        "views": 2678,
+        "freshness": 0.87,
+    },
+    {
+        "id": "KB-030",
+        "title": "Mobile App Features and Offline Mode",
+        "category": "Support",
+        "content": (
+            "The NovaCRM mobile app is available for iOS and Android. Download from "
+            "the App Store or Google Play Store. The mobile app supports contact "
+            "management, deal updates, task management, and activity logging. Push "
+            "notifications alert you to new leads, deal changes, and task deadlines. "
+            "Offline mode caches your most recent 500 contacts and 100 deals for "
+            "access without internet connectivity. Changes made offline sync "
+            "automatically when connection is restored with conflict resolution for "
+            "simultaneous edits. Business card scanning uses OCR to create contacts "
+            "from photos. Voice notes can be attached to any record and are "
+            "automatically transcribed using speech recognition."
+        ),
+        "views": 3210,
+        "freshness": 0.85,
+    },
+]
+# ---------------------------------------------------------------------------
+# TF-IDF Engine -- implemented from scratch
+# ---------------------------------------------------------------------------
+def tokenize(text: str) -> list[str]:
+    """Lowercase, strip punctuation, split into tokens, remove stop words."""
+    text = text.lower()
+    text = text.translate(str.maketrans("", "", string.punctuation))
+    tokens = text.split()
+    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
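One consequence of stripping punctuation before splitting is that dotted or hyphenated strings collapse into a single token. The sketch below reproduces the same steps on one sentence to show this. `DEMO_STOP_WORDS` and `demo_tokenize` are hypothetical names introduced for illustration; the real `STOP_WORDS` set is defined elsewhere in app.py and is not shown in this section.

```python
import string

# Hypothetical, tiny stop-word set; the real STOP_WORDS lives elsewhere in app.py.
DEMO_STOP_WORDS = {"the", "at", "a", "and"}


def demo_tokenize(text: str) -> list[str]:
    """Same steps as tokenize(), but using the demo stop-word set."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in DEMO_STOP_WORDS and len(t) > 1]


tokens = demo_tokenize("Submit requests at feedback.novacrm.com, the drag-and-drop portal!")
print(tokens)
```

Because the domain name and the hyphenated phrase each become one fused token, a query for "novacrm" or "drop" alone would not match them under this scheme.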
+def compute_term_frequency(tokens: list[str]) -> dict[str, float]:
+    """Compute augmented term frequency: 0.5 + 0.5 * (count / max_count).
+    Augmented TF prevents bias toward longer documents.
+    """
+    counts = Counter(tokens)
+    if not counts:
+        return {}
+    max_count = max(counts.values())
+    return {
+        term: 0.5 + 0.5 * (count / max_count)
+        for term, count in counts.items()
+    }
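As a worked illustration of the augmented term-frequency formula, the standalone sketch below evaluates 0.5 + 0.5 * (count / max_count) for a three-token list. The name `augmented_tf` and the sample tokens are assumptions made for this example, not part of app.py.

```python
from collections import Counter


def augmented_tf(tokens: list[str]) -> dict[str, float]:
    """Hypothetical standalone copy of the augmented TF formula."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * (count / max_count) for term, count in counts.items()}


# "api" is the most frequent term, so it gets the maximum weight of 1.0;
# "key" appears half as often and lands at 0.5 + 0.5 * 0.5 = 0.75.
weights = augmented_tf(["api", "api", "key"])
print(weights)
```

Every term that occurs at all scores at least 0.5, which compresses the range of weights compared with raw counts.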
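+def compute_idf(corpus_tokens: list[list[str]], vocabulary: list[str]) -> dict[str, float]:
+    """Compute inverse document frequency: log(N / (1 + df)).
+    The +1 in the denominator avoids division by zero for terms that appear
+    in no document. As a side effect, a term present in every document gets a
+    slightly negative weight, and a term present in N - 1 documents gets zero.
+    """
+    num_documents = len(corpus_tokens)
+    # Build each document's token set once instead of once per vocabulary term.
+    document_term_sets = [set(doc_tokens) for doc_tokens in corpus_tokens]
+    idf_values: dict[str, float] = {}
+    for term in vocabulary:
+        document_frequency = sum(
+            1 for term_set in document_term_sets if term in term_set
+        )
+        idf_values[term] = math.log(num_documents / (1 + document_frequency))
+    return idf_values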
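The formula log(N / (1 + df)) has a few edge cases that are easier to see with numbers. This sketch evaluates it for a 30-document corpus, the size of the demo knowledge base. The helper name `smoothed_idf` is hypothetical and exists only for this illustration.

```python
import math


def smoothed_idf(num_documents: int, document_frequency: int) -> float:
    """Hypothetical helper: the log(N / (1 + df)) formula on its own."""
    return math.log(num_documents / (1 + document_frequency))


N = 30
rare = smoothed_idf(N, 1)             # log(30 / 2) = log(15), clearly positive
near_universal = smoothed_idf(N, 29)  # log(30 / 30) = 0.0
universal = smoothed_idf(N, 30)       # log(30 / 31), slightly negative
print(rare, near_universal, universal)
```

A term that appears in all but one document therefore contributes nothing to the ranking, and a term in every document contributes a small negative weight on both the document and query side.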
+def build_tfidf_matrix(
+    corpus_tokens: list[list[str]],
+    vocabulary: list[str],
+    idf_values: dict[str, float],
+) -> np.ndarray:
+    """Build a TF-IDF matrix of shape (num_documents, vocab_size)."""
+    vocab_index = {term: idx for idx, term in enumerate(vocabulary)}
+    matrix = np.zeros((len(corpus_tokens), len(vocabulary)), dtype=np.float64)
+    for doc_idx, tokens in enumerate(corpus_tokens):
+        tf_values = compute_term_frequency(tokens)
+        for term, tf_score in tf_values.items():
+            if term in vocab_index:
+                col_idx = vocab_index[term]
+                matrix[doc_idx, col_idx] = tf_score * idf_values[term]
+    return matrix
+def cosine_similarity_vector(matrix: np.ndarray, query_vector: np.ndarray) -> np.ndarray:
+    """Compute cosine similarity between each row of matrix and query_vector."""
+    dot_products = matrix @ query_vector
+    matrix_norms = np.linalg.norm(matrix, axis=1)
+    query_norm = np.linalg.norm(query_vector)
+    denominator = matrix_norms * query_norm
+    # Avoid division by zero for zero-norm vectors
+    denominator = np.where(denominator == 0, 1.0, denominator)
+    return dot_products / denominator
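To show what the zero-norm guard does, here is a pure-Python sketch of cosine similarity for a single pair of vectors. It is not the app's NumPy implementation; the function name `cosine_similarity` and the sample vectors are assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Hypothetical pure-Python cosine similarity with a zero-norm guard."""
    dot = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Same idea as the np.where guard: a zero denominator is replaced by 1.0,
    # so an all-zero vector scores 0.0 instead of raising ZeroDivisionError.
    if denominator == 0:
        denominator = 1.0
    return dot / denominator


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # no shared terms
empty_row = cosine_similarity([0.0, 0.0], [1.0, 1.0])       # zero-norm document
print(same_direction, orthogonal, empty_row)
```

Since the dot product with an all-zero vector is already zero, substituting 1.0 for the denominator yields a similarity of exactly 0.0 for such rows.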
+class TFIDFSearchEngine:
+    """TF-IDF search engine with cosine similarity ranking."""
+    def __init__(self, articles: list[dict]) -> None:
+        self.articles = articles
+        self._corpus_tokens: list[list[str]] = []
+        self._vocabulary: list[str] = []
+        self._idf: dict[str, float] = {}
+        self._tfidf_matrix: np.ndarray = np.array([])
+        self._vocab_index: dict[str, int] = {}
+        self._build_index()
+    def _build_index(self) -> None:
+        """Tokenize all articles and precompute the TF-IDF matrix."""
+        self._corpus_tokens = [
+            tokenize(article["title"] + " " + article["content"])
+            for article in self.articles
+        ]
+        vocab_set: set[str] = set()
+        for tokens in self._corpus_tokens:
+            vocab_set.update(tokens)
+        self._vocabulary = sorted(vocab_set)
+        self._vocab_index = {term: idx for idx, term in enumerate(self._vocabulary)}
+        self._idf = compute_idf(self._corpus_tokens, self._vocabulary)
+        self._tfidf_matrix = build_tfidf_matrix(
+            self._corpus_tokens, self._vocabulary, self._idf
+        )
+    def search(self, query: str, top_k: int = TOP_K_RESULTS) -> list[dict]:
+        """Search the corpus and return top_k results with scores and matched terms."""
+        query_tokens = tokenize(query)
+        if not query_tokens:
+            return []
+        query_tf = compute_term_frequency(query_tokens)
+        query_vector = np.zeros(len(self._vocabulary), dtype=np.float64)
+        for term, tf_score in query_tf.items():
+            if term in self._vocab_index:
+                col_idx = self._vocab_index[term]
+                query_vector[col_idx] = tf_score * self._idf.get(term, 0.0)
+        if np.linalg.norm(query_vector) == 0:
+            return []
+        similarities = cosine_similarity_vector(self._tfidf_matrix, query_vector)
+        top_indices = np.argsort(similarities)[::-1][:top_k]
+        results = []
+        query_term_set = set(query_tokens)
+        for idx in top_indices:
+            score = float(similarities[idx])
+            if score <= 0:
+                continue
+            article = self.articles[idx]
+            doc_term_set = set(self._corpus_tokens[idx])
+            matched_terms = sorted(query_term_set & doc_term_set)
+            results.append({
+                "article": article,
+                "score": score,
+                "matched_terms": matched_terms,
+            })
+        return results
+    def get_best_match(self, query: str) -> Optional[dict]:
+        """Return the single best matching article, or None."""
+        results = self.search(query, top_k=1)
+        return results[0] if results else None
+# ---------------------------------------------------------------------------
+# Initialize the search engine (module-level singleton)
+# ---------------------------------------------------------------------------
+search_engine = TFIDFSearchEngine(KNOWLEDGE_BASE)
+# ---------------------------------------------------------------------------
+# Tab 1: Knowledge Search
+# ---------------------------------------------------------------------------
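+def _highlight_terms(text: str, terms: list[str]) -> str:
+    """Wrap whole-word matches of terms in bold markdown, keeping the original case."""
+    if not terms:
+        return text
+    # One combined pass avoids re-matching inside markers added for an earlier term.
+    ordered_terms = sorted(terms, key=len, reverse=True)
+    alternation = "|".join(re.escape(term) for term in ordered_terms)
+    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
+    return pattern.sub(lambda match: f"**{match.group(0)}**", text)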
+def perform_search(query: str) -> str:
+    """Execute TF-IDF search and format results as markdown."""
+    if not query or not query.strip():
+        return "*Enter a search query to find relevant knowledge base articles.*"
+    results = search_engine.search(query.strip(), top_k=TOP_K_RESULTS)
+    if not results:
+        return (
+            f"**No results found for:** \"{query}\"\n\n"
+            "No articles in the knowledge base matched your query terms. "
+            "Try using different keywords or broader terms."
+        )
+    output_parts = [
+        f"### Search Results for: \"{query}\"\n",
+        f"Found **{len(results)}** relevant article(s).\n",
+        "---\n",
+    ]
+    for rank, result in enumerate(results, start=1):
+        article = result["article"]
+        score = result["score"]
+        matched = result["matched_terms"]
+        score_bar = _render_score_bar(score)
+        highlighted_content = _highlight_terms(article["content"], matched)
+        matched_display = ", ".join(f"`{t}`" for t in matched) if matched else "N/A"
+        output_parts.append(
+            f"**#{rank}  [{article['id']}] {article['title']}**\n"
+            f"Category: {article['category']}  |  "
+            f"Relevance: {score:.4f} {score_bar}\n"
+            f"Matched terms: {matched_display}\n\n"
+            f"{highlighted_content}\n\n"
+            "---\n"
+        )
+    return "\n".join(output_parts)
+def _render_score_bar(score: float) -> str:
+    """Render a 20-character text relevance bar, filled with '=' in proportion to score."""
+    filled = int(round(score * 20))
+    filled = min(filled, 20)
+    return "[" + "=" * filled + " " * (20 - filled) + "]"
+# ---------------------------------------------------------------------------
+# Tab 2: AI Q&A
+# ---------------------------------------------------------------------------
+def answer_question(question: str) -> str:
+    """Find the most relevant article and generate a template-based answer with citation."""
+    if not question or not question.strip():
+        return "*Ask a question about NovaCRM to get an answer with source citation.*"
+    result = search_engine.get_best_match(question.strip())
+    if result is None:
+        return (
+            f"**Question:** {question}\n\n"
+            "I could not find a relevant article in the knowledge base to answer "
+            "your question. Try rephrasing with more specific terms related to "
+            "NovaCRM features, billing, API, security, or integrations."
+        )
+    article = result["article"]
+    score = result["score"]
+    matched = result["matched_terms"]
+    # Extract the most relevant sentence(s) from the article as the excerpt
+    excerpt = _extract_relevant_excerpt(article["content"], matched)
+    highlighted_excerpt = _highlight_terms(excerpt, matched)
+    answer_text = _generate_template_answer(question, article, matched)
+    output_parts = [
+        f"**Question:** {question}\n\n",
+        "---\n\n",
+        f"### Answer\n\n{answer_text}\n\n",
+        "---\n\n",
+        f"### Source\n\n",
+        f"**Article:** [{article['id']}] {article['title']}\n\n",
+        f"**Category:** {article['category']}\n\n",
+        f"**Confidence:** {score:.4f}\n\n",
+        f"**Relevant Excerpt:**\n\n> {highlighted_excerpt}\n",
+    ]
+    return "".join(output_parts)
+def _extract_relevant_excerpt(content: str, matched_terms: list[str]) -> str:
+    """Extract the most relevant 1-2 sentences from the article content."""
+    sentences = re.split(r"(?<=[.!?])\s+", content)
+    if not sentences:
+        return content[:300]
+    if not matched_terms:
+        return sentences[0]
+    scored_sentences: list[tuple[int, str]] = []
+    for sentence in sentences:
+        sentence_lower = sentence.lower()
+        match_count = sum(1 for term in matched_terms if term in sentence_lower)
+        scored_sentences.append((match_count, sentence))
+    scored_sentences.sort(key=lambda pair: pair[0], reverse=True)
+    # Take the top 2 sentences by match count
+    top_sentences = scored_sentences[:2]
+    # Re-order by original position in the article
+    top_sentences_text = [s[1] for s in top_sentences]
+    ordered = [s for s in sentences if s in top_sentences_text]
+    return " ".join(ordered) if ordered else sentences[0]
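The excerpt logic can be sketched in isolation: split on sentence-ending punctuation, score each sentence by how many matched terms it contains, keep the best two, and restore document order. The version below is a hypothetical variant (`top_sentences` is not a function in app.py) that tracks sentence positions rather than comparing sentence text, which is a design choice of this sketch and keeps repeated sentences from being picked twice.

```python
import re

SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def top_sentences(content: str, terms: list[str], limit: int = 2) -> list[str]:
    """Hypothetical helper: the `limit` best-matching sentences, in document order."""
    sentences = SENTENCE_BOUNDARY.split(content)
    # Score by how many terms occur (as substrings) in each lowercased sentence.
    scored = [
        (sum(term in sentence.lower() for term in terms), position)
        for position, sentence in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:limit]
    return [sentences[position] for _, position in sorted(best, key=lambda pair: pair[1])]


text = (
    "Offline mode caches contacts. Push notifications alert you. "
    "Changes made offline sync automatically."
)
picked = top_sentences(text, ["offline", "sync"])
print(picked)
```

Here the third sentence matches both terms and the first matches one, so those two are returned in their original order and the middle sentence is dropped.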
+def _generate_template_answer(
+    question: str, article: dict, matched_terms: list[str]
+) -> str:
+    """Generate a natural-language answer based on the matched article content.
+    Uses the article content to compose a direct response rather than
+    simply echoing the question back.
+    """
+    category = article["category"]
+    title = article["title"]
+    content = article["content"]
+    # Extract key sentences that address the question
+    sentences = re.split(r"(?<=[.!?])\s+", content)
+    relevant_sentences = []
+    for sentence in sentences:
+        sentence_lower = sentence.lower()
+        if any(term in sentence_lower for term in matched_terms):
+            relevant_sentences.append(sentence)
+    if not relevant_sentences:
+        relevant_sentences = sentences[:3]
+    # Construct the answer
+    answer_body = " ".join(relevant_sentences[:4])
+    intro_templates = {
+        "Onboarding": f"Based on the {title} documentation",
+        "Billing": f"According to the billing documentation on {title}",
+        "API": f"The API documentation ({title}) explains",
+        "Security": f"Per the security documentation in {title}",
+        "Integrations": f"The integration guide for {title} states",
+        "Infrastructure": f"According to the infrastructure documentation ({title})",
+        "Support": f"The support documentation ({title}) addresses this",
+        "Compliance": f"Per the compliance documentation in {title}",
+    }
+    intro = intro_templates.get(category, f"According to {title}")
+    return f"{intro}: {answer_body}"
+# ---------------------------------------------------------------------------
+# Tab 3: Training Generator
+# ---------------------------------------------------------------------------
+# Quiz cache keyed by article ID, filled lazily by generate_training()
+QUIZ_DATA: dict[str, list[dict]] = {}
+def _build_quiz_for_article(article: dict) -> list[dict]:
+    """Generate 5 multiple-choice questions from article content.
+    Uses content extraction to create questions that reference actual
+    article details rather than generic filler.
+    """
+    content = article["content"]
+    title = article["title"]
+    sentences = re.split(r"(?<=[.!?])\s+", content)
+    questions: list[dict] = []
+    # Strategy: pull factual statements and create questions about them
+    for i, sentence in enumerate(sentences):
+        if len(questions) >= 5:
+            break
+        # Skip very short sentences
+        if len(sentence) < 30:
+            continue
+        question_entry = _sentence_to_question(sentence, title, i)
+        if question_entry:
+            questions.append(question_entry)
+    # Pad with generic questions if content was too sparse
+    while len(questions) < 5:
+        questions.append({
+            "question": f"What is the primary purpose of {title}?",
+            "options": [
+                f"To manage {article['category'].lower()} features",
+                "To provide general system information",
+                "To configure external services",
+                "To handle user authentication only",
+            ],
+            "correct": 0,
+        })
+    return questions[:5]
+def _sentence_to_question(sentence: str, title: str, seed: int) -> Optional[dict]:
+    """Convert a factual sentence into a multiple-choice question."""
+    # Look for sentences with numbers, specific features, or named items
+    number_match = re.search(r"(\d+[\s\w-]*(?:hours?|days?|minutes?|percent|GB|requests?))", sentence)
+    if number_match:
+        fact = number_match.group(1)
+        return {
+            "question": f"According to \"{title}\", what is the specification for: {fact.strip()}?",
+            "options": [
+                f"The value is {fact.strip()}",
+                f"The value is double the standard amount",
+                "This is not specified in the documentation",
+                "This depends on the subscription tier selected",
+            ],
+            "correct": 0,
+        }
+    # Look for feature mentions
+    feature_patterns = [
+        (r"supports?\s+(.+?)(?:\.|,|$)", "support"),
+        (r"includes?\s+(.+?)(?:\.|,|$)", "include"),
+        (r"provides?\s+(.+?)(?:\.|,|$)", "provide"),
+        (r"enables?\s+(.+?)(?:\.|,|$)", "enable"),
+    ]
+    for pattern, verb in feature_patterns:
+        match = re.search(pattern, sentence, re.IGNORECASE)
+        if match:
+            feature = match.group(1).strip()
+            if 15 < len(feature) < 120:
+                return {
+                    "question": f"What does the system {verb} according to \"{title}\"?",
+                    "options": [
+                        feature[:100],
+                        "Only basic text-based functionality",
+                        "This feature is not available",
+                        "Requires third-party configuration",
+                    ],
+                    "correct": 0,
+                }
+    return None
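The number-with-unit pattern is worth trying on a few sentences, because the greedy `[\s\w-]*` part can span several words before it reaches a unit. The sketch below applies the same regular expression to two sentences taken from the knowledge base articles, one without a unit, and one made-up sentence (the last sample is hypothetical) that triggers the long match. The helper `extract_fact` is introduced for illustration only.

```python
import re
from typing import Optional

# Same pattern as in _sentence_to_question.
NUMBER_WITH_UNIT = re.compile(
    r"(\d+[\s\w-]*(?:hours?|days?|minutes?|percent|GB|requests?))"
)


def extract_fact(sentence: str) -> Optional[str]:
    """Hypothetical helper: return the first number-with-unit span, or None."""
    match = NUMBER_WITH_UNIT.search(sentence)
    return match.group(1) if match else None


samples = [
    "Automatic session timeout after 15 minutes of inactivity is enforced.",
    "Breach notification workflows automate the required 60-day reporting timeline.",
    "Rules execute in real-time with a maximum chain depth of 5 to prevent infinite loops.",
    "Exports cover 3 regions and finish within 2 days.",  # made-up sentence
]
facts = [extract_fact(sample) for sample in samples]
print(facts)
```

The last sample yields the whole span from "3" to "days", since the greedy middle section backtracks only as far as the final unit it can reach. A non-greedy quantifier or a tighter character class would be one way to narrow this, if shorter facts are preferred.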
+def generate_training(topic_title: str) -> str:
+    """Generate a training article outline and quiz for the selected topic."""
+    if not topic_title:
+        return "*Select a topic to generate training material.*"
+    article = None
+    for kb_article in KNOWLEDGE_BASE:
+        if kb_article["title"] == topic_title:
+            article = kb_article
+            break
+    if article is None:
+        return f"Article not found: {topic_title}"
+    # Cache quiz data
+    if article["id"] not in QUIZ_DATA:
+        QUIZ_DATA[article["id"]] = _build_quiz_for_article(article)
+    quiz_questions = QUIZ_DATA[article["id"]]
+    sentences = re.split(r"(?<=[.!?])\s+", article["content"])
+    # Build training article outline
+    output_parts = [
+        f"## Training Module: {article['title']}\n",
+        f"**Category:** {article['category']}  |  "
+        f"**Article ID:** {article['id']}\n\n",
+        "---\n\n",
+        "### Learning Objectives\n\n",
+        f"After completing this module, you will be able to:\n\n",
+    ]
+    # Generate 3 learning objectives from article content
+    objectives = _extract_learning_objectives(sentences)
+    for obj in objectives:
+        output_parts.append(f"- {obj}\n")
+    output_parts.append("\n### Module Outline\n\n")
+    # Split content into sections
+    section_size = max(1, len(sentences) // 3)
+    section_titles = ["Introduction and Overview", "Core Concepts", "Implementation Details"]
+    for section_idx, section_title in enumerate(section_titles):
+        start = section_idx * section_size
+        end = start + section_size if section_idx < 2 else len(sentences)
+        section_content = " ".join(sentences[start:end])
+        if section_content.strip():
+            output_parts.append(f"**{section_idx + 1}. {section_title}**\n\n")
+            output_parts.append(f"{section_content}\n\n")
+    output_parts.append("---\n\n### Knowledge Check (5 Questions)\n\n")
+    for q_idx, q_data in enumerate(quiz_questions, start=1):
+        output_parts.append(f"**Q{q_idx}.** {q_data['question']}\n\n")
+        labels = ["A", "B", "C", "D"]
+        for opt_idx, option in enumerate(q_data["options"]):
+            marker = " (correct)" if opt_idx == q_data["correct"] else ""
+            output_parts.append(f"  {labels[opt_idx]}. {option}{marker}\n")
+        output_parts.append("\n")
+    return "".join(output_parts)
+def _extract_learning_objectives(sentences: list[str]) -> list[str]:
+    """Extract or generate 3 learning objectives from article sentences."""
+    objectives: list[str] = []
+    action_verbs = [
+        "Understand how to", "Explain the process of", "Configure and manage",
+        "Identify the key aspects of", "Apply knowledge about",
+    ]
+    for sentence in sentences:
+        if len(objectives) >= 3:
+            break
+        # Look for sentences describing capabilities or processes
+        if any(kw in sentence.lower() for kw in ["navigate", "configure", "create", "enable", "support"]):
+            # Rephrase as objective
+            clean = sentence.rstrip(".")
+            verb = action_verbs[len(objectives) % len(action_verbs)]
+            objective = f"{verb} {clean[0].lower()}{clean[1:]}"
+            if len(objective) < 200:
+                objectives.append(objective)
+    # Pad if needed
+    while len(objectives) < 3:
+        objectives.append(
+            f"{action_verbs[len(objectives) % len(action_verbs)]} the features described in this module"
+        )
+    return objectives[:3]
+# ---------------------------------------------------------------------------
+# Tab 4: Knowledge Gap Analytics
+# ---------------------------------------------------------------------------
+# Mock analytics data
+UNANSWERED_QUERIES = [
+    "How do I integrate with Salesforce?",
+    "What is the data export format for compliance audits?",
+    "Can I use NovaCRM with a self-hosted email server?",
+    "How to configure IP allowlisting?",
+    "What are the API rate limits for the GraphQL endpoint specifically?",
+    "Does NovaCRM support multi-currency deals?",
+    "How to set up automated lead scoring?",
+    "Can I restrict API access by IP address?",
+    "What is the maximum file attachment size?",
+    "How to configure custom email domains?",
+]
+SEARCH_QUERIES_LOG = [
+    ("api authentication", 342),
+    ("billing invoice", 287),
+    ("sso setup okta", 198),
+    ("import contacts csv", 176),
+    ("webhook configuration", 154),
+    ("slack integration", 143),
+    ("password reset", 312),
+    ("pipeline stages", 131),
+    ("gdpr data deletion", 119),
+    ("mobile app offline", 108),
+    ("two factor authentication", 205),
+    ("email tracking", 167),
+    ("custom reports", 145),
+    ("zapier automation", 98),
+    ("backup schedule", 87),
+]
+def generate_analytics() -> tuple:
+    """Generate all analytics charts and summary text.
+    Returns a tuple of (summary_markdown, articles_by_category_fig,
+    freshness_fig, views_fig, gaps_fig).
+    """
+    summary = _build_analytics_summary()
+    category_fig = _plot_articles_by_category()
+    freshness_fig = _plot_freshness_scores()
+    views_fig = _plot_article_views()
+    gaps_fig = _plot_search_gaps()
+    return summary, category_fig, freshness_fig, views_fig, gaps_fig
+def _build_analytics_summary() -> str:
+    """Build the text summary of knowledge base health."""
+    total_articles = len(KNOWLEDGE_BASE)
+    total_views = sum(a["views"] for a in KNOWLEDGE_BASE)
+    avg_freshness = sum(a["freshness"] for a in KNOWLEDGE_BASE) / total_articles
+    stale_articles = [a for a in KNOWLEDGE_BASE if a["freshness"] < 0.80]
+    categories_covered = len(set(a["category"] for a in KNOWLEDGE_BASE))
+    # Most and least viewed
+    sorted_by_views = sorted(KNOWLEDGE_BASE, key=lambda a: a["views"], reverse=True)
+    most_viewed = sorted_by_views[0]
+    least_viewed = sorted_by_views[-1]
+    return (
+        "### Knowledge Base Health Summary\n\n"
+        f"| Metric | Value |\n"
+        f"|--------|-------|\n"
+        f"| Total articles | {total_articles} |\n"
+        f"| Categories covered | {categories_covered} |\n"
+        f"| Total page views | {total_views:,} |\n"
+        f"| Average freshness score | {avg_freshness:.2f} |\n"
+        f"| Articles needing update (freshness < 0.80) | {len(stale_articles)} |\n"
+        f"| Unanswered search queries | {len(UNANSWERED_QUERIES)} |\n\n"
+        f"**Most viewed:** [{most_viewed['id']}] {most_viewed['title']} "
+        f"({most_viewed['views']:,} views)\n\n"
+        f"**Least viewed:** [{least_viewed['id']}] {least_viewed['title']} "
+        f"({least_viewed['views']:,} views)\n\n"
+        "**Stale articles requiring review:**\n\n"
+        + "\n".join(
+            f"- [{a['id']}] {a['title']} (freshness: {a['freshness']:.2f})"
+            for a in stale_articles
+        )
+    )
+def _apply_dark_style(fig: plt.Figure, ax: plt.Axes) -> None:
+    """Apply consistent dark theme styling to matplotlib figures."""
+    bg_color = "#1a1a2e"
+    text_color = "#e0e0e0"
+    grid_color = "#2a2a4a"
+    fig.patch.set_facecolor(bg_color)
+    ax.set_facecolor(bg_color)
+    ax.tick_params(colors=text_color, which="both")
+    ax.xaxis.label.set_color(text_color)
+    ax.yaxis.label.set_color(text_color)
+    ax.title.set_color(text_color)
+    for spine in ax.spines.values():
+        spine.set_color(grid_color)
+    ax.grid(True, alpha=0.2, color=grid_color)
+def _plot_articles_by_category() -> plt.Figure:
+    """Bar chart of article count per category."""
+    category_counts: dict[str, int] = {}
+    for article in KNOWLEDGE_BASE:
+        cat = article["category"]
+        category_counts[cat] = category_counts.get(cat, 0) + 1
+    categories = sorted(category_counts.keys())
+    counts = [category_counts[c] for c in categories]
+    fig, ax = plt.subplots(figsize=(8, 4))
+    _apply_dark_style(fig, ax)
+    bar_colors = ["#3b82f6", "#6366f1", "#8b5cf6", "#a78bfa",
+                  "#60a5fa", "#818cf8", "#7c3aed", "#4f46e5"]
+    bars = ax.barh(categories, counts, color=bar_colors[:len(categories)], height=0.6)
+    ax.set_xlabel("Number of Articles")
+    ax.set_title("Articles by Category")
+    for bar_item, count in zip(bars, counts):
+        ax.text(
+            bar_item.get_width() + 0.1, bar_item.get_y() + bar_item.get_height() / 2,
+            str(count), va="center", color="#e0e0e0", fontweight="bold",
+        )
+    fig.tight_layout()
+    return fig
+def _plot_freshness_scores() -> plt.Figure:
+    """Horizontal bar chart of article freshness scores, color-coded."""
+    sorted_articles = sorted(KNOWLEDGE_BASE, key=lambda a: a["freshness"])
+    titles = [f"[{a['id']}]" for a in sorted_articles]
+    scores = [a["freshness"] for a in sorted_articles]
+    fig, ax = plt.subplots(figsize=(8, 7))
+    _apply_dark_style(fig, ax)
+    colors = []
+    for score in scores:
+        if score >= 0.90:
+            colors.append("#22c55e")  # green -- fresh
+        elif score >= 0.80:
+            colors.append("#eab308")  # yellow -- aging
+        else:
+            colors.append("#ef4444")  # red -- stale
+    ax.barh(titles, scores, color=colors, height=0.6)
+    ax.set_xlabel("Freshness Score")
+    ax.set_title("Article Freshness Scores")
+    ax.set_xlim(0, 1.0)
+    ax.axvline(x=0.80, color="#ef4444", linestyle="--", alpha=0.5, label="Stale threshold")
+    ax.legend(loc="lower right", facecolor="#1a1a2e", edgecolor="#2a2a4a", labelcolor="#e0e0e0")
+    fig.tight_layout()
+    return fig
+def _plot_article_views() -> plt.Figure:
+    """Bar chart of top 10 articles by view count."""
+    sorted_articles = sorted(KNOWLEDGE_BASE, key=lambda a: a["views"], reverse=True)[:10]
+    titles = [f"[{a['id']}]" for a in sorted_articles]
+    views = [a["views"] for a in sorted_articles]
+    fig, ax = plt.subplots(figsize=(8, 5))
+    _apply_dark_style(fig, ax)
+    gradient_colors = plt.cm.Blues(np.linspace(0.9, 0.4, len(titles)))
+    ax.barh(titles, views, color=gradient_colors, height=0.6)
+    ax.set_xlabel("Page Views")
+    ax.set_title("Top 10 Most Viewed Articles")
+    ax.invert_yaxis()
+    for idx, view_count in enumerate(views):
+        ax.text(
+            view_count + 50, idx, f"{view_count:,}",
+            va="center", color="#e0e0e0", fontsize=9,
+        )
+    fig.tight_layout()
+    return fig
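+def _plot_search_gaps() -> plt.Figure:
+    """Bar chart of the 10 most frequent search queries from the mock query log."""
+    # SEARCH_QUERIES_LOG is not stored in frequency order, so sort before slicing.
+    top_queries = sorted(SEARCH_QUERIES_LOG, key=lambda entry: entry[1], reverse=True)[:10]
+    queries = [q for q, _ in top_queries]
+    counts = [c for _, c in top_queries]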
+    fig, ax = plt.subplots(figsize=(8, 5))
+    _apply_dark_style(fig, ax)
+    ax.barh(queries, counts, color="#6366f1", height=0.6)
+    ax.set_xlabel("Search Frequency")
+    ax.set_title("Most Frequent Search Queries")
+    ax.invert_yaxis()
+    for idx, count in enumerate(counts):
+        ax.text(
+            count + 3, idx, str(count),
+            va="center", color="#e0e0e0", fontsize=9,
+        )
+    fig.tight_layout()
+    return fig
+# ---------------------------------------------------------------------------
+# Gradio Application
+# ---------------------------------------------------------------------------
+CUSTOM_CSS = """
+.gradio-container {
+    max-width: 1200px !important;
+    margin: 0 auto !important;
+}
+.header-text {
+    text-align: center;
+    margin-bottom: 8px;
+}
+.header-text h1 {
+    font-size: 2em;
+    margin-bottom: 4px;
+}
+.header-text p {
+    opacity: 0.8;
+    font-size: 1.05em;
+}
+footer {
+    text-align: center;
+    opacity: 0.6;
+    margin-top: 20px;
+}
+"""
+HEADER_HTML = """
+<div class="header-text">
+    <h1>Vaultwise</h1>
+    <p>Knowledge Management Platform — Document Ingestion, TF-IDF Search, AI Q&A, Training Generation, Analytics</p>
+    <p style="font-size: 0.9em; opacity: 0.6;">
+        This interactive demo runs on a built-in 30-article knowledge base.
+        All search is powered by a from-scratch TF-IDF implementation — no sklearn, no external NLP libraries.
+    </p>
+</div>
+"""
+FOOTER_HTML = """
+<footer>
+    <p>
+        <a href="https://github.com/dbhavery/vaultwise" target="_blank">GitHub</a>
+        &nbsp;|&nbsp; Built by Don Havery
+    </p>
+</footer>
+"""
+def build_app() -> gr.Blocks:
+    """Construct and return the Gradio Blocks application."""
+    topic_choices = [article["title"] for article in KNOWLEDGE_BASE]
+    with gr.Blocks(
+        title=APP_TITLE,
+        theme=gr.themes.Base(
+            primary_hue=gr.themes.colors.blue,
+            secondary_hue=gr.themes.colors.indigo,
+            neutral_hue=gr.themes.colors.gray,
+            font=gr.themes.GoogleFont("Inter"),
+        ).set(
+            body_background_fill="#0f0f1a",
+            body_background_fill_dark="#0f0f1a",
+            block_background_fill="#1a1a2e",
+            block_background_fill_dark="#1a1a2e",
+            block_border_color="#2a2a4a",
+            block_border_color_dark="#2a2a4a",
+            block_title_text_color="#e0e0e0",
+            block_title_text_color_dark="#e0e0e0",
+            body_text_color="#d0d0d0",
+            body_text_color_dark="#d0d0d0",
+            input_background_fill="#16162a",
+            input_background_fill_dark="#16162a",
+            input_border_color="#2a2a4a",
+            input_border_color_dark="#2a2a4a",
+            button_primary_background_fill="#3b82f6",
+            button_primary_background_fill_dark="#3b82f6",
+            button_primary_text_color="#ffffff",
+            button_primary_text_color_dark="#ffffff",
+        ),
+        css=CUSTOM_CSS,
+    ) as app:
+        gr.HTML(HEADER_HTML)
+        with gr.Tabs():
+            # --- Tab 1: Knowledge Search ---
+            with gr.Tab("Knowledge Search"):
+                gr.Markdown(
+                    "### TF-IDF Vector Search\n"
+                    "Search the NovaCRM knowledge base using term frequency-inverse document "
+                    "frequency scoring with cosine similarity ranking. The engine tokenizes "
+                    "your query, computes TF-IDF weights against all 30 articles, and returns "
+                    "the top 5 matches."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        search_input = gr.Textbox(
+                            label="Search Query",
+                            placeholder="e.g., API rate limits authentication, SSO configuration, billing invoice...",
+                            lines=1,
+                        )
+                    with gr.Column(scale=1):
+                        search_btn = gr.Button("Search", variant="primary")
+                search_output = gr.Markdown(
+                    value="*Enter a search query to find relevant knowledge base articles.*",
+                    label="Results",
+                )
+                gr.Examples(
+                    examples=[
+                        ["API authentication rate limits"],
+                        ["how to import contacts from CSV"],
+                        ["SSO single sign-on SAML configuration"],
+                        ["billing subscription pricing plans"],
+                        ["webhook event notifications"],
+                        ["GDPR data erasure compliance"],
+                        ["mobile app offline mode"],
+                        ["workflow automation rules engine"],
+                    ],
+                    inputs=search_input,
+                    label="Example Queries",
+                )
+                search_btn.click(fn=perform_search, inputs=search_input, outputs=search_output)
+                search_input.submit(fn=perform_search, inputs=search_input, outputs=search_output)
+            # --- Tab 2: AI Q&A ---
+            with gr.Tab("AI Q&A"):
+                gr.Markdown(
+                    "### Knowledge-Grounded Question Answering\n"
+                    "Ask a natural language question about NovaCRM. The system finds "
+                    "the most relevant article via TF-IDF search, then generates an "
+                    "answer grounded in the source material with full citation."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        qa_input = gr.Textbox(
+                            label="Your Question",
+                            placeholder="e.g., How do I set up two-factor authentication?",
+                            lines=1,
+                        )
+                    with gr.Column(scale=1):
+                        qa_btn = gr.Button("Ask", variant="primary")
+                qa_output = gr.Markdown(
+                    value="*Ask a question about NovaCRM to get an answer with source citation.*",
+                    label="Answer",
+                )
+                gr.Examples(
+                    examples=[
+                        ["How do I reset my password if my account is locked?"],
+                        ["What encryption does NovaCRM use for data at rest?"],
+                        ["How can I connect my Gmail to NovaCRM?"],
+                        ["What are the different subscription plans and pricing?"],
+                        ["How do I configure webhooks for deal updates?"],
+                        ["What compliance certifications does NovaCRM have?"],
+                    ],
+                    inputs=qa_input,
+                    label="Example Questions",
+                )
+                qa_btn.click(fn=answer_question, inputs=qa_input, outputs=qa_output)
+                qa_input.submit(fn=answer_question, inputs=qa_input, outputs=qa_output)
+            # --- Tab 3: Training Generator ---
+            with gr.Tab("Training Generator"):
+                gr.Markdown(
+                    "### Auto-Generated Training Material\n"
+                    "Select a knowledge base article to generate a structured training module "
+                    "with learning objectives, content outline, and a 5-question multiple-choice quiz."
+                )
+                with gr.Row():
+                    with gr.Column(scale=4):
+                        training_dropdown = gr.Dropdown(
+                            choices=topic_choices,
+                            label="Select Article Topic",
+                            value=None,
+                        )
+                    with gr.Column(scale=1):
+                        training_btn = gr.Button("Generate", variant="primary")
+                training_output = gr.Markdown(
+                    value="*Select a topic to generate training material.*",
+                    label="Training Material",
+                )
+                training_btn.click(
+                    fn=generate_training, inputs=training_dropdown, outputs=training_output
+                )
+            # --- Tab 4: Knowledge Gap Analytics ---
+            with gr.Tab("Knowledge Gap Analytics"):
+                gr.Markdown(
+                    "### Knowledge Base Analytics Dashboard\n"
+                    "Health metrics, content freshness, usage patterns, and gap analysis "
+                    "for the knowledge base."
+                )
+                analytics_btn = gr.Button("Generate Analytics Report", variant="primary")
+                analytics_summary = gr.Markdown(label="Summary")
+                with gr.Row():
+                    category_chart = gr.Plot(label="Articles by Category")
+                    views_chart = gr.Plot(label="Most Viewed Articles")
+                with gr.Row():
+                    freshness_chart = gr.Plot(label="Freshness Scores")
+                    gaps_chart = gr.Plot(label="Search Query Frequency")
+                analytics_btn.click(
+                    fn=generate_analytics,
+                    inputs=[],
+                    outputs=[analytics_summary, category_chart, freshness_chart, views_chart, gaps_chart],
+                )
+        gr.HTML(FOOTER_HTML)
+    return app
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+if __name__ == "__main__":
+    application = build_app()
+    application.launch()
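
The Knowledge Search tab above describes its engine as tokenizing the query, computing TF-IDF weights against every article, and ranking by cosine similarity. The real `perform_search` is not visible in this part of the diff, so the following is only a minimal standard-library sketch of that technique. The names `tokenize`, `build_index`, `weigh`, `cosine`, and `search` are hypothetical, and the smoothed IDF formula is an assumption, not necessarily what app.py uses.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Build a sparse TF-IDF vector; terms missing from the corpus are dropped."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs: list[str]) -> tuple[dict[str, float], list[dict[str, float]]]:
    """Compute IDF per term and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption for this sketch): always positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [weigh(tokens, idf) for tokens in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str], top_k: int = 5) -> list[tuple[int, float]]:
    """Return up to top_k (doc_index, score) pairs with a nonzero score, best first."""
    idf, vectors = build_index(docs)
    query_vec = weigh(tokenize(query), idf)
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    scored = [(i, s) for i, s in scored if s > 0.0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "API rate limits and authentication tokens",
        "Importing contacts from a CSV file",
        "Configuring SSO with SAML",
    ]
    for doc_index, score in search("import contacts CSV", corpus):
        print(doc_index, round(score, 3), corpus[doc_index])
```

The sketch rebuilds the index on every call to stay short. A real application would more likely build it once at startup and reuse it across queries.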

requirements.txt ADDED

	@@ -0,0 +1,3 @@

+gradio==5.29.0
+numpy>=1.26.0
+matplotlib>=3.8.0