DocIQ - User Guide
1. System Overview
DocIQ is an enterprise-grade Retrieval-Augmented Generation (RAG) platform designed to ingest PDF documentation, create searchable vector embeddings, and allow users to query their knowledge base using natural language.
The system features a Smart Caching Layer that learns from previous questions to deliver faster answers over time, and a Role-Based Access Control (RBAC) system to ensure secure management of data.
2. Access & Authentication
DocIQ separates functionality based on user roles. Use the sidebar navigation to switch between modules.
Roles and Permissions
| Feature | General User (Public) | Viewer (Read-Only) | Admin (Full Access) |
|---|---|---|---|
| Query Knowledge Base | Yes | Yes | Yes |
| View Upload Configuration | No | Yes | Yes |
| Upload Documents | No | No (Disabled) | Yes |
| View Strategies Overview | No | Yes | Yes |
| View Audit Logs/Cache | No | Yes | Yes |
| Delete Topics/Clear DB | No | No (Disabled) | Yes |
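The access rules for each module (public querying, gated upload and admin screens, disabled delete actions for Viewers) can be captured as a simple permission lookup. The sketch below is illustrative only: the role values and feature keys are hypothetical identifiers, not DocIQ's internal names.

```python
from enum import Enum


class Role(Enum):
    PUBLIC = "general_user"
    VIEWER = "viewer"
    ADMIN = "admin"


# Hypothetical feature keys mirroring the Roles and Permissions table.
PERMISSIONS: dict[str, set[Role]] = {
    "query_knowledge_base": {Role.PUBLIC, Role.VIEWER, Role.ADMIN},
    "view_upload_configuration": {Role.VIEWER, Role.ADMIN},
    "upload_documents": {Role.ADMIN},
    "view_strategies_overview": {Role.VIEWER, Role.ADMIN},
    "view_audit_logs_cache": {Role.VIEWER, Role.ADMIN},
    "delete_topics_clear_db": {Role.ADMIN},
}


def is_allowed(role: Role, feature: str) -> bool:
    """Return True if `role` may use `feature`; unknown features are denied."""
    return role in PERMISSIONS.get(feature, set())
```

Because Viewers see the Upload and Delete buttons in a disabled state rather than not at all, a UI built this way would still render those controls and use a check like is_allowed only to decide whether they are enabled.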
Login Credentials
- Admin: Full control over uploads and database management.
- Viewer: Audit and verification access only.
(Read-only login details: user ID: viewer, password: view123)
3. Module: Upload PDFs
Access: Gated (Requires Login)
Purpose: Ingest new knowledge (PDFs) into the system.
Step-by-Step Upload Process
Authentication: Navigate to "Upload PDFs" in the sidebar and log in.
Topic Assignment:
- Select existing topic: Choose from the dropdown to add documents to an existing collection (e.g., "HR_Policies").
- Create new topic: Enter a unique name (spaces are not recommended; e.g., "Engineering_Specs_2025") to start a new collection.
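Since topic names work best without spaces, one way to derive a safe name from a free-form label is sketched below. The helper is hypothetical and not part of DocIQ; vector stores such as ChromaDB also apply their own collection-naming rules, so check the rules for the installed version.

```python
import re


def normalize_topic_name(raw: str) -> str:
    """Hypothetical helper: turn a free-form label into a space-free topic name.

    Runs of whitespace become a single underscore; any character other than
    letters, digits, underscores, and hyphens is dropped.
    """
    name = re.sub(r"\s+", "_", raw.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", name)
```

For example, "Engineering Specs 2025" becomes "Engineering_Specs_2025", matching the naming style recommended above.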
File Selection: Drag and drop a PDF file. The system supports text-based PDFs and, if the OCR checkbox is selected, uses OCR (Optical Character Recognition) to extract text from scanned pages.
Configuration (Chunking Strategy): Select the strategy that best fits your document structure:
- Adaptive Chunking (NLTK): Best for dense text. Uses Natural Language Toolkit (NLTK) sentence detection so that chunks respect sentence boundaries.
- Recursive Sentence Chunking: A good balance. Splits text by paragraphs and then by sentences using regular expressions.
- Sliding Window: Legacy/simple. Cuts text strictly at fixed character limits, regardless of sentence boundaries.
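To illustrate the difference between the last two strategies, here is a minimal sketch of a fixed-size sliding window and a paragraph-then-sentence splitter. It is not DocIQ's implementation; the default sizes and the regular expressions used for paragraph and sentence boundaries are assumptions chosen for clarity.

```python
import re


def sliding_window(text: str, size: int = 500, overlap: int = 0) -> list[str]:
    """Cut text into fixed-size character windows, ignoring sentence boundaries."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop before a start position whose window would lie entirely inside the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_sentence_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank-line paragraphs; over-long paragraphs are split into
    sentences (assumed to end in . ! or ?) and packed greedily up to max_chars.
    A single sentence longer than max_chars is kept whole."""
    chunks: list[str] = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chars:
            chunks.append(para)
            continue
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

The sliding window can split a sentence mid-word, which is why it suits only simple documents; the recursive splitter only breaks at paragraph or sentence ends.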
Tip: For documents with complex legal clauses, use Adaptive Chunking with a larger overlap (e.g., 200 characters) to preserve context across chunk boundaries.
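The effect of overlap on sentence-aware chunking can be sketched as follows: whole sentences are packed into chunks, and each new chunk begins with trailing sentences from the previous chunk, up to the overlap budget. This is an assumption-laden stand-in for Adaptive Chunking, not DocIQ's code; to stay dependency-free it replaces NLTK's sentence tokenizer (nltk.sent_tokenize) with a simple regular expression.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Stand-in for nltk.sent_tokenize: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_chunks_with_overlap(text: str, max_chars: int = 1000,
                                 overlap: int = 200) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars. Each new chunk
    starts with trailing sentences from the previous chunk whose combined
    length fits within `overlap` characters."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry over as many trailing sentences as fit in the overlap budget.
            carry: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carry)) > overlap:
                    break
                carry.insert(0, prev)
            current = carry
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because the carried-over sentences are whole, a clause that ends one chunk also opens the next, which is the context-preserving behaviour the tip aims for. Chunks can exceed max_chars slightly when carried-over text is added.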
Execution: Click "Upload and Process".
- Note for Viewers: This button is disabled.
- The system extracts text and tables and generates vector embeddings. A sample of the processed chunks is displayed on success.
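Each processed chunk ends up in the vector database alongside metadata such as its topic, source file, and chunking strategy. The sketch below shows one plausible record layout; the field names, the ID format, and the rule "sub-topic = file name without extension" are all assumptions for illustration. Parallel lists of ids, documents, and metadatas in this shape are what ChromaDB's Collection.add accepts.

```python
from pathlib import Path


def build_records(topic: str, filename: str, chunks: list[str],
                  strategy: str) -> list[dict]:
    """Hypothetical record layout: one entry per chunk, with metadata that the
    vector store can later filter on (sub-topic) or aggregate (strategy)."""
    sub_topic = Path(filename).stem  # assumption: sub-topic is the file name without extension
    return [
        {
            "id": f"{topic}:{sub_topic}:{i}",
            "document": chunk,
            "metadata": {
                "topic": topic,
                "source": filename,
                "sub_topic": sub_topic,
                "strategy": strategy,
                "chunk_index": i,
            },
        }
        for i, chunk in enumerate(chunks)
    ]
```

Embedding generation is omitted here; in a real pipeline each chunk's text would also be turned into a vector before or during storage.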
4. Module: Ask Questions
Access: Public (No Login Required)
Purpose: Query the vectorized data and retrieve answers.
How to Query
- Select Context: Choose the Topic you wish to search.
- Filter (Optional): Select a specific Sub-topic (derived automatically from filenames) to narrow the search scope.
- Search Parameters:
  - Relevance Threshold: Adjusts strictness. A lower value (e.g., 0.4) is more inclusive; a higher value (e.g., 0.8) keeps only chunks that match the question very closely.
  - Top-k: The number of document chunks retrieved for the AI to analyze.
  - Fuzzy Match: Controls how closely a new question must match a previously asked one before the cached answer is reused instead of generating a new one.
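The interaction of the Sub-topic filter, Relevance Threshold, and Top-k can be sketched over toy vectors: filter by sub-topic, score by cosine similarity, drop anything below the threshold, then keep the k best. This is a simplified illustration; in DocIQ the search runs inside the vector database, and the record fields used here ("embedding", "sub_topic") are hypothetical.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], records: list[dict], top_k: int = 3,
             threshold: float = 0.5,
             sub_topic: Optional[str] = None) -> list[tuple[float, dict]]:
    """Return up to top_k (score, record) pairs scoring at least `threshold`."""
    scored = [
        (cosine(query_vec, rec["embedding"]), rec)
        for rec in records
        if sub_topic is None or rec["sub_topic"] == sub_topic
    ]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

Raising the threshold shrinks the candidate set before Top-k is applied, so a strict threshold can return fewer than k chunks, or none; this is why lowering it helps when answers come back as "not stated in context".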
Understanding the Results
- Generative Answer: An AI-synthesized response based strictly on the provided context.
- Source Citations: The specific PDF filenames used to generate the answer are listed.
- Metrics Dashboard:
  - Cosine: Cosine similarity between the retrieved context and the answer. Values above 0.85 (shown in green) are excellent.
  - ROUGE scores: Text-overlap statistics between the answer and the context, used to help detect hallucinations.
- Feedback Loop: Users can click 👍 (Helpful) or 👎 (Not Helpful). This tags the entry in the cache database to improve future retrieval.
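As a rough illustration of why text overlap relates to hallucination, the sketch below computes a simplified ROUGE-1 (clipped unigram overlap) with the retrieved context as the reference. The exact ROUGE variant and tokenization DocIQ uses are not stated; libraries such as the rouge-score package add stemming and other ROUGE types.

```python
import re
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """Simplified ROUGE-1: clipped unigram overlap between candidate and reference."""
    ref = Counter(re.findall(r"\w+", reference.lower()))
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    overlap = sum((ref & cand).values())  # each word counted at most min(ref, cand) times
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With the context as reference, precision is the share of answer words that also appear in the context: an answer with low precision introduces many words the documents never used, which is a signal (not proof) that it may be ungrounded.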
5. Module: DocIQ-Admin
Access: Gated (Requires Login)
Purpose: Database maintenance, auditing, and strategy review.
A. Chunking Strategies Overview
Provides a high-level table view of all documents in the system.
- Use Case: Verify consistency in data ingestion.
- Columns: Document Name, Sub-topic, Strategy Used, Total Vector Records.
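A table with these columns can be derived by counting stored records per document, sub-topic, and strategy. The sketch assumes each record's metadata carries hypothetical "source", "sub_topic", and "strategy" keys; DocIQ's actual metadata schema is not documented here.

```python
from collections import Counter


def strategies_overview(metadatas: list[dict]) -> list[tuple[str, str, str, int]]:
    """Count vector records per (document, sub-topic, strategy), sorted by document."""
    counts = Counter((m["source"], m["sub_topic"], m["strategy"]) for m in metadatas)
    return sorted((doc, sub, strat, n) for (doc, sub, strat), n in counts.items())
```

A document that appears in more than one row (for example, once per strategy) would indicate inconsistent ingestion, which is exactly what this view helps spot.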
B. Manage Topics (Vector DB)
Manage the ChromaDB vector collections.
- Delete Entire Topic: Permanently removes a collection and all associated embeddings.
- Delete Sub-topics: Removes specific documents (grouped by sub-topic) while keeping the rest of the topic intact.
- Audit Report: Upon deletion, the system generates a temporary audit table confirming the number of records removed.
- Permissions: Viewer role can see these options but the "Delete" buttons are disabled.
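The two deletion modes and their audit counts can be modelled with a toy in-memory store, sketched below. It only illustrates the behaviour described above and is not DocIQ's code; in ChromaDB itself the comparable operations are Client.delete_collection(name=...) for a whole topic and Collection.delete(where={"sub_topic": ...}) for a metadata-filtered subset, with "sub_topic" being an assumed metadata key.

```python
class ToyVectorStore:
    """In-memory stand-in for a vector DB: topic name -> list of record metadata."""

    def __init__(self) -> None:
        self.topics: dict[str, list[dict]] = {}

    def delete_topic(self, topic: str) -> dict[str, int]:
        """Drop a whole topic and report how many records were removed."""
        return {topic: len(self.topics.pop(topic, []))}

    def delete_sub_topics(self, topic: str, sub_topics: set[str]) -> dict[str, int]:
        """Remove only records in the given sub-topics; the rest of the topic stays."""
        records = self.topics.get(topic, [])
        # Audit: number of records removed per requested sub-topic.
        audit = {s: sum(1 for r in records if r["sub_topic"] == s) for s in sorted(sub_topics)}
        if topic in self.topics:
            self.topics[topic] = [r for r in records if r["sub_topic"] not in sub_topics]
        return audit
```

Returning the counts from each delete call is what makes the confirmation audit table possible without re-querying the store.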
C. Manage Cached DB
Manage the Q&A history and feedback loop.
- Manage Q&A: View a full list of questions asked.
  - View metrics (Cosine/ROUGE).
  - Manually override feedback (Helpful/Not Helpful).
  - Delete specific bad Q&A pairs.
- Filter & Sort: Use sliders to find "low quality" answers (e.g., Cosine < 0.5) for review.
- Clear DB: Options to clear the in-memory summary cache or wipe the entire SQLite history.
- Permissions: Viewer role can filter and view data, but cannot delete rows or clear the database.
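The cache behaviours above (fuzzy reuse of earlier answers, feedback tagging, and filtering for low-quality entries) can be sketched with Python's built-in sqlite3 and difflib modules. Everything here is an assumption for illustration: the table name and columns, the use of difflib's similarity ratio as the "Fuzzy Match" score, and the choice to skip entries marked Not Helpful.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import Optional


def open_cache() -> sqlite3.Connection:
    """Create a hypothetical in-memory Q&A cache table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qa_cache (id INTEGER PRIMARY KEY, question TEXT, answer TEXT,"
        " cosine REAL, feedback TEXT)"
    )
    return conn


def fuzzy_lookup(conn: sqlite3.Connection, question: str,
                 min_ratio: float = 0.85) -> Optional[str]:
    """Return the cached answer whose question is most similar to `question`,
    if the similarity reaches min_ratio. Entries marked not_helpful are skipped
    (NULL feedback is kept, because IS NOT is NULL-safe in SQLite)."""
    best, best_ratio = None, 0.0
    for cached_q, answer in conn.execute(
        "SELECT question, answer FROM qa_cache WHERE feedback IS NOT 'not_helpful'"
    ):
        ratio = SequenceMatcher(None, cached_q.lower(), question.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = answer, ratio
    return best if best_ratio >= min_ratio else None


def low_quality(conn: sqlite3.Connection, max_cosine: float = 0.5) -> list[tuple]:
    """List entries whose cosine score falls below max_cosine, worst first."""
    return conn.execute(
        "SELECT id, question, cosine FROM qa_cache WHERE cosine < ? ORDER BY cosine",
        (max_cosine,),
    ).fetchall()
```

A stricter Fuzzy Match setting corresponds to a higher min_ratio in this sketch, so fewer questions reuse a cached answer and more trigger fresh generation.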
6. Module: Knowledge Catalog
Access: Public (No Login Required)
Purpose: Provide a structured overview of all knowledge assets available within the system.
Overview
The Knowledge Catalog module presents a comprehensive catalogue of all available Topics and Sub-topics, along with their associated reference documents and source information. This module serves as the authoritative index for understanding the scope and coverage of the knowledge base.
Each entry includes direct access to the underlying documents, which can be opened by selecting the hyperlink in the Document column. Source details are also provided to ensure transparency, traceability, and content authenticity.
In addition, the module includes example questions for each topic and sub-topic. These questions are intended to guide users in understanding the type of queries supported by the knowledge base and to assist with effective information discovery.
Key Capabilities
- Centralised inventory of Topics and Sub-topics
- Direct access to source documents via embedded links
- Clear visibility of document origins and authoritative sources
- Example question sets to support learning, validation, and AI-assisted querying
7. Best Practices & Troubleshooting
| Issue | Recommendation |
|---|---|
| "Answer not stated in context" | The AI restricts itself to your documents to avoid hallucinating answers. Try lowering the Relevance Threshold slider to 0.45 or 0.50 to pull in broader context. |
| PDF Tables not appearing | Ensure the PDF is native (selectable text). Even with OCR enabled, complex tables in scanned images are difficult to parse. |
| Upload Fails | Check if the file is password protected. Remove PDF passwords before uploading. |
| Slow Performance | If running on CPU, large "Top-k" values (e.g., > 10) will slow down generation. Keep Top-k between 3 and 5 for optimal speed. |
DocIQ Enterprise Support