Spaces:
Running
Running
| title: QuerySphere | |
| emoji: π§ | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| license: mit | |
| app_port: 7860 | |
| <div align="center"> | |
| # QuerySphere: RAG platform for document Q&A with Knowledge Ingestion | |
| [](https://www.python.org/downloads/) | |
| [](https://fastapi.tiangolo.com/) | |
| [](https://opensource.org/licenses/MIT) | |
| > **MVP-Grade RAG Platform with Multi-Format Document Ingestion, Hybrid Retrieval, and Zero API Costs** | |
| </div> | |
| --- | |
| A MVP grade Retrieval-Augmented Generation (RAG) system that enables organizations to unlock knowledge trapped across documents and archives while maintaining complete data privacy and eliminating costly API dependencies. | |
| --- | |
| ## π Table of Contents | |
| - [Overview](#-overview) | |
| - [Key Features](#-key-features) | |
| - [System Architecture](#-system-architecture) | |
| - [Technology Stack](#-technology-stack) | |
| - [Installation](#-installation) | |
| - [Quick Start](#-quick-start) | |
| - [Core Components](#-core-components) | |
| - [API Documentation](#-api-documentation) | |
| - [Configuration](#-configuration) | |
| - [RAGAS Evaluation](#-ragas-evaluation) | |
| - [Troubleshooting](#-troubleshooting) | |
| - [License](#-license) | |
| --- | |
| ## π― Overview | |
| The QuerySphere addresses a critical enterprise pain point: **information silos that cost organizations 20% of employee productivity**. Unlike existing solutions (Humata AI, ChatPDF, NotebookLM) that charge $49/user/month and rely on expensive cloud LLM APIs, this system offers: | |
| ### **Core Value Propositions** | |
| | Feature | Traditional Solutions | Our System | | |
| |---------|----------------------|------------| | |
| | **Privacy** | Cloud-based (data leaves premises) | 100% on-premise processing | | |
| | **Cost** | $49-99/user/month + API fees | Zero API costs (local inference) | | |
| | **Input Types** | PDF only | PDF, DOCX, TXT, ZIP archives | | |
| | **Quality Metrics** | Black box (no visibility) | RAGAS evaluation with detailed metrics | | |
| | **Retrieval** | Vector-only | Hybrid (Vector + BM25 + Reranking) | | |
| | **Chunking** | Fixed size | Adaptive (3 strategies) | | |
| ### **Market Context** | |
| - **$8.5B** projected enterprise AI search market by 2027 | |
| - **85%** of enterprises actively adopting AI-powered knowledge management | |
| - **Growing regulatory demands** for on-premise, privacy-compliant solutions | |
| --- | |
| ## β¨ Key Features | |
| ### **1. Multi-Format Document Ingestion** | |
| - **Supported Formats**: PDF, DOCX, TXT | |
| - **Archive Processing**: ZIP files up to 2GB with recursive extraction | |
| - **Batch Upload**: Process multiple documents simultaneously | |
| - **OCR Support**: Extract text from scanned documents and images (PaddleOCR or EasyOCR) | |
| ### **2. Intelligent Document Processing** | |
| - **Adaptive Chunking**: Automatically selects optimal strategy based on document size | |
| - Fixed-size chunks (< 50K tokens): 512 tokens with 50 overlap | |
| - Semantic chunks (50K-500K tokens): Section-aware splitting | |
| - Hierarchical chunks (> 500K tokens): Parent-child structure | |
| - **Metadata Extraction**: Title, author, date, page numbers, section headers | |
| ### **3. Hybrid Retrieval System** | |
| ```mermaid | |
| graph LR | |
| A[User Query] --> B[Query Embedding] | |
| A --> C[Keyword Analysis] | |
| B --> D[Vector Search<br/>FAISS] | |
| C --> E[BM25 Search] | |
| D --> F[Reciprocal Rank Fusion<br/>60% Vector + 40% BM25] | |
| E --> F | |
| F --> G[Cross-Encoder Reranking] | |
| G --> H[Top-K Results] | |
| ``` | |
| - **Vector Search**: FAISS with BGE embeddings (384-dim) | |
| - **Keyword Search**: BM25 with optimized parameters (k1=1.5, b=0.75) | |
| - **Fusion Methods**: Weighted, Reciprocal Rank Fusion (RRF), CombSum | |
| - **Reranking**: Cross-encoder for precision boost | |
| ### **4. Local LLM Generation** | |
| - **Ollama Integration**: Zero-cost inference with Mistral-7B or LLaMA-2 | |
| - **Adaptive Temperature**: Context-aware generation parameters | |
| - **Citation Tracking**: Automatic source attribution with validation | |
| - **Streaming Support**: Token-by-token response generation | |
| ### **5. RAGAS Quality Assurance** | |
| - **Real-Time Evaluation**: Answer relevancy, faithfulness, context precision/recall | |
| - **Automatic Metrics**: Computed for every query-response pair | |
| - **Analytics Dashboard**: Track quality trends over time | |
| - **Export Capability**: Download evaluation data for analysis | |
| - **Session Statistics**: Aggregate metrics across conversation sessions | |
| --- | |
| ## ποΈ System Architecture | |
| ### **High-Level Architecture** | |
| ```mermaid | |
| graph TB | |
| subgraph "Frontend Layer" | |
| A[Web UI<br/>HTML/CSS/JS] | |
| end | |
| subgraph "API Layer" | |
| B[FastAPI Gateway<br/>REST Endpoints] | |
| end | |
| subgraph "Ingestion Pipeline" | |
| C[Document Parser<br/>PDF/DOCX/TXT] | |
| D[Adaptive Chunker<br/>3 Strategies] | |
| E[Embedding Generator<br/>BGE-small-en-v1.5] | |
| end | |
| subgraph "Storage Layer" | |
| F[FAISS Vector DB<br/>~10M vectors] | |
| G[BM25 Keyword Index] | |
| H[SQLite Metadata<br/>Documents & Chunks] | |
| I[LRU Cache<br/>Embeddings] | |
| end | |
| subgraph "Retrieval Engine" | |
| J[Hybrid Retriever<br/>Vector + BM25] | |
| K[Cross-Encoder<br/>Reranker] | |
| L[Context Assembler] | |
| end | |
| subgraph "Generation Engine" | |
| M[Ollama LLM<br/>Mistral-7B] | |
| N[Prompt Builder] | |
| O[Citation Formatter] | |
| end | |
| subgraph "Evaluation Engine" | |
| P[RAGAS Evaluator<br/>Quality Metrics] | |
| end | |
| A --> B | |
| B --> C | |
| C --> D | |
| D --> E | |
| E --> F | |
| E --> G | |
| D --> H | |
| E --> I | |
| B --> J | |
| J --> F | |
| J --> G | |
| J --> K | |
| K --> L | |
| L --> N | |
| N --> M | |
| M --> O | |
| O --> A | |
| M --> P | |
| P --> H | |
| P --> A | |
| ``` | |
| ### **Why This Architecture?** | |
| #### **Modular Design** | |
| Each component is independent and replaceable: | |
| - **Parser**: Swap PDF libraries without affecting chunking | |
| - **Embedder**: Change from BGE to OpenAI embeddings with config update | |
| - **LLM**: Switch from Ollama to OpenAI API seamlessly | |
| #### **Separation of Concerns** | |
| ``` | |
| Ingestion β Storage β Retrieval β Generation β Evaluation | |
| ``` | |
| Each stage has clear inputs/outputs and single responsibility. | |
| #### **Performance Optimization** | |
| - **Async Processing**: Non-blocking I/O for uploads and LLM calls | |
| - **Batch Operations**: Embed 32 chunks simultaneously | |
| - **Local Caching**: LRU cache for query embeddings and frequent retrievals | |
| - **Indexing**: FAISS ANN for O(log n) search vs O(n) brute force | |
| --- | |
| ## π§ Technology Stack | |
| ### **Core Technologies** | |
| | Component | Technology | Version | Why This Choice | | |
| |-----------|-----------|---------|-----------------| | |
| | **Backend** | FastAPI | 0.104+ | Async support, auto-docs, production-grade | | |
| | **LLM** | Ollama (Mistral-7B) | Latest | Zero API costs, on-premise, 20-30 tokens/sec | | |
| | **Embeddings** | BGE-small-en-v1.5 | 384-dim | SOTA quality, 10x faster than alternatives | | |
| | **Vector DB** | FAISS | Latest | Battle-tested, 10x faster than ChromaDB | | |
| | **Keyword Search** | BM25 (rank_bm25) | Latest | Fast probabilistic ranking | | |
| | **Document Parsing** | PyPDF2, python-docx | Latest | Industry standard, reliable | | |
| | **Chunking** | LlamaIndex | 0.9+ | Advanced semantic splitting | | |
| | **Reranking** | Cross-Encoder | Latest | +15% accuracy, minimal latency | | |
| | **Evaluation** | RAGAS | 0.1.9 | Automated RAG quality metrics | | |
| | **Frontend** | Alpine.js | 3.x | Lightweight reactivity, no build step | | |
| | **Database** | SQLite | 3.x | Zero-config, sufficient for metadata | | |
| | **Caching** | In-Memory LRU | Python functools | Fast, no external dependencies | | |
| ### **Python Dependencies** | |
| ``` | |
| fastapi>=0.104.0 | |
| uvicorn>=0.24.0 | |
| ollama>=0.1.0 | |
| sentence-transformers>=2.2.2 | |
| faiss-cpu>=1.7.4 | |
| llama-index>=0.9.0 | |
| rank-bm25>=0.2.2 | |
| PyPDF2>=3.0.0 | |
| python-docx>=0.8.11 | |
| pydantic>=2.0.0 | |
| aiohttp>=3.9.0 | |
| tiktoken>=0.5.0 | |
| ragas==0.1.9 | |
| datasets==2.14.6 | |
| ``` | |
| --- | |
| ## π¦ Installation | |
| ### **Prerequisites** | |
| - Python 3.10 or higher | |
| - 8GB RAM minimum (16GB recommended) | |
| - 10GB disk space for models and indexes | |
| - Ollama installed ([https://ollama.ai](https://ollama.ai)) | |
| ### **Step 1: Clone Repository** | |
| ```bash | |
| git clone https://github.com/satyaki-mitra/docu-vault-ai.git | |
| cd docu-vault-ai | |
| ``` | |
| ### **Step 2: Create Virtual Environment** | |
| ```bash | |
| # Using conda (recommended) | |
| conda create -n rag_env python=3.10 | |
| conda activate rag_env | |
| # Or using venv | |
| python -m venv rag_env | |
| source rag_env/bin/activate # On Windows: rag_env\Scripts\activate | |
| ``` | |
| ### **Step 3: Install Dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### **Step 4: Install Ollama and Model** | |
| ```bash | |
| # Install Ollama (macOS) | |
| brew install ollama | |
| # Install Ollama (Linux) | |
| curl https://ollama.ai/install.sh | sh | |
| # Pull Mistral model | |
| ollama pull mistral:7b | |
| # Verify installation | |
| ollama list | |
| ``` | |
| ### **Step 5: Configure Environment** | |
| ```bash | |
| # Copy example config | |
| cp .env.example .env | |
| # Edit configuration (optional) | |
| nano .env | |
| ``` | |
| **Key Configuration Options:** | |
| ```bash | |
| # LLM Settings | |
| OLLAMA_MODEL=mistral:7b | |
| DEFAULT_TEMPERATURE=0.1 | |
| CONTEXT_WINDOW=8192 | |
| # Retrieval Settings | |
| VECTOR_WEIGHT=0.6 | |
| BM25_WEIGHT=0.4 | |
| ENABLE_RERANKING=True | |
| TOP_K_RETRIEVE=10 | |
| # RAGAS Evaluation | |
| ENABLE_RAGAS=True | |
| RAGAS_ENABLE_GROUND_TRUTH=False | |
| OPENAI_API_KEY=your_openai_api_key_here # Required for RAGAS | |
| # Performance | |
| EMBEDDING_BATCH_SIZE=32 | |
| MAX_WORKERS=4 | |
| ``` | |
| --- | |
| ## π Quick Start | |
| ### **1. Start Ollama Server** | |
| ```bash | |
| # Terminal 1: Start Ollama | |
| ollama serve | |
| ``` | |
| ### **2. Launch Application** | |
| ```bash | |
| # Terminal 2: Start RAG system | |
| python app.py | |
| ``` | |
| Output: | |
| ``` | |
| INFO: Started server process [12345] | |
| INFO: Waiting for application startup. | |
| INFO: Application startup complete. | |
| INFO: Uvicorn running on http://0.0.0.0:8000 | |
| ``` | |
| ### **3. Access Web Interface** | |
| Open browser to: **http://localhost:8000** | |
| ### **4. Upload Documents** | |
| 1. Click **"Upload Documents"** | |
| 2. Select PDF/DOCX/TXT files (or ZIP archives) | |
| 3. Click **"Start Building"** | |
| 4. Wait for indexing to complete (progress bar shows status) | |
| ### **5. Query Your Documents** | |
| ``` | |
| Query: "What are the key findings in the Q3 report?" | |
| Response: The Q3 report highlights three key findings: | |
| [1] Revenue increased 23% year-over-year to $45.2M, | |
| [2] Customer acquisition costs decreased 15%, and | |
| [3] Net retention rate reached 118% [1]. | |
| Sources: | |
| [1] Q3_Financial_Report.pdf (Page 3, Executive Summary) | |
| RAGAS Metrics: | |
| - Answer Relevancy: 0.89 | |
| - Faithfulness: 0.94 | |
| - Context Utilization: 0.87 | |
| - Overall Score: 0.90 | |
| ``` | |
| --- | |
| ## π§© Core Components | |
| ### **1. Document Ingestion Pipeline** | |
| ```python | |
| # High-level flow | |
| Document Upload β Parse β Clean β Chunk β Embed β Index | |
| ``` | |
| **Adaptive Chunking Logic:** | |
| ```mermaid | |
| graph TD | |
| A[Calculate Token Count] --> B{Tokens < 50K?} | |
| B -->|Yes| C[Fixed Chunking<br/>512 tokens, 50 overlap] | |
| B -->|No| D{Tokens < 500K?} | |
| D -->|Yes| E[Semantic Chunking<br/>Section-aware] | |
| D -->|No| F[Hierarchical Chunking<br/>Parent 2048, Child 512] | |
| ``` | |
| ### **2. Hybrid Retrieval Engine** | |
| **Retrieval Flow:** | |
| ```python | |
| # Pseudocode | |
| def hybrid_retrieve(query: str, top_k: int = 10): | |
| # Dual retrieval | |
| query_embedding = embedder.embed(query) | |
| vector_results = faiss_index.search(query_embedding, top_k * 2) | |
| bm25_results = bm25_index.search(query, top_k * 2) | |
| # Fusion (RRF) | |
| fused_results = reciprocal_rank_fusion(vector_results, | |
| bm25_results, | |
| weights = (0.6, 0.4)) | |
| # Reranking | |
| reranked = cross_encoder.rerank(query, fused_results, top_k) | |
| return reranked | |
| ``` | |
| ### **3. Response Generation** | |
| **Temperature Control:** | |
| ```mermaid | |
| graph LR | |
| A[Query Type] --> B{Factual?} | |
| B -->|Yes| C[Low Temp<br/>0.1-0.2] | |
| B -->|No| D[Context Quality] | |
| D -->|High| E[Medium Temp<br/>0.3-0.5] | |
| D -->|Low| F[High Temp<br/>0.6-0.8] | |
| ``` | |
| ### **4. RAGAS Evaluation Module** | |
| **Automatic Quality Assessment:** | |
| ```python | |
| # After each query-response | |
| ragas_result = ragas_evaluator.evaluate_single(query = user_query, | |
| answer = generated_answer, | |
| contexts = retrieved_chunks, | |
| retrieval_time_ms = retrieval_time, | |
| generation_time_ms = generation_time, | |
| ) | |
| # Metrics computed: | |
| - Answer Relevancy (0-1) | |
| - Faithfulness (0-1) | |
| - Context Utilization (0-1) | |
| - Context Relevancy (0-1) | |
| - Overall Score (weighted average) | |
| ``` | |
| --- | |
| ## π API Documentation | |
| ### **Core Endpoints** | |
| #### **1. Health Check** | |
| ```bash | |
| GET /api/health | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "status": "healthy", | |
| "timestamp": "2024-11-27T03:00:00", | |
| "components": { | |
| "vector_store": true, | |
| "llm": true, | |
| "embeddings": true, | |
| "retrieval": true | |
| } | |
| } | |
| ``` | |
| #### **2. Upload Documents** | |
| ```bash | |
| POST /api/upload | |
| Content-Type: multipart/form-data | |
| files: [file1.pdf, file2.docx] | |
| ``` | |
| #### **3. Start Processing** | |
| ```bash | |
| POST /api/start-processing | |
| ``` | |
| #### **4. Query (Chat)** | |
| ```bash | |
| POST /api/chat | |
| Content-Type: application/json | |
| { | |
| "message": "What are the revenue figures?", | |
| "session_id": "session_123" | |
| } | |
| ``` | |
| **Response includes RAGAS metrics:** | |
| ```json | |
| { | |
| "session_id": "session_123", | |
| "response": "Revenue for Q3 was $45.2M [1]...", | |
| "sources": [...], | |
| "metrics": { | |
| "retrieval_time": 245, | |
| "generation_time": 3100, | |
| "total_time": 3350 | |
| }, | |
| "ragas_metrics": { | |
| "answer_relevancy": 0.89, | |
| "faithfulness": 0.94, | |
| "context_utilization": 0.87, | |
| "context_relevancy": 0.91, | |
| "overall_score": 0.90 | |
| } | |
| } | |
| ``` | |
| #### **5. RAGAS Endpoints** | |
| ```bash | |
| # Get evaluation history | |
| GET /api/ragas/history | |
| # Get session statistics | |
| GET /api/ragas/statistics | |
| # Clear evaluation history | |
| POST /api/ragas/clear | |
| # Export evaluation data | |
| GET /api/ragas/export | |
| # Get RAGAS configuration | |
| GET /api/ragas/config | |
| ``` | |
| --- | |
| ## βοΈ Configuration | |
| ### **config/settings.py** | |
| **Key Configuration Sections:** | |
| #### **LLM Settings** | |
| ```python | |
| OLLAMA_MODEL = "mistral:7b" | |
| DEFAULT_TEMPERATURE = 0.1 | |
| MAX_TOKENS = 1000 | |
| CONTEXT_WINDOW = 8192 | |
| ``` | |
| #### **RAGAS Settings** | |
| ```python | |
| ENABLE_RAGAS = True | |
| RAGAS_ENABLE_GROUND_TRUTH = False | |
| RAGAS_METRICS = ["answer_relevancy", | |
| "faithfulness", | |
| "context_utilization", | |
| "context_relevancy" | |
| ] | |
| RAGAS_EVALUATION_TIMEOUT = 60 | |
| RAGAS_BATCH_SIZE = 10 | |
| ``` | |
| #### **Caching Settings** | |
| ```python | |
| ENABLE_EMBEDDING_CACHE = True | |
| CACHE_MAX_SIZE = 1000 # LRU cache size | |
| CACHE_TTL = 3600 # Time to live in seconds | |
| ``` | |
| --- | |
| ## π RAGAS Evaluation | |
| ### **What is RAGAS?** | |
| RAGAS (Retrieval-Augmented Generation Assessment) is a framework for evaluating RAG systems using automated metrics. Our implementation provides real-time quality assessment for every query-response pair. | |
| ### **Metrics Explained** | |
| | Metric | Definition | Target | Interpretation | | |
| |--------|-----------|--------|----------------| | |
| | **Answer Relevancy** | How well the answer addresses the question | > 0.85 | Measures usefulness to user | | |
| | **Faithfulness** | Is the answer grounded in retrieved context? | > 0.90 | Prevents hallucinations | | |
| | **Context Utilization** | How well the context is used in the answer | > 0.80 | Retrieval effectiveness | | |
| | **Context Relevancy** | Are retrieved chunks relevant to the query? | > 0.85 | Search quality | | |
| | **Overall Score** | Weighted average of all metrics | > 0.85 | System performance | | |
| ### **Using the Analytics Dashboard** | |
| 1. Navigate to **Analytics & Quality** section | |
| 2. View real-time RAGAS metrics table | |
| 3. Monitor session statistics (averages, trends) | |
| 4. Export evaluation data for offline analysis | |
| ### **Example Evaluation Output** | |
| ``` | |
| Query: "What were the Q3 revenue trends?" | |
| Answer: "Q3 revenue increased 23% YoY to $45.2M..." | |
| RAGAS Evaluation: | |
| ββ Answer Relevancy: 0.89 β (Good) | |
| ββ Faithfulness: 0.94 β (Excellent) | |
| ββ Context Utilization: 0.87 β (Good) | |
| ββ Context Relevancy: 0.91 β (Excellent) | |
| ββ Overall Score: 0.90 β (Excellent) | |
| Performance: | |
| ββ Retrieval Time: 245ms | |
| ββ Generation Time: 3100ms | |
| ββ Total Time: 3345ms | |
| ``` | |
| --- | |
| ## π§ Troubleshooting | |
| ### **Common Issues** | |
| #### **1. "RAGAS evaluation failed"** | |
| **Cause:** OpenAI API key not configured | |
| **Solution:** | |
| ```bash | |
| # Add to .env file | |
| OPENAI_API_KEY=your_openai_api_key_here | |
| # Or disable RAGAS if not needed | |
| ENABLE_RAGAS=False | |
| ``` | |
| #### **2. "Context assembly returning 0 chunks"** | |
| **Cause:** Missing token counts in chunks | |
| **Solution:** Already fixed in `context_assembler.py`. Tokens calculated on-the-fly if missing. | |
| #### **3. "Slow query responses"** | |
| **Solutions:** | |
| - Enable embedding cache : `ENABLE_EMBEDDING_CACHE=True` | |
| - Reduce retrieval count : `TOP_K_RETRIEVE=5` | |
| - Disable reranking : `ENABLE_RERANKING=False` | |
| - Use quantized model for faster inference | |
| #### **4. "RAGAS metrics not appearing"** | |
| **Symptoms:** Chat responses lack quality metrics | |
| **Solution:** | |
| ```python | |
| # Verify RAGAS is enabled in settings | |
| ENABLE_RAGAS = True | |
| # Check OpenAI API key is valid | |
| # View logs for RAGAS evaluation errors | |
| tail -f logs/app.log | grep "RAGAS" | |
| ``` | |
| --- | |
| ## π License | |
| This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. | |
| --- | |
| ## π Acknowledgments | |
| **Open Source Technologies:** | |
| - [FastAPI](https://fastapi.tiangolo.com/) - Modern web framework | |
| - [Ollama](https://ollama.ai/) - Local LLM inference | |
| - [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search | |
| - [LlamaIndex](https://www.llamaindex.ai/) - Document chunking | |
| - [Sentence Transformers](https://www.sbert.net/) - Embedding models | |
| - [RAGAS](https://github.com/explodinggradients/ragas) - RAG evaluation | |
| **Research Papers:** | |
| - Karpukhin et al. (2020) - Dense Passage Retrieval | |
| - Robertson & Zaragoza (2009) - The Probabilistic Relevance Framework: BM25 | |
| - Lewis et al. (2020) - Retrieval-Augmented Generation | |
| - Es et al. (2023) - RAGAS: Automated Evaluation of RAG | |
| --- | |
| <div align="center"> | |
| **Built with β€οΈ for the open-source community** | |
| </div> |