---
title: QuerySphere
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# QuerySphere: RAG Platform for Document Q&A with Knowledge Ingestion
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.104%2B-009688.svg)](https://fastapi.tiangolo.com/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
> **MVP-Grade RAG Platform with Multi-Format Document Ingestion, Hybrid Retrieval, and Zero API Costs**
---
An MVP-grade Retrieval-Augmented Generation (RAG) system that helps organizations unlock knowledge trapped in documents and archives while maintaining complete data privacy and eliminating costly API dependencies.
---
## 📋 Table of Contents
- [Overview](#-overview)
- [Key Features](#-key-features)
- [System Architecture](#-system-architecture)
- [Technology Stack](#-technology-stack)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Core Components](#-core-components)
- [API Documentation](#-api-documentation)
- [Configuration](#-configuration)
- [RAGAS Evaluation](#-ragas-evaluation)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
---
## 🎯 Overview
QuerySphere addresses a critical enterprise pain point: **information silos that cost organizations 20% of employee productivity**. Unlike existing solutions (Humata AI, ChatPDF, NotebookLM) that charge $49/user/month and rely on expensive cloud LLM APIs, this system offers:
### **Core Value Propositions**
| Feature | Traditional Solutions | Our System |
|---------|----------------------|------------|
| **Privacy** | Cloud-based (data leaves premises) | 100% on-premise processing |
| **Cost** | $49-99/user/month + API fees | Zero API costs (local inference) |
| **Input Types** | PDF only | PDF, DOCX, TXT, ZIP archives |
| **Quality Metrics** | Black box (no visibility) | RAGAS evaluation with detailed metrics |
| **Retrieval** | Vector-only | Hybrid (Vector + BM25 + Reranking) |
| **Chunking** | Fixed size | Adaptive (3 strategies) |
### **Market Context**
- **$8.5B** projected enterprise AI search market by 2027
- **85%** of enterprises actively adopting AI-powered knowledge management
- **Growing regulatory demands** for on-premise, privacy-compliant solutions
---
## ✨ Key Features
### **1. Multi-Format Document Ingestion**
- **Supported Formats**: PDF, DOCX, TXT
- **Archive Processing**: ZIP files up to 2GB with recursive extraction
- **Batch Upload**: Process multiple documents simultaneously
- **OCR Support**: Extract text from scanned documents and images (PaddleOCR or EasyOCR)
### **2. Intelligent Document Processing**
- **Adaptive Chunking**: Automatically selects optimal strategy based on document size
  - Fixed-size chunks (< 50K tokens): 512 tokens with 50-token overlap
  - Semantic chunks (50K-500K tokens): section-aware splitting
  - Hierarchical chunks (> 500K tokens): parent-child structure
- **Metadata Extraction**: Title, author, date, page numbers, section headers
### **3. Hybrid Retrieval System**
```mermaid
graph LR
    A[User Query] --> B[Query Embedding]
    A --> C[Keyword Analysis]
    B --> D[Vector Search<br/>FAISS]
    C --> E[BM25 Search]
    D --> F["Reciprocal Rank Fusion<br/>60% Vector + 40% BM25"]
    E --> F
    F --> G[Cross-Encoder Reranking]
    G --> H[Top-K Results]
```
- **Vector Search**: FAISS with BGE embeddings (384-dim)
- **Keyword Search**: BM25 with standard parameters (k1=1.5, b=0.75); see the sketch after this list
- **Fusion Methods**: Weighted, Reciprocal Rank Fusion (RRF), CombSum
- **Reranking**: Cross-encoder for precision boost
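
As a concrete sketch of the keyword side, here is how a BM25 index might be built with `rank_bm25` using the parameters above (the corpus and tokenization are illustrative; the project's actual indexing code may differ):

```python
from rank_bm25 import BM25Okapi

# Illustrative corpus of chunk texts (in practice, one entry per indexed chunk)
chunks = ["Q3 revenue increased 23% year-over-year to $45.2M.",
          "Customer acquisition costs decreased 15% in Q3.",
          "Net retention rate reached 118%."]

# Whitespace tokenization keeps the sketch simple; a real pipeline
# would normalize case and strip punctuation first
tokenized = [chunk.lower().split() for chunk in chunks]

# k1 and b match the retrieval settings described above
bm25 = BM25Okapi(tokenized, k1=1.5, b=0.75)

scores = bm25.get_scores("q3 revenue".split())
print(scores)  # one BM25 score per chunk
```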
### **4. Local LLM Generation**
- **Ollama Integration**: Zero-cost inference with Mistral-7B or LLaMA-2
- **Adaptive Temperature**: Context-aware generation parameters
- **Citation Tracking**: Automatic source attribution with validation
- **Streaming Support**: Token-by-token response generation (see the sketch below)
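
Streaming with the `ollama` Python client might look like the following minimal sketch (the model name comes from the configuration below; the project's actual generation code may wrap this differently):

```python
import ollama

# Stream a response token-by-token from the local Ollama server
stream = ollama.chat(model="mistral:7b",
                     messages=[{"role": "user", "content": "Summarize the Q3 report."}],
                     stream=True)

for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```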
### **5. RAGAS Quality Assurance**
- **Real-Time Evaluation**: Answer relevancy, faithfulness, context precision/recall
- **Automatic Metrics**: Computed for every query-response pair
- **Analytics Dashboard**: Track quality trends over time
- **Export Capability**: Download evaluation data for analysis
- **Session Statistics**: Aggregate metrics across conversation sessions
---
## 🗂️ System Architecture
### **High-Level Architecture**
```mermaid
graph TB
    subgraph "Frontend Layer"
        A[Web UI<br/>HTML/CSS/JS]
    end
    subgraph "API Layer"
        B[FastAPI Gateway<br/>REST Endpoints]
    end
    subgraph "Ingestion Pipeline"
        C[Document Parser<br/>PDF/DOCX/TXT]
        D[Adaptive Chunker<br/>3 Strategies]
        E[Embedding Generator<br/>BGE-small-en-v1.5]
    end
    subgraph "Storage Layer"
        F[FAISS Vector DB<br/>~10M vectors]
        G[BM25 Keyword Index]
        H["SQLite Metadata<br/>Documents & Chunks"]
        I[LRU Cache<br/>Embeddings]
    end
    subgraph "Retrieval Engine"
        J[Hybrid Retriever<br/>Vector + BM25]
        K[Cross-Encoder<br/>Reranker]
        L[Context Assembler]
    end
    subgraph "Generation Engine"
        M[Ollama LLM<br/>Mistral-7B]
        N[Prompt Builder]
        O[Citation Formatter]
    end
    subgraph "Evaluation Engine"
        P[RAGAS Evaluator<br/>Quality Metrics]
    end
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    E --> G
    D --> H
    E --> I
    B --> J
    J --> F
    J --> G
    J --> K
    K --> L
    L --> N
    N --> M
    M --> O
    O --> A
    M --> P
    P --> H
    P --> A
```
### **Why This Architecture?**
#### **Modular Design**
Each component is independent and replaceable:
- **Parser**: Swap PDF libraries without affecting chunking
- **Embedder**: Change from BGE to OpenAI embeddings with config update
- **LLM**: Switch from Ollama to OpenAI API seamlessly
#### **Separation of Concerns**
```
Ingestion → Storage → Retrieval → Generation → Evaluation
```
Each stage has clear inputs/outputs and single responsibility.
#### **Performance Optimization**
- **Async Processing**: Non-blocking I/O for uploads and LLM calls
- **Batch Operations**: Embed 32 chunks simultaneously
- **Local Caching**: LRU cache for query embeddings and frequent retrievals (sketched below)
- **Indexing**: FAISS ANN for O(log n) search vs O(n) brute force
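
A minimal sketch of the query-embedding cache, assuming `functools.lru_cache` wrapped around the BGE encoder (the `embed_query` wrapper is illustrative, not the project's actual function name):

```python
from functools import lru_cache

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

@lru_cache(maxsize=1000)  # mirrors the CACHE_MAX_SIZE setting
def embed_query(query: str):
    # Repeated queries hit the cache instead of re-encoding (384-dim vector)
    return model.encode(query, normalize_embeddings=True)
```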
---
## 🔧 Technology Stack
### **Core Technologies**
| Component | Technology | Version | Why This Choice |
|-----------|-----------|---------|-----------------|
| **Backend** | FastAPI | 0.104+ | Async support, auto-docs, production-grade |
| **LLM** | Ollama (Mistral-7B) | Latest | Zero API costs, on-premise, 20-30 tokens/sec |
| **Embeddings** | BGE-small-en-v1.5 | 384-dim | SOTA quality, 10x faster than alternatives |
| **Vector DB** | FAISS | Latest | Battle-tested, 10x faster than ChromaDB |
| **Keyword Search** | BM25 (rank_bm25) | Latest | Fast probabilistic ranking |
| **Document Parsing** | PyPDF2, python-docx | Latest | Industry standard, reliable |
| **Chunking** | LlamaIndex | 0.9+ | Advanced semantic splitting |
| **Reranking** | Cross-Encoder | Latest | +15% accuracy, minimal latency |
| **Evaluation** | RAGAS | 0.1.9 | Automated RAG quality metrics |
| **Frontend** | Alpine.js | 3.x | Lightweight reactivity, no build step |
| **Database** | SQLite | 3.x | Zero-config, sufficient for metadata |
| **Caching** | In-Memory LRU | Python functools | Fast, no external dependencies |
### **Python Dependencies**
```
fastapi>=0.104.0
uvicorn>=0.24.0
ollama>=0.1.0
sentence-transformers>=2.2.2
faiss-cpu>=1.7.4
llama-index>=0.9.0
rank-bm25>=0.2.2
PyPDF2>=3.0.0
python-docx>=0.8.11
pydantic>=2.0.0
aiohttp>=3.9.0
tiktoken>=0.5.0
ragas==0.1.9
datasets==2.14.6
```
---
## 📦 Installation
### **Prerequisites**
- Python 3.10 or higher
- 8GB RAM minimum (16GB recommended)
- 10GB disk space for models and indexes
- Ollama installed ([https://ollama.ai](https://ollama.ai))
### **Step 1: Clone Repository**
```bash
git clone https://github.com/satyaki-mitra/docu-vault-ai.git
cd docu-vault-ai
```
### **Step 2: Create Virtual Environment**
```bash
# Using conda (recommended)
conda create -n rag_env python=3.10
conda activate rag_env
# Or using venv
python -m venv rag_env
source rag_env/bin/activate # On Windows: rag_env\Scripts\activate
```
### **Step 3: Install Dependencies**
```bash
pip install -r requirements.txt
```
### **Step 4: Install Ollama and Model**
```bash
# Install Ollama (macOS)
brew install ollama
# Install Ollama (Linux)
curl https://ollama.ai/install.sh | sh
# Pull Mistral model
ollama pull mistral:7b
# Verify installation
ollama list
```
### **Step 5: Configure Environment**
```bash
# Copy example config
cp .env.example .env
# Edit configuration (optional)
nano .env
```
**Key Configuration Options:**
```bash
# LLM Settings
OLLAMA_MODEL=mistral:7b
DEFAULT_TEMPERATURE=0.1
CONTEXT_WINDOW=8192
# Retrieval Settings
VECTOR_WEIGHT=0.6
BM25_WEIGHT=0.4
ENABLE_RERANKING=True
TOP_K_RETRIEVE=10
# RAGAS Evaluation
ENABLE_RAGAS=True
RAGAS_ENABLE_GROUND_TRUTH=False
OPENAI_API_KEY=your_openai_api_key_here # Required for RAGAS
# Performance
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4
```
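
These variables are read at startup. A minimal sketch of how a settings module might pull them from the environment (names mirror the keys above; the project's actual `config/settings.py` may use pydantic instead):

```python
import os

# Defaults match the example .env above
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "mistral:7b")
DEFAULT_TEMPERATURE = float(os.getenv("DEFAULT_TEMPERATURE", "0.1"))
VECTOR_WEIGHT = float(os.getenv("VECTOR_WEIGHT", "0.6"))
BM25_WEIGHT = float(os.getenv("BM25_WEIGHT", "0.4"))
ENABLE_RAGAS = os.getenv("ENABLE_RAGAS", "True").lower() == "true"
```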
---
## 🚀 Quick Start
### **1. Start Ollama Server**
```bash
# Terminal 1: Start Ollama
ollama serve
```
### **2. Launch Application**
```bash
# Terminal 2: Start RAG system
python app.py
```
Output:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000
```
### **3. Access Web Interface**
Open browser to: **http://localhost:8000**
### **4. Upload Documents**
1. Click **"Upload Documents"**
2. Select PDF/DOCX/TXT files (or ZIP archives)
3. Click **"Start Building"**
4. Wait for indexing to complete (progress bar shows status)
### **5. Query Your Documents**
```
Query: "What are the key findings in the Q3 report?"
Response: The Q3 report highlights three key findings:
(1) revenue increased 23% year-over-year to $45.2M,
(2) customer acquisition costs decreased 15%, and
(3) net retention rate reached 118% [1].
Sources:
[1] Q3_Financial_Report.pdf (Page 3, Executive Summary)
RAGAS Metrics:
- Answer Relevancy: 0.89
- Faithfulness: 0.94
- Context Utilization: 0.87
- Overall Score: 0.90
```
---
## 🧩 Core Components
### **1. Document Ingestion Pipeline**
**High-level flow:**
```
Document Upload → Parse → Clean → Chunk → Embed → Index
```
**Adaptive Chunking Logic:**
```mermaid
graph TD
    A[Calculate Token Count] --> B{Tokens < 50K?}
    B -->|Yes| C["Fixed Chunking<br/>512 tokens, 50 overlap"]
    B -->|No| D{Tokens < 500K?}
    D -->|Yes| E[Semantic Chunking<br/>Section-aware]
    D -->|No| F["Hierarchical Chunking<br/>Parent 2048, Child 512"]
```
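
In code, the strategy selection reduces to two token thresholds. A minimal sketch using `tiktoken` (the function name and return shape are illustrative):

```python
import tiktoken

def select_chunking_strategy(text: str) -> dict:
    # Token thresholds from the diagram above
    encoder = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(encoder.encode(text))

    if n_tokens < 50_000:
        return {"strategy": "fixed", "chunk_size": 512, "overlap": 50}
    if n_tokens < 500_000:
        return {"strategy": "semantic"}  # section-aware splitting
    return {"strategy": "hierarchical", "parent_size": 2048, "child_size": 512}
```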
### **2. Hybrid Retrieval Engine**
**Retrieval Flow:**
```python
# Pseudocode: embedder, faiss_index, bm25_index, and cross_encoder
# are the components described above
def hybrid_retrieve(query: str, top_k: int = 10):
    # Dual retrieval: over-fetch candidates from each index
    query_embedding = embedder.embed(query)
    vector_results = faiss_index.search(query_embedding, top_k * 2)
    bm25_results = bm25_index.search(query, top_k * 2)

    # Fusion (weighted Reciprocal Rank Fusion)
    fused_results = reciprocal_rank_fusion(vector_results,
                                           bm25_results,
                                           weights = (0.6, 0.4))

    # Reranking with the cross-encoder, keeping the top-k
    reranked = cross_encoder.rerank(query, fused_results, top_k)
    return reranked
```
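
The `reciprocal_rank_fusion` helper above can be sketched as a weighted RRF over the two ranked lists (k=60 is the conventional RRF constant, an assumption here rather than a documented setting):

```python
def reciprocal_rank_fusion(vector_results, bm25_results,
                           weights=(0.6, 0.4), k=60):
    # score(doc) = sum over both lists of weight / (k + rank)
    scores = {}
    for weight, results in zip(weights, (vector_results, bm25_results)):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```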
### **3. Response Generation**
**Temperature Control:**
```mermaid
graph LR
    A[Query Type] --> B{Factual?}
    B -->|Yes| C[Low Temp<br/>0.1-0.2]
    B -->|No| D[Context Quality]
    D -->|High| E[Medium Temp<br/>0.3-0.5]
    D -->|Low| F[High Temp<br/>0.6-0.8]
```
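
A sketch of the same decision in code (the 0.7 quality threshold is an assumption; the diagram only specifies the temperature bands):

```python
def select_temperature(is_factual: bool, context_quality: float) -> float:
    # Factual queries get near-deterministic generation
    if is_factual:
        return 0.15   # within the 0.1-0.2 band
    # Otherwise, trade off creativity against context quality
    return 0.4 if context_quality >= 0.7 else 0.7
```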
### **4. RAGAS Evaluation Module**
**Automatic Quality Assessment:**
```python
# After each query-response pair
ragas_result = ragas_evaluator.evaluate_single(query = user_query,
                                               answer = generated_answer,
                                               contexts = retrieved_chunks,
                                               retrieval_time_ms = retrieval_time,
                                               generation_time_ms = generation_time)

# Metrics computed:
# - Answer Relevancy (0-1)
# - Faithfulness (0-1)
# - Context Utilization (0-1)
# - Context Relevancy (0-1)
# - Overall Score (weighted average)
```
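
The overall score is a weighted average of the four metrics. A sketch with illustrative weights (the actual weighting is internal to the evaluator and may differ):

```python
# Hypothetical weights; the evaluator's real weighting may differ
WEIGHTS = {"answer_relevancy": 0.3,
           "faithfulness": 0.3,
           "context_utilization": 0.2,
           "context_relevancy": 0.2}

def overall_score(metrics: dict) -> float:
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())
```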
---
## 📚 API Documentation
### **Core Endpoints**
#### **1. Health Check**
```bash
GET /api/health
```
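For example, against a local deployment:
```bash
curl http://localhost:8000/api/health
```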
**Response:**
```json
{
"status": "healthy",
"timestamp": "2024-11-27T03:00:00",
"components": {
"vector_store": true,
"llm": true,
"embeddings": true,
"retrieval": true
}
}
```
#### **2. Upload Documents**
```bash
POST /api/upload
Content-Type: multipart/form-data
files: [file1.pdf, file2.docx]
```
#### **3. Start Processing**
```bash
POST /api/start-processing
```
#### **4. Query (Chat)**
```bash
POST /api/chat
Content-Type: application/json
{
"message": "What are the revenue figures?",
"session_id": "session_123"
}
```
**Response includes RAGAS metrics:**
```json
{
"session_id": "session_123",
"response": "Revenue for Q3 was $45.2M [1]...",
"sources": [...],
"metrics": {
"retrieval_time": 245,
"generation_time": 3100,
"total_time": 3350
},
"ragas_metrics": {
"answer_relevancy": 0.89,
"faithfulness": 0.94,
"context_utilization": 0.87,
"context_relevancy": 0.91,
"overall_score": 0.90
}
}
```
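
A complete request from the command line (session IDs are client-chosen strings):

```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What are the revenue figures?", "session_id": "session_123"}'
```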
#### **5. RAGAS Endpoints**
```bash
# Get evaluation history
GET /api/ragas/history
# Get session statistics
GET /api/ragas/statistics
# Clear evaluation history
POST /api/ragas/clear
# Export evaluation data
GET /api/ragas/export
# Get RAGAS configuration
GET /api/ragas/config
```
---
## ⚙️ Configuration
### **config/settings.py**
**Key Configuration Sections:**
#### **LLM Settings**
```python
OLLAMA_MODEL = "mistral:7b"
DEFAULT_TEMPERATURE = 0.1
MAX_TOKENS = 1000
CONTEXT_WINDOW = 8192
```
#### **RAGAS Settings**
```python
ENABLE_RAGAS = True
RAGAS_ENABLE_GROUND_TRUTH = False
RAGAS_METRICS = ["answer_relevancy",
                 "faithfulness",
                 "context_utilization",
                 "context_relevancy"]
RAGAS_EVALUATION_TIMEOUT = 60
RAGAS_BATCH_SIZE = 10
```
#### **Caching Settings**
```python
ENABLE_EMBEDDING_CACHE = True
CACHE_MAX_SIZE = 1000 # LRU cache size
CACHE_TTL = 3600 # Time to live in seconds
```
---
## 📊 RAGAS Evaluation
### **What is RAGAS?**
RAGAS (Retrieval-Augmented Generation Assessment) is a framework for evaluating RAG systems using automated metrics. Our implementation provides real-time quality assessment for every query-response pair.
### **Metrics Explained**
| Metric | Definition | Target | Interpretation |
|--------|-----------|--------|----------------|
| **Answer Relevancy** | How well the answer addresses the question | > 0.85 | Measures usefulness to user |
| **Faithfulness** | Is the answer grounded in retrieved context? | > 0.90 | Prevents hallucinations |
| **Context Utilization** | How well the context is used in the answer | > 0.80 | Retrieval effectiveness |
| **Context Relevancy** | Are retrieved chunks relevant to the query? | > 0.85 | Search quality |
| **Overall Score** | Weighted average of all metrics | > 0.85 | System performance |
### **Using the Analytics Dashboard**
1. Navigate to **Analytics & Quality** section
2. View real-time RAGAS metrics table
3. Monitor session statistics (averages, trends)
4. Export evaluation data for offline analysis
### **Example Evaluation Output**
```
Query: "What were the Q3 revenue trends?"
Answer: "Q3 revenue increased 23% YoY to $45.2M..."
RAGAS Evaluation:
├─ Answer Relevancy: 0.89 ✓ (Good)
├─ Faithfulness: 0.94 ✓ (Excellent)
├─ Context Utilization: 0.87 ✓ (Good)
├─ Context Relevancy: 0.91 ✓ (Excellent)
└─ Overall Score: 0.90 ✓ (Excellent)
Performance:
├─ Retrieval Time: 245ms
├─ Generation Time: 3100ms
└─ Total Time: 3345ms
```
---
## 🔧 Troubleshooting
### **Common Issues**
#### **1. "RAGAS evaluation failed"**
**Cause:** OpenAI API key not configured
**Solution:**
```bash
# Add to .env file
OPENAI_API_KEY=your_openai_api_key_here
# Or disable RAGAS if not needed
ENABLE_RAGAS=False
```
#### **2. "Context assembly returning 0 chunks"**
**Cause:** Missing token counts in chunks
**Solution:** Fixed in `context_assembler.py`; token counts are now calculated on the fly when missing.
#### **3. "Slow query responses"**
**Solutions:**
- Enable the embedding cache: `ENABLE_EMBEDDING_CACHE=True`
- Reduce the retrieval count: `TOP_K_RETRIEVE=5`
- Disable reranking: `ENABLE_RERANKING=False`
- Use quantized model for faster inference
#### **4. "RAGAS metrics not appearing"**
**Symptoms:** Chat responses lack quality metrics
**Solution:**
```bash
# Verify RAGAS is enabled in config/settings.py:
#   ENABLE_RAGAS = True
# Confirm the OpenAI API key is valid, then watch for evaluation errors:
tail -f logs/app.log | grep "RAGAS"
```
---
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
**Open Source Technologies:**
- [FastAPI](https://fastapi.tiangolo.com/) - Modern web framework
- [Ollama](https://ollama.ai/) - Local LLM inference
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search
- [LlamaIndex](https://www.llamaindex.ai/) - Document chunking
- [Sentence Transformers](https://www.sbert.net/) - Embedding models
- [RAGAS](https://github.com/explodinggradients/ragas) - RAG evaluation
**Research Papers:**
- Karpukhin et al. (2020) - Dense Passage Retrieval
- Robertson & Zaragoza (2009) - The Probabilistic Relevance Framework: BM25
- Lewis et al. (2020) - Retrieval-Augmented Generation
- Es et al. (2023) - RAGAS: Automated Evaluation of RAG
---
**Built with ❤️ for the open-source community**