---
title: QuerySphere
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
<div align="center">

# QuerySphere: RAG Platform for Document Q&A with Knowledge Ingestion

[Python 3.10+](https://www.python.org/downloads/) · [FastAPI](https://fastapi.tiangolo.com/) · [License: MIT](https://opensource.org/licenses/MIT)

> **MVP-Grade RAG Platform with Multi-Format Document Ingestion, Hybrid Retrieval, and Zero API Costs**

</div>
---
An MVP-grade Retrieval-Augmented Generation (RAG) system that lets organizations unlock knowledge trapped across documents and archives while maintaining data privacy and eliminating costly API dependencies.
---
## 📋 Table of Contents
- [Overview](#-overview)
- [Key Features](#-key-features)
- [System Architecture](#-system-architecture)
- [Technology Stack](#-technology-stack)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Core Components](#-core-components)
- [API Documentation](#-api-documentation)
- [Configuration](#-configuration)
- [RAGAS Evaluation](#-ragas-evaluation)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
---
## 🎯 Overview
QuerySphere addresses a critical enterprise pain point: **information silos that cost organizations 20% of employee productivity**. Unlike existing solutions (Humata AI, ChatPDF, NotebookLM) that charge $49-99/user/month and rely on expensive cloud LLM APIs, this system offers:
### **Core Value Propositions**
| Feature | Traditional Solutions | Our System |
|---------|----------------------|------------|
| **Privacy** | Cloud-based (data leaves premises) | 100% on-premise processing |
| **Cost** | $49-99/user/month + API fees | Zero API costs (local inference) |
| **Input Types** | PDF only | PDF, DOCX, TXT, ZIP archives |
| **Quality Metrics** | Black box (no visibility) | RAGAS evaluation with detailed metrics |
| **Retrieval** | Vector-only | Hybrid (Vector + BM25 + Reranking) |
| **Chunking** | Fixed size | Adaptive (3 strategies) |
### **Market Context**
- **$8.5B** projected enterprise AI search market by 2027
- **85%** of enterprises actively adopting AI-powered knowledge management
- **Growing regulatory demands** for on-premise, privacy-compliant solutions
---
## ✨ Key Features
### **1. Multi-Format Document Ingestion**
- **Supported Formats**: PDF, DOCX, TXT
- **Archive Processing**: ZIP files up to 2GB with recursive extraction
- **Batch Upload**: Process multiple documents simultaneously
- **OCR Support**: Extract text from scanned documents and images (PaddleOCR or EasyOCR)
### **2. Intelligent Document Processing**
- **Adaptive Chunking**: Automatically selects a chunking strategy based on document size
  - Fixed-size chunks (< 50K tokens): 512 tokens with a 50-token overlap
  - Semantic chunks (50K-500K tokens): Section-aware splitting
  - Hierarchical chunks (> 500K tokens): Parent-child structure (2048-token parents, 512-token children)
- **Metadata Extraction**: Title, author, date, page numbers, section headers
### **3. Hybrid Retrieval System**
```mermaid
graph LR
A[User Query] --> B[Query Embedding]
A --> C[Keyword Analysis]
B --> D[Vector Search<br/>FAISS]
C --> E[BM25 Search]
D --> F[Reciprocal Rank Fusion<br/>60% Vector + 40% BM25]
E --> F
F --> G[Cross-Encoder Reranking]
G --> H[Top-K Results]
```
- **Vector Search**: FAISS with BGE embeddings (384-dim)
- **Keyword Search**: BM25 with optimized parameters (k1=1.5, b=0.75)
- **Fusion Methods**: Weighted, Reciprocal Rank Fusion (RRF), CombSum
- **Reranking**: Cross-encoder for precision boost
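The BM25 parameters above plug into the standard Okapi BM25 formula, where `k1` controls term-frequency saturation and `b` controls document-length normalization. The following is a minimal pure-Python scorer for illustration only. It assumes naive lowercase whitespace tokenization and the non-negative IDF variant; the project itself uses `rank_bm25`, whose tokenization and IDF handling may differ.

```python
import math
from collections import Counter


class BM25:
    """Minimal Okapi BM25 scorer (illustrative; not the rank_bm25 implementation)."""

    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        if not corpus:
            raise ValueError("corpus must not be empty")
        self.k1, self.b = k1, b
        # Assumption: naive lowercase + whitespace tokenization
        self.docs = [doc.lower().split() for doc in corpus]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.term_freqs = [Counter(d) for d in self.docs]
        # Document frequency: how many documents contain each term
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Non-negative IDF variant: log((N - df + 0.5) / (df + 0.5) + 1)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.term_freqs[idx], self.doc_lens[idx]
        total = 0.0
        for term in query.lower().split():
            freq = tf.get(term, 0)
            if freq == 0:
                continue
            # k1 saturates repeated terms; b scales the penalty for long documents
            norm = freq + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * freq * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 10) -> list[tuple[int, float]]:
        # Score every document, then keep the best top_k (stable for ties)
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

For example, `BM25(["revenue grew in q3", "customer churn fell", "q3 revenue and margins"]).search("q3 revenue", top_k=2)` ranks the two documents that mention both terms ahead of the unrelated one.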
### **4. Local LLM Generation**
- **Ollama Integration**: Zero-cost inference with Mistral-7B or LLaMA-2
- **Adaptive Temperature**: Context-aware generation parameters
- **Citation Tracking**: Automatic source attribution with validation
- **Streaming Support**: Token-by-token response generation
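Streaming can be consumed directly from Ollama's REST API: with `"stream": true`, `/api/generate` returns one JSON object per line, each carrying a `response` text fragment and a `done` flag. The sketch below uses only the standard library and assumes Ollama's default local address (`http://localhost:11434`). The project lists the `ollama` Python package, so its actual client code likely looks different, and `iter_tokens`/`stream_generate` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Ollama's default local endpoint (assumes `ollama serve` on the default port)
OLLAMA_URL = "http://localhost:11434/api/generate"


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final object signals the end of generation


def stream_generate(prompt: str, model: str = "mistral:7b", temperature: float = 0.1) -> Iterator[str]:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": True,
        "options": {"temperature": temperature},
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # HTTPResponse iterates line by line, one JSON object per line
        yield from iter_tokens(response)
```

With a running Ollama server, `for token in stream_generate("Summarize the Q3 report"): print(token, end="", flush=True)` prints the answer as it is generated.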
### **5. RAGAS Quality Assurance**
- **Real-Time Evaluation**: Answer relevancy, faithfulness, context utilization, and context relevancy (context precision/recall additionally require ground-truth answers)
- **Automatic Metrics**: Computed for every query-response pair
- **Analytics Dashboard**: Track quality trends over time
- **Export Capability**: Download evaluation data for analysis
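- **Session Statistics**: Aggregate metrics across conversation sessions
- **Requirement**: RAGAS scoring needs an OpenAI API key (`OPENAI_API_KEY`); set `ENABLE_RAGAS=False` to run without it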
---
## 🏗️ System Architecture
### **High-Level Architecture**
```mermaid
graph TB
subgraph "Frontend Layer"
A[Web UI<br/>HTML/CSS/JS]
end
subgraph "API Layer"
B[FastAPI Gateway<br/>REST Endpoints]
end
subgraph "Ingestion Pipeline"
C[Document Parser<br/>PDF/DOCX/TXT]
D[Adaptive Chunker<br/>3 Strategies]
E[Embedding Generator<br/>BGE-small-en-v1.5]
end
subgraph "Storage Layer"
F[FAISS Vector DB<br/>~10M vectors]
G[BM25 Keyword Index]
H[SQLite Metadata<br/>Documents & Chunks]
I[LRU Cache<br/>Embeddings]
end
subgraph "Retrieval Engine"
J[Hybrid Retriever<br/>Vector + BM25]
K[Cross-Encoder<br/>Reranker]
L[Context Assembler]
end
subgraph "Generation Engine"
M[Ollama LLM<br/>Mistral-7B]
N[Prompt Builder]
O[Citation Formatter]
end
subgraph "Evaluation Engine"
P[RAGAS Evaluator<br/>Quality Metrics]
end
A --> B
B --> C
C --> D
D --> E
E --> F
E --> G
D --> H
E --> I
B --> J
J --> F
J --> G
J --> K
K --> L
L --> N
N --> M
M --> O
O --> A
M --> P
P --> H
P --> A
```
### **Why This Architecture?**
#### **Modular Design**
Each component is independent and replaceable:
- **Parser**: Swap PDF libraries without affecting chunking
- **Embedder**: Change from BGE to OpenAI embeddings with config update
- **LLM**: Switch from Ollama to OpenAI API seamlessly
#### **Separation of Concerns**
```
Ingestion → Storage → Retrieval → Generation → Evaluation
```
Each stage has clear inputs/outputs and single responsibility.
#### **Performance Optimization**
- **Async Processing**: Non-blocking I/O for uploads and LLM calls
- **Batch Operations**: Embed chunks in batches of 32 (`EMBEDDING_BATCH_SIZE`)
- **Local Caching**: LRU cache for query embeddings and frequent retrievals
- **Indexing**: FAISS approximate nearest-neighbor (ANN) search, which scales sublinearly instead of the O(n) cost of brute-force comparison
---
## 🔧 Technology Stack
### **Core Technologies**
| Component | Technology | Version | Why This Choice |
|-----------|-----------|---------|-----------------|
| **Backend** | FastAPI | 0.104+ | Async support, auto-docs, production-grade |
| **LLM** | Ollama (Mistral-7B) | Latest | Zero API costs, on-premise, 20-30 tokens/sec |
| **Embeddings** | BGE-small-en-v1.5 | 384-dim | SOTA quality, 10x faster than alternatives |
| **Vector DB** | FAISS | Latest | Battle-tested, 10x faster than ChromaDB |
| **Keyword Search** | BM25 (rank_bm25) | Latest | Fast probabilistic ranking |
| **Document Parsing** | PyPDF2, python-docx | Latest | Industry standard, reliable |
| **Chunking** | LlamaIndex | 0.9+ | Advanced semantic splitting |
| **Reranking** | Cross-Encoder | Latest | +15% accuracy, minimal latency |
| **Evaluation** | RAGAS | 0.1.9 | Automated RAG quality metrics |
| **Frontend** | Alpine.js | 3.x | Lightweight reactivity, no build step |
| **Database** | SQLite | 3.x | Zero-config, sufficient for metadata |
| **Caching** | In-Memory LRU | Python functools | Fast, no external dependencies |
### **Python Dependencies**
```
fastapi>=0.104.0
uvicorn>=0.24.0
ollama>=0.1.0
sentence-transformers>=2.2.2
faiss-cpu>=1.7.4
llama-index>=0.9.0
rank-bm25>=0.2.2
PyPDF2>=3.0.0
python-docx>=0.8.11
pydantic>=2.0.0
aiohttp>=3.9.0
tiktoken>=0.5.0
ragas==0.1.9
datasets==2.14.6
```
---
## 📦 Installation
### **Prerequisites**
- Python 3.10 or higher
- 8GB RAM minimum (16GB recommended)
- 10GB disk space for models and indexes
- Ollama installed ([https://ollama.ai](https://ollama.ai))
### **Step 1: Clone Repository**
```bash
git clone https://github.com/satyaki-mitra/docu-vault-ai.git
cd docu-vault-ai
```
### **Step 2: Create Virtual Environment**
```bash
# Using conda (recommended)
conda create -n rag_env python=3.10
conda activate rag_env
# Or using venv
python -m venv rag_env
source rag_env/bin/activate # On Windows: rag_env\Scripts\activate
```
### **Step 3: Install Dependencies**
```bash
pip install -r requirements.txt
```
### **Step 4: Install Ollama and Model**
```bash
# Install Ollama (macOS)
brew install ollama
# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull Mistral model
ollama pull mistral:7b
# Verify installation
ollama list
```
### **Step 5: Configure Environment**
```bash
# Copy example config
cp .env.example .env
# Edit configuration (optional)
nano .env
```
**Key Configuration Options:**
```bash
# LLM Settings
OLLAMA_MODEL=mistral:7b
DEFAULT_TEMPERATURE=0.1
CONTEXT_WINDOW=8192
# Retrieval Settings
VECTOR_WEIGHT=0.6
BM25_WEIGHT=0.4
ENABLE_RERANKING=True
TOP_K_RETRIEVE=10
# RAGAS Evaluation
ENABLE_RAGAS=True
RAGAS_ENABLE_GROUND_TRUTH=False
OPENAI_API_KEY=your_openai_api_key_here # Required for RAGAS
# Performance
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4
```
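Environment variables are always strings, so values such as `ENABLE_RERANKING=True` or `TOP_K_RETRIEVE=10` need explicit conversion. The following is a minimal standard-library sketch of typed loading, assuming the variable names and defaults listed above. The `RagConfig` class and helper names are hypothetical, and the project's own `config/settings.py` may load these differently (for example through pydantic).

```python
import os
from dataclasses import dataclass


def env_bool(name: str, default: bool) -> bool:
    # Accept common truthy spellings such as "True" as written in .env
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RagConfig:  # hypothetical container for the settings above
    ollama_model: str
    default_temperature: float
    context_window: int
    vector_weight: float
    bm25_weight: float
    enable_reranking: bool
    top_k_retrieve: int
    enable_ragas: bool
    embedding_batch_size: int
    max_workers: int


def load_config() -> RagConfig:
    cfg = RagConfig(
        ollama_model=os.getenv("OLLAMA_MODEL", "mistral:7b"),
        default_temperature=float(os.getenv("DEFAULT_TEMPERATURE", "0.1")),
        context_window=int(os.getenv("CONTEXT_WINDOW", "8192")),
        vector_weight=float(os.getenv("VECTOR_WEIGHT", "0.6")),
        bm25_weight=float(os.getenv("BM25_WEIGHT", "0.4")),
        enable_reranking=env_bool("ENABLE_RERANKING", True),
        top_k_retrieve=int(os.getenv("TOP_K_RETRIEVE", "10")),
        enable_ragas=env_bool("ENABLE_RAGAS", True),
        embedding_batch_size=int(os.getenv("EMBEDDING_BATCH_SIZE", "32")),
        max_workers=int(os.getenv("MAX_WORKERS", "4")),
    )
    # Assumption: fusion weights are complementary, as 0.6 + 0.4 are
    if abs(cfg.vector_weight + cfg.bm25_weight - 1.0) > 1e-6:
        raise ValueError("VECTOR_WEIGHT and BM25_WEIGHT should sum to 1.0")
    return cfg
```

Failing fast on inconsistent weights surfaces a misconfigured `.env` at startup rather than as silently skewed rankings.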
---
## 🚀 Quick Start
### **1. Start Ollama Server**
```bash
# Terminal 1: Start Ollama
ollama serve
```
### **2. Launch Application**
```bash
# Terminal 2: Start RAG system
python app.py
```
Output:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000
```
### **3. Access Web Interface**
Open your browser at **http://localhost:8000**
### **4. Upload Documents**
1. Click **"Upload Documents"**
2. Select PDF/DOCX/TXT files (or ZIP archives)
3. Click **"Start Building"**
4. Wait for indexing to complete (progress bar shows status)
### **5. Query Your Documents**
```
Query: "What are the key findings in the Q3 report?"
Response: The Q3 report highlights three key findings:
(1) revenue increased 23% year-over-year to $45.2M,
(2) customer acquisition costs decreased 15%, and
(3) net retention rate reached 118% [1].
Sources:
[1] Q3_Financial_Report.pdf (Page 3, Executive Summary)
RAGAS Metrics:
- Answer Relevancy: 0.89
- Faithfulness: 0.94
- Context Utilization: 0.87
- Overall Score: 0.90
```
---
## 🧩 Core Components
### **1. Document Ingestion Pipeline**
```
# High-level flow
Document Upload → Parse → Clean → Chunk → Embed → Index
```
**Adaptive Chunking Logic:**
```mermaid
graph TD
A[Calculate Token Count] --> B{Tokens < 50K?}
B -->|Yes| C[Fixed Chunking<br/>512 tokens, 50 overlap]
B -->|No| D{Tokens < 500K?}
D -->|Yes| E[Semantic Chunking<br/>Section-aware]
D -->|No| F[Hierarchical Chunking<br/>Parent 2048, Child 512]
```
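The decision tree above, together with the fixed-size and hierarchical parameters, can be sketched as follows. Assumptions: the input is already tokenized (the project lists `tiktoken` for counting), hierarchical children do not overlap, and semantic chunking is only selected here, not implemented, since it depends on section detection. All function names are hypothetical.

```python
def select_strategy(token_count: int) -> str:
    # Thresholds from the adaptive chunking rules: < 50K, < 500K, otherwise
    if token_count < 50_000:
        return "fixed"
    if token_count < 500_000:
        return "semantic"
    return "hierarchical"


def fixed_chunks(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Sliding window: each chunk starts `size - overlap` tokens after the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def hierarchical_chunks(tokens: list[str], parent_size: int = 2048, child_size: int = 512) -> list[dict]:
    """Split into parent blocks, then split each parent into non-overlapping children."""
    parents = []
    for p_start in range(0, len(tokens), parent_size):
        parent = tokens[p_start:p_start + parent_size]
        children = [parent[c:c + child_size] for c in range(0, len(parent), child_size)]
        parents.append({"parent": parent, "children": children})
    return parents
```

The overlap means consecutive fixed-size chunks share their boundary tokens, so a sentence split at a chunk edge still appears whole in at least one chunk.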
### **2. Hybrid Retrieval Engine**
**Retrieval Flow:**
```python
# Pseudocode
def hybrid_retrieve(query: str, top_k: int = 10):
    # Dual retrieval: over-fetch 2x candidates so fusion and reranking have headroom
    query_embedding = embedder.embed(query)
    vector_results  = faiss_index.search(query_embedding, top_k * 2)
    bm25_results    = bm25_index.search(query, top_k * 2)

    # Fusion (RRF)
    fused_results = reciprocal_rank_fusion(vector_results,
                                           bm25_results,
                                           weights = (0.6, 0.4))

    # Reranking
    reranked = cross_encoder.rerank(query, fused_results, top_k)
    return reranked
```
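The `reciprocal_rank_fusion` helper in the pseudocode is not shown. One common weighted form scores each chunk as the sum, over retrievers, of `weight / (k + rank)`. The sketch below assumes that form, the conventional constant `k = 60`, and ranked lists of chunk IDs (best first) as input; the project's actual fusion code may normalize or weight differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    vector_results: list[str],
    bm25_results: list[str],
    weights: tuple[float, float] = (0.6, 0.4),
    k: int = 60,
) -> list[tuple[str, float]]:
    """Weighted RRF: score(d) = sum_i w_i / (k + rank_i(d)), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for weight, ranking in zip(weights, (vector_results, bm25_results)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Chunks found by both retrievers collect two contributions and rise to the top
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF uses ranks rather than raw scores, cosine similarities and BM25 scores never need to be put on a common scale.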
### **3. Response Generation**
**Temperature Control:**
```mermaid
graph LR
A[Query Type] --> B{Factual?}
B -->|Yes| C[Low Temp<br/>0.1-0.2]
B -->|No| D[Context Quality]
D -->|High| E[Medium Temp<br/>0.3-0.5]
D -->|Low| F[High Temp<br/>0.6-0.8]
```
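The temperature rules can be written as a small function. The `is_factual` flag, the `context_quality` score in [0, 1], and the 0.7 cutoff between high and low quality are hypothetical inputs, since how queries are classified and context is scored is not documented here. For factual queries the sketch returns 0.1, the low end of the range and the configured `DEFAULT_TEMPERATURE`; within the other two ranges it interpolates linearly, so better context gives a lower temperature.

```python
def select_temperature(is_factual: bool, context_quality: float, threshold: float = 0.7) -> float:
    """Map query type and context quality to a temperature (hypothetical inputs).

    is_factual: output of some query classifier (not documented; assumed boolean)
    context_quality: retrieval quality in [0, 1] (assumed, e.g. a mean rerank score)
    threshold: assumed cutoff separating "high" from "low" context quality
    """
    if not 0.0 <= context_quality <= 1.0:
        raise ValueError("context_quality must be in [0, 1]")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if is_factual:
        return 0.1  # low range (0.1-0.2); matches DEFAULT_TEMPERATURE
    if context_quality >= threshold:
        # Medium range 0.3-0.5: the better the context, the closer to 0.3
        frac = (context_quality - threshold) / (1.0 - threshold)
        return round(0.5 - 0.2 * frac, 2)
    # High range 0.6-0.8: the weaker the context, the closer to 0.8
    frac = context_quality / threshold
    return round(0.8 - 0.2 * frac, 2)
```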
### **4. RAGAS Evaluation Module**
**Automatic Quality Assessment:**
```python
# After each query-response pair
ragas_result = ragas_evaluator.evaluate_single(query              = user_query,
                                               answer             = generated_answer,
                                               contexts           = retrieved_chunks,
                                               retrieval_time_ms  = retrieval_time,
                                               generation_time_ms = generation_time,
                                               )

# Metrics computed (each scored 0-1):
#   - Answer Relevancy
#   - Faithfulness
#   - Context Utilization
#   - Context Relevancy
#   - Overall Score (weighted average of the above)
```
---
## 📚 API Documentation
### **Core Endpoints**
#### **1. Health Check**
```bash
GET /api/health
```
**Response:**
```json
{
  "status": "healthy",
  "timestamp": "2024-11-27T03:00:00",
  "components": {
    "vector_store": true,
    "llm": true,
    "embeddings": true,
    "retrieval": true
  }
}
```
#### **2. Upload Documents**
```bash
POST /api/upload
Content-Type: multipart/form-data
files: [file1.pdf, file2.docx]
```
#### **3. Start Processing**
```bash
POST /api/start-processing
```
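The upload and processing calls can be scripted with the standard library alone. This sketch assumes the multipart field name `files` shown above, the local address from the Quick Start output, and JSON response bodies (their exact shape is not documented here); `build_multipart`, `post`, and `upload_and_process` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

BASE_URL = "http://localhost:8000"  # address from the Quick Start output (assumption)


def build_multipart(paths: list[Path], field: str = "files") -> tuple[bytes, str]:
    """Encode files as multipart/form-data; returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for path in paths:
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post(path: str, data: Optional[bytes] = None, content_type: Optional[str] = None) -> dict:
    request = urllib.request.Request(f"{BASE_URL}{path}", data=data, method="POST")
    if content_type:
        request.add_header("Content-Type", content_type)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_and_process(paths: list[Path]) -> dict:
    body, content_type = build_multipart(paths)
    post("/api/upload", body, content_type)  # 1. send the files
    return post("/api/start-processing")     # 2. trigger parsing, chunking, and indexing
```

For example, `upload_and_process([Path("Q3_Financial_Report.pdf")])` mirrors the **Upload Documents** and **Start Building** steps from the web interface.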
#### **4. Query (Chat)**
```bash
POST /api/chat
Content-Type: application/json
{
  "message": "What are the revenue figures?",
  "session_id": "session_123"
}
```
**Response includes RAGAS metrics:**
```json
{
  "session_id": "session_123",
  "response": "Revenue for Q3 was $45.2M [1]...",
  "sources": [...],
  "metrics": {
    "retrieval_time": 245,
    "generation_time": 3100,
    "total_time": 3350
  },
  "ragas_metrics": {
    "answer_relevancy": 0.89,
    "faithfulness": 0.94,
    "context_utilization": 0.87,
    "context_relevancy": 0.91,
    "overall_score": 0.90
  }
}
```
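A matching Python client for the chat endpoint, assuming the request and response shapes shown above and a server at `http://localhost:8000`. `ragas_metrics` is treated as optional because it is presumably omitted when RAGAS is disabled or fails; `build_chat_request`, `chat`, and `extract_answer` are hypothetical names.

```python
import json
import urllib.request


def build_chat_request(message: str, session_id: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    # Same JSON body as the POST /api/chat example above
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, session_id: str) -> dict:
    with urllib.request.urlopen(build_chat_request(message, session_id)) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_answer(response: dict) -> tuple[str, dict]:
    """Return the answer text and RAGAS metrics ({} when evaluation is absent)."""
    return response.get("response", ""), response.get("ragas_metrics") or {}
```

Reusing the same `session_id` across calls keeps the exchanges in one session for the `/api/ragas/statistics` aggregates.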
#### **5. RAGAS Endpoints**
```bash
# Get evaluation history
GET /api/ragas/history
# Get session statistics
GET /api/ragas/statistics
# Clear evaluation history
POST /api/ragas/clear
# Export evaluation data
GET /api/ragas/export
# Get RAGAS configuration
GET /api/ragas/config
```
---
## ⚙️ Configuration
### **config/settings.py**
**Key Configuration Sections:**
#### **LLM Settings**
```python
OLLAMA_MODEL = "mistral:7b"
DEFAULT_TEMPERATURE = 0.1
MAX_TOKENS = 1000
CONTEXT_WINDOW = 8192
```
#### **RAGAS Settings**
```python
ENABLE_RAGAS = True
RAGAS_ENABLE_GROUND_TRUTH = False
RAGAS_METRICS = ["answer_relevancy",
"faithfulness",
"context_utilization",
"context_relevancy"
]
RAGAS_EVALUATION_TIMEOUT = 60
RAGAS_BATCH_SIZE = 10
```
#### **Caching Settings**
```python
ENABLE_EMBEDDING_CACHE = True
CACHE_MAX_SIZE = 1000 # LRU cache size
CACHE_TTL = 3600 # Time to live in seconds
```
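`functools.lru_cache` bounds the number of entries but has no expiry, so honoring both `CACHE_MAX_SIZE` and `CACHE_TTL` needs a small custom structure. This sketch assumes an `OrderedDict`-based cache keyed by query text; `TTLCache` is a hypothetical helper, not the project's implementation.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """LRU cache with a per-entry time-to-live (hypothetical helper)."""

    def __init__(self, max_size: int = 1000, ttl: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl, clock
        self._data: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

On a query, check `cache.get(query)` first, call the embedding model only on a miss, then store the result with `cache.put(query, embedding)`.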
---
## 📊 RAGAS Evaluation
### **What is RAGAS?**
RAGAS (Retrieval-Augmented Generation Assessment) is a framework for evaluating RAG systems using automated metrics. Our implementation provides real-time quality assessment for every query-response pair.
### **Metrics Explained**
| Metric | Definition | Target | Interpretation |
|--------|-----------|--------|----------------|
| **Answer Relevancy** | How well the answer addresses the question | > 0.85 | Measures usefulness to user |
| **Faithfulness** | Is the answer grounded in retrieved context? | > 0.90 | Prevents hallucinations |
| **Context Utilization** | How well the context is used in the answer | > 0.80 | Retrieval effectiveness |
| **Context Relevancy** | Are retrieved chunks relevant to the query? | > 0.85 | Search quality |
| **Overall Score** | Weighted average of all metrics | > 0.85 | System performance |
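The weights behind the overall score are not documented. Assuming equal weights reproduces the 0.90 in the example evaluation output (the mean of 0.89, 0.94, 0.87, and 0.91 is 0.9025). The sketch computes that average and flags metrics that miss the targets in the table; the weights and function names are assumptions.

```python
from typing import Optional

# Targets from the metrics table (strict: a score must exceed its target)
TARGETS = {
    "answer_relevancy": 0.85,
    "faithfulness": 0.90,
    "context_utilization": 0.80,
    "context_relevancy": 0.85,
}
OVERALL_TARGET = 0.85


def overall_score(metrics: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of the metric scores; equal weights are an assumption."""
    if not metrics:
        raise ValueError("no metrics to aggregate")
    weights = weights or {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


def below_target(metrics: dict[str, float]) -> list[str]:
    """Names of metrics (and possibly 'overall_score') that do not exceed their targets."""
    failing = [name for name, target in TARGETS.items()
               if name in metrics and metrics[name] <= target]
    if overall_score(metrics) <= OVERALL_TARGET:
        failing.append("overall_score")
    return failing
```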
### **Using the Analytics Dashboard**
1. Navigate to **Analytics & Quality** section
2. View real-time RAGAS metrics table
3. Monitor session statistics (averages, trends)
4. Export evaluation data for offline analysis
### **Example Evaluation Output**
```
Query: "What were the Q3 revenue trends?"
Answer: "Q3 revenue increased 23% YoY to $45.2M..."

RAGAS Evaluation:
├─ Answer Relevancy:    0.89 ✓ (Good)
├─ Faithfulness:        0.94 ✓ (Excellent)
├─ Context Utilization: 0.87 ✓ (Good)
├─ Context Relevancy:   0.91 ✓ (Excellent)
└─ Overall Score:       0.90 ✓ (Excellent)

Performance:
├─ Retrieval Time:  245ms
├─ Generation Time: 3100ms
└─ Total Time:      3345ms
```
---
## 🔧 Troubleshooting
### **Common Issues**
#### **1. "RAGAS evaluation failed"**
**Cause:** OpenAI API key not configured
**Solution:**
```bash
# Add to .env file
OPENAI_API_KEY=your_openai_api_key_here
# Or disable RAGAS if not needed
ENABLE_RAGAS=False
```
#### **2. "Context assembly returning 0 chunks"**
**Cause:** Missing token counts in chunks
**Solution:** Fixed in `context_assembler.py`: token counts are now calculated on the fly when missing.
#### **3. "Slow query responses"**
**Solutions:**
- Enable the embedding cache: `ENABLE_EMBEDDING_CACHE=True`
- Reduce the retrieval count: `TOP_K_RETRIEVE=5`
- Disable reranking: `ENABLE_RERANKING=False` (faster, at some cost to precision)
- Use a quantized model for faster inference
#### **4. "RAGAS metrics not appearing"**
**Symptoms:** Chat responses lack quality metrics
**Solution:**
```bash
# Verify RAGAS is enabled (in .env)
ENABLE_RAGAS=True
# Check that OPENAI_API_KEY is set and valid, then watch the logs for RAGAS evaluation errors
tail -f logs/app.log | grep "RAGAS"
```
---
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
**Open Source Technologies:**
- [FastAPI](https://fastapi.tiangolo.com/) - Modern web framework
- [Ollama](https://ollama.ai/) - Local LLM inference
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search
- [LlamaIndex](https://www.llamaindex.ai/) - Document chunking
- [Sentence Transformers](https://www.sbert.net/) - Embedding models
- [RAGAS](https://github.com/explodinggradients/ragas) - RAG evaluation
**Research Papers:**
- Karpukhin et al. (2020) - Dense Passage Retrieval
- Robertson & Zaragoza (2009) - The Probabilistic Relevance Framework: BM25
- Lewis et al. (2020) - Retrieval-Augmented Generation
- Es et al. (2023) - RAGAS: Automated Evaluation of RAG
---
<div align="center">

**Built with ❤️ for the open-source community**

</div>