Spaces:

aditya-joshi-05
/

Cortex

Running

App Files Files Community

aditya-joshi-05 commited on Apr 10

Commit

8bc84e6

1 Parent(s): 27edbb8

Add GitHub Action to sync with Hugging Face Spaces

Browse files

Files changed (2) hide show

.github/workflows/sync_to_hf.yml +21 -0
README.md +845 -34

.github/workflows/sync_to_hf.yml ADDED Viewed

	@@ -0,0 +1,21 @@

+name: Sync to Hugging Face hub
+on:
+  push:
+    branches: [main]
+  # This allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+jobs:
+  sync-to-hub:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          lfs: true
+      - name: Push to hub
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          git push -f https://huggingface.co/spaces/aditya-joshi-05/Cortex main

README.md CHANGED Viewed

@@ -1,53 +1,864 @@
 ---
-title: Cortex
 sdk: docker
-emoji: 📚
 colorFrom: blue
 colorTo: purple
 ---
-# Cortex RAG — Phase 1
-> Production-grade Retrieval-Augmented Generation with dense vector search,
-> semantic chunking, parent-child hierarchy, and streaming generation.
-## Architecture (Phase 1)
 ```
-Documents (PDF/HTML/TXT)
-    │
-    ▼
-DocumentLoader          # pdfplumber / bs4 / plain text
-    │
-    ▼
-SemanticChunker         # sentence-level cosine similarity boundaries
-    │  ├─ child chunk  (~256 tokens)  → embedded & stored in Milvus
-    │  └─ parent chunk (~1024 tokens) → stored alongside; returned to LLM
-    ▼
-Embedder                # BAAI/bge-small-en-v1.5, L2-normalised
-    │
-    ▼
-MilvusStore             # IVF_FLAT index, cosine metric
-    │
-    │   Query
-    │     │
-    │     ▼
-    │   Dense search → top-15 chunks
-    │     │
-    │     ▼
-    │   LLM Generator (Groq / Llama 3.3-70B)
-    │     │ streaming SSE
-    │     ▼
-    │   Streamlit UI (tabbed: Ask / Ingest / System)
 ```
-## Quick start
-### 1. Clone and install
 ```bash
-git clone <repo>
 cd cortex
 python -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
 python -m nltk.downloader punkt

 ---
+title: Cortex RAG
 sdk: docker
+emoji: 🧠
 colorFrom: blue
 colorTo: purple
 ---
+# Cortex RAG — Next-Gen Retrieval-Augmented Generation
+<div align="center">
+**Production-grade RAG system with dense retrieval, semantic chunking, knowledge graph integration, CRAG gating, and multi-provider LLM support.**
+![Python](https://img.shields.io/badge/Python-3.10+-3776ab?logo=python&logoColor=white)
+![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?logo=fastapi&logoColor=white)
+![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker&logoColor=white)
+![License](https://img.shields.io/badge/License-MIT-green)
+</div>
+---
+## 🎯 Overview
+**Cortex** is a production-ready Retrieval-Augmented Generation (RAG) framework that combines:
+- **Dense Vector Search** — Fast, accurate document retrieval using BAAI embeddings (384-dim)
+- **Semantic Chunking** — Intelligent split boundaries based on sentence-level cosine similarity
+- **Parent-Child Chunks** — 256-token child chunks for precision, 1024-token parents for context
+- **Multi-Strategy Retrieval** — Dense search, BM25 hybrid, knowledge graph traversal
+- **CRAG Gating** — Automatic relevance assessment with fallback to web search
+- **Multi-Provider LLM** — Support for Groq, OpenAI, NVIDIA NIM, and custom endpoints
+- **Streaming Responses** — Real-time SSE-based answer generation with inline citations
+- **Knowledge Graphs** — Automatic relation extraction and entity-based retrieval
+- **Caching Layer** — Redis integration for query result caching
+- **Evaluation Framework** — RAGAS-based RAG evaluation metrics
+---
+## 🏗️ Architecture
 ```
+┌─────────────────────────────────────────────────────────────────┐
+│                    Document Ingestion                            │
+├─────────────────────────────────────────────────────────────────┤
+│  PDF/HTML/TXT → DocumentLoader → SemanticChunker                 │
+│                                       ↓                          │
+│                   Child (~256 tokens) + Parent (~1024 tokens)    │
+├─────────────────────────────────────────────────────────────────┤
+│                        Embedding Layer                            │
+├─────────────────────────────────────────────────────────────────┤
+│  BAAI/bge-small-en-v1.5 (384-dim, L2-normalized)                │
+│  → Milvus Store (IVF_FLAT, COSINE metric)                        │
+│  → BM25 Index (keyword search)                                   │
+│  → Knowledge Graph (entities, relations, triples)                │
+├─────────────────────────────────────────────────────────────────┤
+│                       Query Processing                            │
+├─────────────────────────────────────────────────────────────────┤
+│  Dense Search (top-15) → Reranking → CRAG Gate                   │
+│         ↓                                    ↓                    │
+│  High Confidence?                    Low Confidence?             │
+│         ↓                                    ↓                    │
+│    Use KnowledgeBase               ⚠️ Web Search (Tavily)        │
+├──────���──────────────────────────────────────────────────────────┤
+│                    LLM Generation (Streaming)                     │
+├─────────────────────────────────────────────────────────────────┤
+│  Groq Llama 3.3-70B / OpenAI GPT-4o / NVIDIA NIM / Custom        │
+│  Process context → Generate answer → Extract citations           │
+│  Stream via SSE → Client receives real-time response             │
+├─────────────────────────────────────────────────────────────────┤
+│                  Frontend Interfaces                              │
+├─────────────────────────────────────────────────────────────────┤
+│  Streamlit UI (Ask/Ingest/System) | REST API (FastAPI)           │
+└─────────────────────────────────────────────────────────────────┘
 ```
+---
+## ✨ Key Features
+| Feature | Details |
+|---------|---------|
+| 🔍 **Dense Retrieval** | Sub-50ms semantic search via Milvus with 384-dim embeddings |
+| 📚 **Smart Chunking** | Semantic splits + parent-child hierarchy for precision + context |
+| 🧬 **Knowledge Graphs** | Automatic relation extraction (REBEL or LLM-based) |
+| 🚨 **CRAG Gating** | Relevance assessment with web search fallback |
+| 🔗 **Multi-Strategy** | Dense + BM25 keyword + graph traversal combined |
+| 💾 **Redis Cache** | Query result caching with configurable TTL |
+| 🌐 **Multi-Provider LLM** | Groq, OpenAI, NVIDIA NIM, Ollama, custom OpenAI-compatible |
+| 📊 **Evaluation** | RAGAS metrics for answer relevance, faithfulness, context precision |
+| 🎨 **Streaming UI** | Real-time responses with inline citations and source cards |
+| 🐳 **Docker Ready** | Full Docker Compose setup with Milvus, Redis, API, UI |
+---
+## 🚀 Quick Start
+### Prerequisites
+- Python 3.10+
+- Docker & Docker Compose (optional, for containerized setup)
+- GROQ API key (default LLM provider)
+### 1. Clone & Setup
 ```bash
+# Clone repository
+git clone <repo-url>
 cd cortex
+# Create virtual environment
 python -m venv .venv
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+# Install dependencies
+pip install -r requirements.txt
+```
+### 2. Environment Configuration
+Create `.env` file in project root:
+```bash
+# LLM Providers
+GROQ_API_KEY=your_groq_api_key
+GROQ_MODEL=llama-3.3-70b-versatile
+GROQ_TEMPERATURE=0.1
+# Optional: Other LLM providers
+OPENAI_API_KEY=your_openai_key
+MISTRAL_API_KEY=your_mistral_key
+NVIDIA_API_KEY=your_nvidia_key
+# Embedding & Storage
+EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5
+EMBED_DEVICE=cpu  # "cuda" if GPU available
+# Milvus Vector Store
+MILVUS_HOST=localhost
+MILVUS_PORT=19530
+MILVUS_COLLECTION=cortex_chunks
+MILVUS_INDEX_TYPE=IVF_FLAT
+# Redis Cache (optional)
+REDIS_URL=redis://localhost:6379
+# Retrieval
+RETRIEVAL_TOP_K=15
+FINAL_TOP_K=5
+# CRAG (Consistency-based Retrieval Augmented Generation)
+CRAG_ENABLED=true
+CRAG_RELEVANCE_THRESHOLD=0.5
+# Knowledge Graph
+GRAPH_ENABLED=true
+GRAPH_EXTRACTOR=llm-filtered  # "rebel", "llm", "rebel-filtered", "llm-filtered"
+GRAPH_MAX_HOPS=2
+# API
+API_HOST=0.0.0.0
+API_PORT=8000
+```
+### 3. Start Services
+**Option A: Docker Compose (Recommended)**
+```bash
+docker-compose up -d
+# API: http://localhost:8000
+# Streamlit UI: http://localhost:8501
+# Milvus: http://localhost:19530
+```
+**Option B: Local Setup**
+Make sure Milvus is running:
+```bash
+# Using Milvus Docker (if not using compose)
+docker run -d -p 19530:19530 -p 9091:9091 milvusdb/milvus:latest
+# Start API
+python -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
+# In another terminal, start UI
+streamlit run ui/app.py
+```
+### 4. Ingest Documents
+**Via Streamlit UI:**
+- Open http://localhost:8501
+- Go to "📥 Ingest" tab
+- Upload PDF/HTML/TXT or provide directory path
+**Via REST API:**
+```bash
+curl -X POST "http://localhost:8000/ingest" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "mode": "directory",
+    "path": "/path/to/documents"
+  }'
+```
+### 5. Ask Questions
+**Via Streamlit UI:**
+- Go to "🔍 Ask" tab
+- Type your question
+- Watch streaming response with citations
+**Via REST API:**
+```bash
+curl -X POST "http://localhost:8000/query" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What is machine learning?",
+    "provider": "groq",
+    "top_k": 5
+  }' | jq .
+```
+**Streaming Response:**
+```bash
+curl -X POST "http://localhost:8000/query/stream" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "Your question here",
+    "provider": "groq"
+  }'
+```
+---
+## 📡 REST API Endpoints
+### Health & Status
+```http
+GET /health
+```
+Returns system health, Milvus status, collection stats.
+```json
+{
+  "status": "healthy",
+  "milvus": {
+    "connected": true,
+    "collection_count": 2500,
+    "index_type": "IVF_FLAT"
+  }
+}
+```
+### Document Ingestion
+```http
+POST /ingest
+Content-Type: application/json
+{
+  "mode": "directory|file|upload",
+  "path": "/path/to/documents",
+  "chunk_size": 256,
+  "overlap": 32
+}
+```
+### Query (Blocking)
+```http
+POST /query
+Content-Type: application/json
+{
+  "query": "Your question",
+  "provider": "groq",
+  "model": "llama-3.3-70b-versatile",
+  "top_k": 5,
+  "crag": true,
+  "graph": true
+}
+```
+**Response:**
+```json
+{
+  "answer": "Answer text with citations [1][2]...",
+  "chunks": [
+    {
+      "id": "chunk_001",
+      "text": "...",
+      "score": 0.87,
+      "source": "document_name.pdf"
+    }
+  ],
+  "citations": [1, 2],
+  "latency_ms": 1245
+}
+```
+### Query (Streaming)
+```http
+POST /query/stream
+Content-Type: application/json
+{
+  "query": "Your question",
+  "provider": "groq"
+}
+```
+**Response:** Server-Sent Events (SSE) stream
+```
+data: {"type": "start"}
+data: {"type": "chunk", "content": "Answer "}
+data: {"type": "chunk", "content": "is "}
+data: {"type": "chunk", "content": "streaming..."}
+data: {"type": "citations", "citations": [1, 2]}
+data: {"type": "end"}
+```
+### Model Information
+```http
+GET /providers
+```
+Lists all available LLM providers and models.
+---
+## 🛠️ Configuration Guide
+### Retrieval Configuration
+```env
+# Chunk sizes (tokens)
+CHUNK_SIZE_TOKENS=256                    # Child chunk size
+PARENT_CHUNK_SIZE_TOKENS=1024            # Parent chunk size
+SEMANTIC_SIMILARITY_THRESHOLD=0.82       # Split boundary threshold
+CHUNK_OVERLAP_TOKENS=32                  # Overlap padding
+# Retrieval settings
+RETRIEVAL_TOP_K=15                       # Candidates before reranking
+FINAL_TOP_K=5                            # Chunks sent to LLM
+```
+### Embedding Configuration
+```env
+EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5  # Model identifier
+EMBED_DIM=384                             # Output dimension
+EMBED_BATCH_SIZE=64                       # Batch size for processing
+EMBED_DEVICE=cpu                          # cpu or cuda
+```
+### Milvus Configuration
+```env
+MILVUS_HOST=localhost
+MILVUS_PORT=19530
+MILVUS_COLLECTION=cortex_chunks
+MILVUS_INDEX_TYPE=IVF_FLAT                # or HNSW for larger corpora
+MILVUS_METRIC_TYPE=COSINE                 # Vector similarity metric
+MILVUS_NLIST=128                          # clustering parameter for IVF
+MILVUS_NPROBE=16                          # search parameter
+```
+### LLM Provider Configuration
+**Groq (Default)**
+```env
+GROQ_API_KEY=your_key
+GROQ_MODEL=llama-3.3-70b-versatile
+GROQ_TEMPERATURE=0.1
+GROQ_MAX_TOKENS=1024
+GROQ_TIMEOUT=30
+```
+**OpenAI**
+```env
+OPENAI_API_KEY=your_key
+```
+**NVIDIA NIM**
+```env
+NVIDIA_API_KEY=your_key
+```
+**Custom/Ollama**
+```env
+CUSTOM_BASE_URL=http://localhost:11434/v1
+CUSTOM_API_KEY=your_key
+```
+### CRAG (Consistency-based Retrieval Augmented Generation)
+```env
+CRAG_ENABLED=true
+CRAG_RELEVANCE_THRESHOLD=0.5             # Grade boundary
+TAVILY_API_KEY=your_tavily_key           # For web search fallback
+```
+The CRAG gate automatically assesses retrieval quality:
+- **High confidence** (score ≥ threshold) → Use knowledge base
+- **Low confidence** (score < threshold) → Augment with web search
+### Knowledge Graph
+```env
+GRAPH_ENABLED=true
+GRAPH_EXTRACTOR=llm-filtered             # rebel|llm|rebel-filtered|llm-filtered
+GRAPH_MAX_HOPS=2                          # Traversal depth
+GRAPH_PATH=/data/storage/knowledge_graph.json
+# Density filtering (for "filtered" extractors)
+DENSITY_TOP_FRACTION=0.30                 # Process top 30% entity-dense chunks
+DENSITY_MIN_ENTITIES=2                    # Minimum entities per chunk
+```
+### Caching
+```env
+REDIS_URL=redis://localhost:6379
+CACHE_TTL_SECONDS=3600                    # 1 hour
+```
+### Evaluation
+```env
+EVAL_DB_PATH=/data/storage/eval.db
+```
+---
+## 📁 Project Structure
+```
+cortex/
+├── api/                          # FastAPI REST endpoints
+│   ├── main.py                   # App initialization, endpoints
+│   └── schemas.py                # Request/response Pydantic models
+│
+├── ingestion/                    # Document processing pipeline
+│   ├── pipeline.py               # Orchestration
+│   ├── document_loader.py        # PDF/HTML/TXT parsing
+│   ├── chunker.py                # Semantic chunking
+│   └── __init__.py
+│
+├── retrieval/                    # Multi-strategy retrieval
+│   ├── orchestrator.py           # Coordinate retrieval strategies
+│   ├── dense.py                  # Milvus vector search
+│   ├── bm25.py                   # Keyword search index
+│   ├── embedder.py               # HuggingFace embedding model
+│   ├── router.py                 # Query routing logic
+│   ├── fusion.py                 # Result fusion & reranking
+│   ├── graph_builder.py          # Build knowledge graphs
+│   ├── graph_retriever.py        # Entity-based retrieval
+│   ├── relation_extractors.py    # REBEL + LLM extractors
+│   ├── cache.py                  # Redis caching wrapper
+│   └── __init__.py
+│
+├── generation/                   # LLM generation & CRAG
+│   ├── generator.py              # Multi-provider LLM wrapper
+│   ├── crag.py                   # CRAG gate logic
+│   └── __init__.py
+│
+├── evaluation/                   # RAG evaluation metrics
+│   ├── ragas_eval.py             # RAGAS evaluator
+│   ├── store.py                  # Evaluation database
+│   └── __init__.py
+│
+├── ui/                           # Streamlit frontend
+│   ├── app.py                    # Main UI
+│   └── static/                   # (Optional) HTML/CSS/JS
+│
+├── data/                         # Data storage
+│   ├── documents/                # Input documents
+│   ├── storage/                  # Persistent storage
+│   │   ├── knowledge_graph.json
+│   │   ├── bm25_index.pkl
+│   │   └── uploads/
+│   └── synthetic_knowledge_items.txt
+│
+├── config.py                     # Configuration & settings
+├── requirements.txt              # Python dependencies
+├── Dockerfile                    # Docker image build
+├── docker-compose.yml            # Multi-container orchestration
+├── test.py                       # Test suite
+└── README.md                     # This file
+```
+---
+## 🐳 Docker & Deployment
+### Docker Compose Quick Deploy
+```bash
+# Start all services
+docker-compose up -d
+# View logs
+docker-compose logs -f api
+# Stop services
+docker-compose down
+```
+**Services:**
+- `milvus` — Vector database (port 19530)
+- `redis` — Caching layer (port 6379)
+- `api` — FastAPI backend (port 8000)
+- `ui` — Streamlit frontend (port 8501)
+### Environment Variables in Compose
+Edit `docker-compose.yml` to customize:
+```yaml
+services:
+  api:
+    environment:
+      - GROQ_API_KEY=${GROQ_API_KEY}
+      - GROQ_MODEL=llama-3.3-70b-versatile
+      - MILVUS_HOST=milvus
+      - REDIS_URL=redis://redis:6379
+      - GRAPH_EXTRACTOR=llm-filtered
+```
+### Production Deployment
+For production, consider:
+1. **Use HNSW index** instead of IVF_FLAT for better recall:
+   ```env
+   MILVUS_INDEX_TYPE=HNSW
+   ```
+2. **Enable caching** for frequently asked questions:
+   ```env
+   REDIS_URL=redis://redis-prod:6379
+   ```
+3. **Use stronger embedding model** for higher quality:
+   ```env
+   EMBED_MODEL_NAME=BAAI/bge-base-en-v1.5  # 768-dim, better quality
+   ```
+4. **Configure CRAG** for reliability:
+   ```env
+   CRAG_ENABLED=true
+   CRAG_RELEVANCE_THRESHOLD=0.6
+   TAVILY_API_KEY=your_key
+   ```
+---
+## 🔄 Workflow Examples
+### Example 1: Legal Document Q&A
+```bash
+# 1. Ingest legal documents
+curl -X POST "http://localhost:8000/ingest" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "mode": "directory",
+    "path": "/data/legal_documents"
+  }'
+# 2. Query with graph enabled for relation extraction
+curl -X POST "http://localhost:8000/query" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What are the penalties for breach of contract?",
+    "provider": "groq",
+    "graph": true,
+    "crag": true
+  }'
+```
+### Example 2: Research Paper Analysis
+```bash
+# Ingest PDF papers
+python -c "
+from ingestion.pipeline import IngestionPipeline
+from retrieval.embedder import Embedder
+from retrieval.dense import MilvusStore
+embedder = Embedder()
+store = MilvusStore(embedder=embedder)
+pipeline = IngestionPipeline(embedder=embedder, store=store, bm25=None)
+pipeline.ingest('/data/papers', mode='pdf')
+"
+# Query for specific findings
+curl -X POST "http://localhost:8000/query/stream" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What are the key findings about transformer performance?",
+    "model": "gpt-4o"
+  }'
+```
+### Example 3: Customer Support Bot
+```bash
+# 1. Ingest FAQ and documentation
+# 2. Set up CRAG with relevant threshold
+# 3. Route low-confidence queries to web search
+CRAG_RELEVANCE_THRESHOLD=0.6
+TAVILY_API_KEY=your_key
+```
+---
+## 📊 Advanced Features
+### Knowledge Graph Extraction
+Three modes available:
+| Mode | Backend | Speed | Quality | Cost |
+|------|---------|-------|---------|------|
+| `rebel` | Local REBEL model | Fast | Good | Free |
+| `llm` | LLM (Groq/OpenAI) | Slower | Excellent | $$ |
+| `rebel-filtered` | REBEL + entity filtering | Fast | Good | Free |
+| `llm-filtered` | LLM + entity filtering | Slower | Excellent | $$ |
+Switch via config:
+```env
+GRAPH_EXTRACTOR=llm-filtered
+```
+### CRAG (Consistency-based RAG)
+Automatically:
+1. Evaluates retrieval confidence
+2. Assigns relevance grade (Correct/Partially-Correct/Missing)
+3. Supplements low-confidence with web search via Tavily
+```python
+from generation.crag import CRAGGate
+crag = CRAGGate()
+response = crag.evaluate(query, context, answer)
+# Returns: grade, supplemental_docs
+```
+### Evaluation & Metrics
+RAGAS-based evaluation:
+```python
+from evaluation.ragas_eval import RAGASEvaluator
+from evaluation.store import EvalStore
+evaluator = RAGASEvaluator(store=EvalStore())
+metrics = evaluator.evaluate(query, context, answer)
+# Returns: answer_relevance, faithfulness, context_precision
+```
+### Caching Strategy
+```python
+from retrieval.cache import CachedRetriever
+retriever = CachedRetriever(base_retriever)
+# First call: 1000ms (database query)
+# Second call: 5ms (Redis cache hit, TTL: 1 hour)
+results = retriever.retrieve("machine learning basics")
+```
+---
+## ⚙️ Performance Tuning
+### For Speed
+```env
+# Smaller embedding model
+EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5
+# Smaller chunks
+CHUNK_SIZE_TOKENS=128
+PARENT_CHUNK_SIZE_TOKENS=512
+# Faster index
+MILVUS_INDEX_TYPE=IVF_FLAT
+MILVUS_NPROBE=8  # Lower = faster
+# Enable cache
+REDIS_URL=redis://localhost:6379
+# Fewer LLM tokens
+GROQ_MAX_TOKENS=512
+```
+### For Quality
+```env
+# Larger embedding model
+EMBED_MODEL_NAME=BAAI/bge-base-en-v1.5
+# Optimal chunks
+CHUNK_SIZE_TOKENS=512
+PARENT_CHUNK_SIZE_TOKENS=2048
+# More precise index
+MILVUS_INDEX_TYPE=HNSW
+MILVUS_NPROBE=32
+# Better LLM
+GROQ_MODEL=llama-3.3-70b-versatile
+# Enable CRAG
+CRAG_ENABLED=true
+```
+---
+## 🐛 Troubleshooting
+### Milvus Connection Failed
+```bash
+# Check if Milvus is running
+curl http://localhost:19530/healthz
+# Restart Milvus
+docker-compose restart milvus
+# Verify in settings
+python -c "from config import get_settings; print(get_settings().milvus_host)"
+```
+### Low Retrieval Quality
+1. **Check chunk quality:**
+   ```python
+   from ingestion.chunker import SemanticChunker
+   chunker = SemanticChunker()
+   chunks = chunker.chunk("your document text")
+   print([c.text for c in chunks[:3]])
+   ```
+2. **Verify embeddings:**
+   ```python
+   from retrieval.embedder import Embedder
+   embedder = Embedder()
+   emb = embedder.embed("test query")
+   print(f"Embedding dim: {len(emb)}, sample: {emb[:5]}")
+   ```
+3. **Enable CRAG** for automatic augmentation:
+   ```env
+   CRAG_ENABLED=true
+   ```
+### Slow Response Times
+1. Check cache hit rate
+2. Reduce `MILVUS_NPROBE`
+3. Use streaming endpoint (`/query/stream`)
+4. Enable Redis caching
+### Out of Memory
+```env
+# Reduce batch sizes
+EMBED_BATCH_SIZE=16
+# Reduce chunk sizes
+CHUNK_SIZE_TOKENS=128
+# Switch to CPU if using GPU
+EMBED_DEVICE=cpu
+```
+---
+## 📈 Monitoring & Evaluation
+### Health Check
+```bash
+curl http://localhost:8000/health | jq .
+```
+### Collection Statistics
+```python
+from retrieval.dense import MilvusStore
+from retrieval.embedder import Embedder
+store = MilvusStore(embedder=Embedder())
+stats = store.get_stats()
+print(f"Documents: {stats['collection_count']}")
+```
+### Query Evaluation
+```python
+from evaluation.ragas_eval import RAGASEvaluator
+from evaluation.store import EvalStore
+evaluator = RAGASEvaluator(store=EvalStore(db_path="/data/storage/eval.db"))
+metrics = evaluator.evaluate(query, context, answer)
+print(f"Answer Relevance: {metrics['answer_relevance']:.2f}")
+print(f"Faithfulness: {metrics['faithfulness']:.2f}")
+print(f"Context Precision: {metrics['context_precision']:.2f}")
+```
+---
+## 🤝 Contributing
+Contributions welcome! Areas for enhancement:
+- [ ] Multi-language support
+- [ ] Fine-tuned domain-specific embeddings
+- [ ] Advanced reranking strategies
+- [ ] GraphQL API
+- [ ] Persistent trace logging
+- [ ] A/B testing framework
+---
+## 📝 License
+MIT License — see LICENSE file for details
+---
+## 🔗 Resources
+- [Milvus Documentation](https://milvus.io/docs)
+- [FastAPI Guide](https://fastapi.tiangolo.com/)
+- [RAGAS Evaluation Framework](https://github.com/explorerx3/ragas)
+- [Groq API Reference](https://console.groq.com/docs/api-reference)
+- [CRAG Paper](https://arxiv.org/abs/2401.15884)
+---
+**Questions?** Open an issue on GitHub or check the documentation.
 source .venv/bin/activate
 pip install -r requirements.txt
 python -m nltk.downloader punkt