aditya-joshi-05 committed on
Commit
8bc84e6
·
1 Parent(s): 27edbb8

Add GitHub Action to sync with Hugging Face Spaces

Files changed (2)
  1. .github/workflows/sync_to_hf.yml +21 -0
  2. README.md +845 -34
.github/workflows/sync_to_hf.yml ADDED
@@ -0,0 +1,21 @@
+ name: Sync to Hugging Face hub
+ on:
+   push:
+     branches: [main]
+
+   # This allows you to run this workflow manually from the Actions tab
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+           lfs: true
+       - name: Push to hub
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git push -f https://huggingface.co/spaces/aditya-joshi-05/Cortex main
README.md CHANGED
@@ -1,53 +1,864 @@
  ---
- title: Cortex
  sdk: docker
- emoji: 📚
  colorFrom: blue
  colorTo: purple
  ---
- # Cortex RAG - Phase 1

- > Production-grade Retrieval-Augmented Generation with dense vector search,
- > semantic chunking, parent-child hierarchy, and streaming generation.

- ## Architecture (Phase 1)

  ```
- Documents (PDF/HTML/TXT)
-         │
-         ▼
-   DocumentLoader          # pdfplumber / bs4 / plain text
-         │
-         ▼
-   SemanticChunker         # sentence-level cosine similarity boundaries
-         │  ├─ child chunk  (~256 tokens)  → embedded & stored in Milvus
-         │  └─ parent chunk (~1024 tokens) → stored alongside; returned to LLM
-         ▼
-   Embedder                # BAAI/bge-small-en-v1.5, L2-normalised
-         │
-         ▼
-   MilvusStore             # IVF_FLAT index, cosine metric
-         │
-         │   Query
-         │     │
-         │     ▼
-         │   Dense search → top-15 chunks
-         │     │
-         │     ▼
-         │   LLM Generator (Groq / Llama 3.3-70B)
-         │     │  streaming SSE
-         │     ▼
-         │   Streamlit UI (tabbed: Ask / Ingest / System)
  ```

- ## Quick start

- ### 1. Clone and install

  ```bash
- git clone <repo>
  cd cortex
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  python -m nltk.downloader punkt

  ---
+ title: Cortex RAG
  sdk: docker
+ emoji: 🧠
  colorFrom: blue
  colorTo: purple
  ---

+ # Cortex RAG - Next-Gen Retrieval-Augmented Generation

+ <div align="center">
+
+ **Production-grade RAG system with dense retrieval, semantic chunking, knowledge graph integration, CRAG gating, and multi-provider LLM support.**
+
+ ![Python](https://img.shields.io/badge/Python-3.10+-3776ab?logo=python&logoColor=white)
+ ![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?logo=fastapi&logoColor=white)
+ ![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker&logoColor=white)
+ ![License](https://img.shields.io/badge/License-MIT-green)
+
+ </div>
+
+ ---
+
+ ## 🎯 Overview
+
+ **Cortex** is a production-ready Retrieval-Augmented Generation (RAG) framework that combines:
+
+ - **Dense Vector Search** - Fast, accurate document retrieval using BAAI/bge-small-en-v1.5 embeddings (384-dim)
+ - **Semantic Chunking** - Split boundaries chosen from sentence-level cosine similarity
+ - **Parent-Child Chunks** - 256-token child chunks for precision, 1024-token parents for context
+ - **Multi-Strategy Retrieval** - Dense search, BM25 hybrid, knowledge graph traversal
+ - **CRAG Gating** - Automatic relevance assessment with fallback to web search
+ - **Multi-Provider LLM** - Support for Groq, OpenAI, NVIDIA NIM, and custom endpoints
+ - **Streaming Responses** - Real-time SSE-based answer generation with inline citations
+ - **Knowledge Graphs** - Automatic relation extraction and entity-based retrieval
+ - **Caching Layer** - Redis integration for query result caching
+ - **Evaluation Framework** - RAGAS-based RAG evaluation metrics
+
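The parent-child scheme in the list above can be sketched as follows: search runs over the small child chunks, and each hit is swapped for its larger parent before the context reaches the LLM. This is a minimal illustration and not Cortex's actual code; `ChildHit`, `expand_to_parents`, and the dictionary used as a parent store are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildHit:
    """A retrieved child chunk (hypothetical structure, not from the repo)."""
    child_id: str
    parent_id: str
    score: float


def expand_to_parents(hits, parents, limit=5):
    """Map child hits to unique parent texts, best-scoring child first."""
    seen = set()
    context = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.parent_id in seen:
            continue  # several children may share one parent
        seen.add(hit.parent_id)
        context.append(parents[hit.parent_id])
        if len(context) == limit:
            break
    return context


hits = [
    ChildHit("c1", "p1", 0.91),
    ChildHit("c2", "p1", 0.88),
    ChildHit("c3", "p2", 0.75),
]
parents = {"p1": "Parent text one ...", "p2": "Parent text two ..."}
print(expand_to_parents(hits, parents))
```

Deduplicating on `parent_id` keeps the prompt from repeating the same 1024-token parent when several of its children match the query.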
+ ---
+
+ ## 🏗️ Architecture

  ```
+ ┌──────────────────────────────────────────────────────────────────┐
+ │                        Document Ingestion                        │
+ ├──────────────────────────────────────────────────────────────────┤
+ │  PDF/HTML/TXT → DocumentLoader → SemanticChunker                 │
+ │                        ↓                                         │
+ │  Child (~256 tokens) + Parent (~1024 tokens)                     │
+ ├──────────────────────────────────────────────────────────────────┤
+ │                         Embedding Layer                          │
+ ├──────────────────────────────────────────────────────────────────┤
+ │  BAAI/bge-small-en-v1.5 (384-dim, L2-normalized)                 │
+ │    → Milvus Store (IVF_FLAT, COSINE metric)                      │
+ │    → BM25 Index (keyword search)                                 │
+ │    → Knowledge Graph (entities, relations, triples)              │
+ ├──────────────────────────────────────────────────────────────────┤
+ │                         Query Processing                         │
+ ├──────────────────────────────────────────────────────────────────┤
+ │  Dense Search (top-15) → Reranking → CRAG Gate                   │
+ │           ↓                              ↓                       │
+ │    High Confidence?               Low Confidence?                │
+ │           ↓                              ↓                       │
+ │    Use KnowledgeBase           ⚠️ Web Search (Tavily)            │
+ ├──────────────────────────────────────────────────────────────────┤
+ │                    LLM Generation (Streaming)                    │
+ ├──────────────────────────────────────────────────────────────────┤
+ │  Groq Llama 3.3-70B / OpenAI GPT-4o / NVIDIA NIM / Custom        │
+ │  Process context → Generate answer → Extract citations           │
+ │  Stream via SSE → Client receives real-time response             │
+ ├──────────────────────────────────────────────────────────────────┤
+ │                       Frontend Interfaces                        │
+ ├──────────────────────────────────────────────────────────────────┤
+ │  Streamlit UI (Ask/Ingest/System) | REST API (FastAPI)           │
+ └──────────────────────────────────────────────────────────────────┘
  ```

+ ---
+
+ ## ✨ Key Features
+
+ | Feature | Details |
+ |---------|---------|
+ | 🔍 **Dense Retrieval** | Sub-50ms semantic search via Milvus with 384-dim embeddings |
+ | 📚 **Smart Chunking** | Semantic splits + parent-child hierarchy for precision + context |
+ | 🧬 **Knowledge Graphs** | Automatic relation extraction (REBEL or LLM-based) |
+ | 🚨 **CRAG Gating** | Relevance assessment with web search fallback |
+ | 🔗 **Multi-Strategy** | Dense + BM25 keyword + graph traversal combined |
+ | 💾 **Redis Cache** | Query result caching with configurable TTL |
+ | 🌐 **Multi-Provider LLM** | Groq, OpenAI, NVIDIA NIM, Ollama, custom OpenAI-compatible |
+ | 📊 **Evaluation** | RAGAS metrics for answer relevance, faithfulness, context precision |
+ | 🎨 **Streaming UI** | Real-time responses with inline citations and source cards |
+ | 🐳 **Docker Ready** | Full Docker Compose setup with Milvus, Redis, API, UI |
+
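The table lists dense, BM25, and graph results as "combined" but does not say how. One common way to merge ranked lists is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of what `retrieval/fusion.py` does. The function name and the constant `k=60` (a conventional default for RRF) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically by ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["a", "b", "c"]  # e.g. from vector search
bm25_ranking = ["b", "d", "a"]   # e.g. from keyword search
fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

RRF only needs ranks, so it avoids having to put cosine similarities and BM25 scores on a common scale.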
+ ---
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites

+ - Python 3.10+
+ - Docker & Docker Compose (optional, for containerized setup)
+ - Groq API key (default LLM provider)
+
+ ### 1. Clone & Setup

  ```bash
+ # Clone repository
+ git clone <repo-url>
  cd cortex
+
+ # Create virtual environment
  python -m venv .venv
+ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Environment Configuration
+
+ Create a `.env` file in the project root:
+
+ ```bash
+ # LLM Providers
+ GROQ_API_KEY=your_groq_api_key
+ GROQ_MODEL=llama-3.3-70b-versatile
+ GROQ_TEMPERATURE=0.1
+
+ # Optional: Other LLM providers
+ OPENAI_API_KEY=your_openai_key
+ MISTRAL_API_KEY=your_mistral_key
+ NVIDIA_API_KEY=your_nvidia_key
+
+ # Embedding & Storage
+ EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5
+ EMBED_DEVICE=cpu  # "cuda" if GPU available
+
+ # Milvus Vector Store
+ MILVUS_HOST=localhost
+ MILVUS_PORT=19530
+ MILVUS_COLLECTION=cortex_chunks
+ MILVUS_INDEX_TYPE=IVF_FLAT
+
+ # Redis Cache (optional)
+ REDIS_URL=redis://localhost:6379
+
+ # Retrieval
+ RETRIEVAL_TOP_K=15
+ FINAL_TOP_K=5
+
+ # CRAG (Corrective Retrieval Augmented Generation)
+ CRAG_ENABLED=true
+ CRAG_RELEVANCE_THRESHOLD=0.5
+
+ # Knowledge Graph
+ GRAPH_ENABLED=true
+ GRAPH_EXTRACTOR=llm-filtered  # "rebel", "llm", "rebel-filtered", "llm-filtered"
+ GRAPH_MAX_HOPS=2
+
+ # API
+ API_HOST=0.0.0.0
+ API_PORT=8000
+ ```
+
+ ### 3. Start Services
+
+ **Option A: Docker Compose (Recommended)**
+
+ ```bash
+ docker-compose up -d
+ # API: http://localhost:8000
+ # Streamlit UI: http://localhost:8501
+ # Milvus (gRPC): localhost:19530
+ ```
+
+ **Option B: Local Setup**
+
+ Make sure Milvus is running:
+
+ ```bash
+ # Using Milvus Docker (if not using compose)
+ docker run -d -p 19530:19530 -p 9091:9091 milvusdb/milvus:latest
+
+ # Start API
+ python -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
+
+ # In another terminal, start UI
+ streamlit run ui/app.py
+ ```
+
+ ### 4. Ingest Documents
+
+ **Via Streamlit UI:**
+ - Open http://localhost:8501
+ - Go to the "📥 Ingest" tab
+ - Upload PDF/HTML/TXT files or provide a directory path
+
+ **Via REST API:**
+
+ ```bash
+ curl -X POST "http://localhost:8000/ingest" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "mode": "directory",
+     "path": "/path/to/documents"
+   }'
+ ```
+
+ ### 5. Ask Questions
+
+ **Via Streamlit UI:**
+ - Go to the "🔍 Ask" tab
+ - Type your question
+ - Watch the streaming response with citations
+
+ **Via REST API:**
+
+ ```bash
+ curl -X POST "http://localhost:8000/query" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query": "What is machine learning?",
+     "provider": "groq",
+     "top_k": 5
+   }' | jq .
+ ```
+
+ **Streaming Response:**
+
+ ```bash
+ curl -X POST "http://localhost:8000/query/stream" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query": "Your question here",
+     "provider": "groq"
+   }'
+ ```
+
+ ---
+
+ ## 📡 REST API Endpoints
+
+ ### Health & Status
+
+ ```http
+ GET /health
+ ```
+
+ Returns system health, Milvus connection status, and collection statistics.
+
+ ```json
+ {
+   "status": "healthy",
+   "milvus": {
+     "connected": true,
+     "collection_count": 2500,
+     "index_type": "IVF_FLAT"
+   }
+ }
+ ```
+
+ ### Document Ingestion
+
+ ```http
+ POST /ingest
+ Content-Type: application/json
+
+ {
+   "mode": "directory|file|upload",
+   "path": "/path/to/documents",
+   "chunk_size": 256,
+   "overlap": 32
+ }
+ ```
+
+ ### Query (Blocking)
+
+ ```http
+ POST /query
+ Content-Type: application/json
+
+ {
+   "query": "Your question",
+   "provider": "groq",
+   "model": "llama-3.3-70b-versatile",
+   "top_k": 5,
+   "crag": true,
+   "graph": true
+ }
+ ```
+
+ **Response:**
+
+ ```json
+ {
+   "answer": "Answer text with citations [1][2]...",
+   "chunks": [
+     {
+       "id": "chunk_001",
+       "text": "...",
+       "score": 0.87,
+       "source": "document_name.pdf"
+     }
+   ],
+   "citations": [1, 2],
+   "latency_ms": 1245
+ }
+ ```
+
+ ### Query (Streaming)
+
+ ```http
+ POST /query/stream
+ Content-Type: application/json
+
+ {
+   "query": "Your question",
+   "provider": "groq"
+ }
+ ```
+
+ **Response:** Server-Sent Events (SSE) stream
+
+ ```
+ data: {"type": "start"}
+ data: {"type": "chunk", "content": "Answer "}
+ data: {"type": "chunk", "content": "is "}
+ data: {"type": "chunk", "content": "streaming..."}
+ data: {"type": "citations", "citations": [1, 2]}
+ data: {"type": "end"}
+ ```
+
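A client can rebuild the full answer from the event stream shown above by concatenating the `chunk` events and keeping the `citations` payload. The sketch below works on an iterable of already-received lines, so it runs without a server; `collect_sse_answer` is a hypothetical helper, and the event shapes are taken only from the sample stream in this README.

```python
import json


def collect_sse_answer(lines):
    """Assemble the answer text and citations from SSE `data:` lines."""
    parts = []
    citations = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        event = json.loads(line[len("data:"):])
        if event["type"] == "chunk":
            parts.append(event["content"])
        elif event["type"] == "citations":
            citations = event["citations"]
        elif event["type"] == "end":
            break
    return "".join(parts), citations


stream = [
    'data: {"type": "start"}',
    'data: {"type": "chunk", "content": "Answer "}',
    'data: {"type": "chunk", "content": "is "}',
    'data: {"type": "chunk", "content": "streaming..."}',
    'data: {"type": "citations", "citations": [1, 2]}',
    'data: {"type": "end"}',
]
answer, cited = collect_sse_answer(stream)
print(answer, cited)
```

Against a live server, the same function could be fed the decoded lines of a streaming HTTP response instead of the hard-coded list.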
+ ### Model Information
+
+ ```http
+ GET /providers
+ ```
+
+ Lists all available LLM providers and models.
+
+ ---
+
+ ## 🛠️ Configuration Guide
+
+ ### Retrieval Configuration
+
+ ```env
+ # Chunk sizes (tokens)
+ CHUNK_SIZE_TOKENS=256               # Child chunk size
+ PARENT_CHUNK_SIZE_TOKENS=1024       # Parent chunk size
+ SEMANTIC_SIMILARITY_THRESHOLD=0.82  # Split boundary threshold
+ CHUNK_OVERLAP_TOKENS=32             # Overlap padding
+
+ # Retrieval settings
+ RETRIEVAL_TOP_K=15                  # Candidates before reranking
+ FINAL_TOP_K=5                       # Chunks sent to LLM
+ ```
+
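To illustrate how a split-boundary threshold such as `SEMANTIC_SIMILARITY_THRESHOLD` might be applied, the sketch below starts a new chunk whenever the cosine similarity between adjacent sentence embeddings falls below the threshold. It is an assumption about the general technique, not the code in `ingestion/chunker.py`: the function names are hypothetical, the toy 2-d vectors stand in for real embeddings, and token-size limits and overlap are ignored.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_on_similarity(sentences, embeddings, threshold=0.82):
    """Group consecutive sentences; open a new chunk when similarity drops."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    pairs = zip(embeddings, embeddings[1:], sentences[1:])
    for previous, current, sentence in pairs:
        if cosine(previous, current) < threshold:
            chunks.append([sentence])    # topic shift: start a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend current chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = ["Cats purr.", "Cats nap often.", "Milvus stores vectors."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy vectors, not real embeddings
print(split_on_similarity(sentences, embeddings))
```

Raising the threshold produces more, smaller chunks; lowering it merges loosely related sentences.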
+ ### Embedding Configuration
+
+ ```env
+ EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5  # Model identifier
+ EMBED_DIM=384                            # Output dimension
+ EMBED_BATCH_SIZE=64                      # Batch size for processing
+ EMBED_DEVICE=cpu                         # cpu or cuda
+ ```
+
+ ### Milvus Configuration
+
+ ```env
+ MILVUS_HOST=localhost
+ MILVUS_PORT=19530
+ MILVUS_COLLECTION=cortex_chunks
+ MILVUS_INDEX_TYPE=IVF_FLAT   # or HNSW for larger corpora
+ MILVUS_METRIC_TYPE=COSINE    # Vector similarity metric
+ MILVUS_NLIST=128             # Number of IVF clusters
+ MILVUS_NPROBE=16             # Number of clusters searched per query
+ ```
+
+ ### LLM Provider Configuration
+
+ **Groq (Default)**
+ ```env
+ GROQ_API_KEY=your_key
+ GROQ_MODEL=llama-3.3-70b-versatile
+ GROQ_TEMPERATURE=0.1
+ GROQ_MAX_TOKENS=1024
+ GROQ_TIMEOUT=30
+ ```
+
+ **OpenAI**
+ ```env
+ OPENAI_API_KEY=your_key
+ ```
+
+ **NVIDIA NIM**
+ ```env
+ NVIDIA_API_KEY=your_key
+ ```
+
+ **Custom/Ollama**
+ ```env
+ CUSTOM_BASE_URL=http://localhost:11434/v1
+ CUSTOM_API_KEY=your_key
+ ```
+
+ ### CRAG (Corrective Retrieval Augmented Generation)
+
+ ```env
+ CRAG_ENABLED=true
+ CRAG_RELEVANCE_THRESHOLD=0.5    # Grade boundary
+ TAVILY_API_KEY=your_tavily_key  # For web search fallback
+ ```
+
+ The CRAG gate automatically assesses retrieval quality:
+ - **High confidence** (score ≥ threshold) → Use knowledge base
+ - **Low confidence** (score < threshold) → Augment with web search
+
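The two-way decision above reduces to a single threshold comparison. The sketch below assumes the gate looks at the best relevance score among the retrieved chunks; the README does not say how scores are aggregated, so that choice, the function name `crag_route`, and the returned labels are all illustrative.

```python
def crag_route(scores, threshold=0.5):
    """Pick a context source from retrieval relevance scores.

    Assumption: the best-scoring chunk decides. An empty result set is
    treated as low confidence.
    """
    best = max(scores, default=0.0)
    return "knowledge_base" if best >= threshold else "web_search"


print(crag_route([0.87, 0.42]))  # high confidence
print(crag_route([0.31, 0.12]))  # low confidence
print(crag_route([]))            # nothing retrieved
```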
+ ### Knowledge Graph
+
+ ```env
+ GRAPH_ENABLED=true
+ GRAPH_EXTRACTOR=llm-filtered  # rebel|llm|rebel-filtered|llm-filtered
+ GRAPH_MAX_HOPS=2              # Traversal depth
+ GRAPH_PATH=/data/storage/knowledge_graph.json
+
+ # Density filtering (for "filtered" extractors)
+ DENSITY_TOP_FRACTION=0.30     # Process top 30% entity-dense chunks
+ DENSITY_MIN_ENTITIES=2        # Minimum entities per chunk
+ ```
+
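A traversal depth such as `GRAPH_MAX_HOPS=2` can be read as a bound on breadth-first search from the entities found in the query. The sketch below shows that idea on a plain adjacency dictionary; it is a hedged illustration, not the logic in `retrieval/graph_retriever.py`, and the graph contents and the name `neighbors_within_hops` are made up for the example.

```python
from collections import deque


def neighbors_within_hops(graph, start, max_hops=2):
    """Return {entity: hop_distance} for entities within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start entity is not its own neighbor
    return distance


graph = {
    "Milvus": ["IVF_FLAT", "HNSW"],
    "HNSW": ["ANN search"],
    "ANN search": ["Recall"],
}
print(neighbors_within_hops(graph, "Milvus"))
```

With a limit of two hops, "Recall" (three hops from "Milvus") is left out, which keeps the retrieved neighborhood small.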
+ ### Caching
+
+ ```env
+ REDIS_URL=redis://localhost:6379
+ CACHE_TTL_SECONDS=3600  # 1 hour
+ ```
+
+ ### Evaluation
+
+ ```env
+ EVAL_DB_PATH=/data/storage/eval.db
+ ```
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ cortex/
+ ├── api/                          # FastAPI REST endpoints
+ │   ├── main.py                   # App initialization, endpoints
+ │   └── schemas.py                # Request/response Pydantic models
+ │
+ ├── ingestion/                    # Document processing pipeline
+ │   ├── pipeline.py               # Orchestration
+ │   ├── document_loader.py        # PDF/HTML/TXT parsing
+ │   ├── chunker.py                # Semantic chunking
+ │   └── __init__.py
+ │
+ ├── retrieval/                    # Multi-strategy retrieval
+ │   ├── orchestrator.py           # Coordinate retrieval strategies
+ │   ├── dense.py                  # Milvus vector search
+ │   ├── bm25.py                   # Keyword search index
+ │   ├── embedder.py               # HuggingFace embedding model
+ │   ├── router.py                 # Query routing logic
+ │   ├── fusion.py                 # Result fusion & reranking
+ │   ├── graph_builder.py          # Build knowledge graphs
+ │   ├── graph_retriever.py        # Entity-based retrieval
+ │   ├── relation_extractors.py    # REBEL + LLM extractors
+ │   ├── cache.py                  # Redis caching wrapper
+ │   └── __init__.py
+ │
+ ├── generation/                   # LLM generation & CRAG
+ │   ├── generator.py              # Multi-provider LLM wrapper
+ │   ├── crag.py                   # CRAG gate logic
+ │   └── __init__.py
+ │
+ ├── evaluation/                   # RAG evaluation metrics
+ │   ├── ragas_eval.py             # RAGAS evaluator
+ │   ├── store.py                  # Evaluation database
+ │   └── __init__.py
+ │
+ ├── ui/                           # Streamlit frontend
+ │   ├── app.py                    # Main UI
+ │   └── static/                   # (Optional) HTML/CSS/JS
+ │
+ ├── data/                         # Data storage
+ │   ├── documents/                # Input documents
+ │   ├── storage/                  # Persistent storage
+ │   │   ├── knowledge_graph.json
+ │   │   ├── bm25_index.pkl
+ │   │   └── uploads/
+ │   └── synthetic_knowledge_items.txt
+ │
+ ├── config.py                     # Configuration & settings
+ ├── requirements.txt              # Python dependencies
+ ├── Dockerfile                    # Docker image build
+ ├── docker-compose.yml            # Multi-container orchestration
+ ├── test.py                       # Test suite
+ └── README.md                     # This file
+ ```
+
+ ---
+
+ ## 🐳 Docker & Deployment
+
+ ### Docker Compose Quick Deploy
+
+ ```bash
+ # Start all services
+ docker-compose up -d
+
+ # View logs
+ docker-compose logs -f api
+
+ # Stop services
+ docker-compose down
+ ```
+
+ **Services:**
+ - `milvus` - Vector database (port 19530)
+ - `redis` - Caching layer (port 6379)
+ - `api` - FastAPI backend (port 8000)
+ - `ui` - Streamlit frontend (port 8501)
+
+ ### Environment Variables in Compose
+
+ Edit `docker-compose.yml` to customize:
+
+ ```yaml
+ services:
+   api:
+     environment:
+       - GROQ_API_KEY=${GROQ_API_KEY}
+       - GROQ_MODEL=llama-3.3-70b-versatile
+       - MILVUS_HOST=milvus
+       - REDIS_URL=redis://redis:6379
+       - GRAPH_EXTRACTOR=llm-filtered
+ ```
+
+ ### Production Deployment
+
+ For production, consider:
+
+ 1. **Use an HNSW index** instead of IVF_FLAT for better recall:
+    ```env
+    MILVUS_INDEX_TYPE=HNSW
+    ```
+
+ 2. **Enable caching** for frequently asked questions:
+    ```env
+    REDIS_URL=redis://redis-prod:6379
+    ```
+
+ 3. **Use a stronger embedding model** for higher quality:
+    ```env
+    EMBED_MODEL_NAME=BAAI/bge-base-en-v1.5  # 768-dim, better quality; set EMBED_DIM=768 to match
+    ```
+
+ 4. **Configure CRAG** for reliability:
+    ```env
+    CRAG_ENABLED=true
+    CRAG_RELEVANCE_THRESHOLD=0.6
+    TAVILY_API_KEY=your_key
+    ```
+
+ ---
+
+ ## 🔄 Workflow Examples
+
+ ### Example 1: Legal Document Q&A
+
+ ```bash
+ # 1. Ingest legal documents
+ curl -X POST "http://localhost:8000/ingest" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "mode": "directory",
+     "path": "/data/legal_documents"
+   }'
+
+ # 2. Query with graph retrieval and CRAG enabled
+ curl -X POST "http://localhost:8000/query" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query": "What are the penalties for breach of contract?",
+     "provider": "groq",
+     "graph": true,
+     "crag": true
+   }'
+ ```
+
+ ### Example 2: Research Paper Analysis
+
+ ```bash
+ # Ingest PDF papers
+ python -c "
+ from ingestion.pipeline import IngestionPipeline
+ from retrieval.embedder import Embedder
+ from retrieval.dense import MilvusStore
+
+ embedder = Embedder()
+ store = MilvusStore(embedder=embedder)
+ pipeline = IngestionPipeline(embedder=embedder, store=store, bm25=None)
+
+ pipeline.ingest('/data/papers', mode='pdf')
+ "
+
+ # Query for specific findings (gpt-4o is an OpenAI model, so select that provider)
+ curl -X POST "http://localhost:8000/query/stream" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query": "What are the key findings about transformer performance?",
+     "provider": "openai",
+     "model": "gpt-4o"
+   }'
+ ```
+
+ ### Example 3: Customer Support Bot
+
+ ```bash
+ # 1. Ingest FAQ and documentation
+ # 2. Set up CRAG with a suitable relevance threshold
+ # 3. Route low-confidence queries to web search
+
+ CRAG_RELEVANCE_THRESHOLD=0.6
+ TAVILY_API_KEY=your_key
+ ```
+
+ ---
+
+ ## 📊 Advanced Features
+
+ ### Knowledge Graph Extraction
+
+ Four modes available:
+
+ | Mode | Backend | Speed | Quality | Cost |
+ |------|---------|-------|---------|------|
+ | `rebel` | Local REBEL model | Fast | Good | Free |
+ | `llm` | LLM (Groq/OpenAI) | Slower | Excellent | $$ |
+ | `rebel-filtered` | REBEL + entity filtering | Fast | Good | Free |
+ | `llm-filtered` | LLM + entity filtering | Slower | Excellent | $$ |
+
+ Switch via config:
+ ```env
+ GRAPH_EXTRACTOR=llm-filtered
+ ```
+
+ ### CRAG (Corrective RAG)
+
+ Automatically:
+ 1. Evaluates retrieval confidence
+ 2. Assigns a relevance grade (Correct/Partially-Correct/Missing)
+ 3. Supplements low-confidence retrievals with web search via Tavily
+
+ ```python
+ from generation.crag import CRAGGate
+
+ crag = CRAGGate()
+ response = crag.evaluate(query, context, answer)
+ # Returns: grade, supplemental_docs
+ ```
+
+ ### Evaluation & Metrics
+
+ RAGAS-based evaluation:
+
+ ```python
+ from evaluation.ragas_eval import RAGASEvaluator
+ from evaluation.store import EvalStore
+
+ evaluator = RAGASEvaluator(store=EvalStore())
+ metrics = evaluator.evaluate(query, context, answer)
+ # Returns: answer_relevance, faithfulness, context_precision
+ ```
+
+ ### Caching Strategy
+
+ ```python
+ from retrieval.cache import CachedRetriever
+
+ retriever = CachedRetriever(base_retriever)
+ # First call: 1000ms (database query)
+ # Second call: 5ms (Redis cache hit, TTL: 1 hour)
+ results = retriever.retrieve("machine learning basics")
+ ```
+
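The cache-then-fallback behavior described in the comments above can be sketched without Redis. The class below is an in-memory stand-in that expires entries after a TTL; it is not the `CachedRetriever` from `retrieval/cache.py`, and `TTLCache`, `get_or_compute`, and the injectable `clock` parameter are hypothetical names chosen so the example is easy to test.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return a fresh cached value, or call compute() and store it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit
        value = compute()    # cache miss or expired entry
        self._entries[key] = (now, value)
        return value


def slow_search():
    """Placeholder for a real retrieval call."""
    return ["chunk_001", "chunk_002"]


cache = TTLCache(ttl_seconds=3600)
print(cache.get_or_compute("machine learning basics", slow_search))
print(cache.get_or_compute("machine learning basics", slow_search))  # served from cache
```

A Redis-backed version would follow the same shape, with the expiry handled by the server-side TTL rather than a stored timestamp.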
+ ---
+
+ ## ⚙️ Performance Tuning
+
+ ### For Speed
+
+ ```env
+ # Smaller embedding model
+ EMBED_MODEL_NAME=BAAI/bge-small-en-v1.5
+
+ # Smaller chunks
+ CHUNK_SIZE_TOKENS=128
+ PARENT_CHUNK_SIZE_TOKENS=512
+
+ # Faster index
+ MILVUS_INDEX_TYPE=IVF_FLAT
+ MILVUS_NPROBE=8  # Lower = faster, at some cost in recall
+
+ # Enable cache
+ REDIS_URL=redis://localhost:6379
+
+ # Fewer LLM tokens
+ GROQ_MAX_TOKENS=512
+ ```
+
+ ### For Quality
+
+ ```env
+ # Larger embedding model
+ EMBED_MODEL_NAME=BAAI/bge-base-en-v1.5
+
+ # Larger chunks
+ CHUNK_SIZE_TOKENS=512
+ PARENT_CHUNK_SIZE_TOKENS=2048
+
+ # More precise index
+ MILVUS_INDEX_TYPE=HNSW
+ MILVUS_NPROBE=32  # nprobe is an IVF search parameter; it only matters with IVF indexes
+
+ # Better LLM
+ GROQ_MODEL=llama-3.3-70b-versatile
+
+ # Enable CRAG
+ CRAG_ENABLED=true
+ ```
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Milvus Connection Failed
+
+ ```bash
+ # Check if Milvus is running (the health endpoint is served on port 9091, not the gRPC port 19530)
+ curl http://localhost:9091/healthz
+
+ # Restart Milvus
+ docker-compose restart milvus
+
+ # Verify in settings
+ python -c "from config import get_settings; print(get_settings().milvus_host)"
+ ```
+
+ ### Low Retrieval Quality
+
+ 1. **Check chunk quality:**
+    ```python
+    from ingestion.chunker import SemanticChunker
+    chunker = SemanticChunker()
+    chunks = chunker.chunk("your document text")
+    print([c.text for c in chunks[:3]])
+    ```
+
+ 2. **Verify embeddings:**
+    ```python
+    from retrieval.embedder import Embedder
+    embedder = Embedder()
+    emb = embedder.embed("test query")
+    print(f"Embedding dim: {len(emb)}, sample: {emb[:5]}")
+    ```
+
+ 3. **Enable CRAG** for automatic augmentation:
+    ```env
+    CRAG_ENABLED=true
+    ```
+
+ ### Slow Response Times
+
+ 1. Check cache hit rate
+ 2. Reduce `MILVUS_NPROBE`
+ 3. Use streaming endpoint (`/query/stream`)
+ 4. Enable Redis caching
+
+ ### Out of Memory
+
+ ```env
+ # Reduce batch sizes
+ EMBED_BATCH_SIZE=16
+
+ # Reduce chunk sizes
+ CHUNK_SIZE_TOKENS=128
+
+ # Switch to CPU if using GPU
+ EMBED_DEVICE=cpu
+ ```
+
+ ---
+
+ ## 📈 Monitoring & Evaluation
+
+ ### Health Check
+
+ ```bash
+ curl http://localhost:8000/health | jq .
+ ```
+
+ ### Collection Statistics
+
+ ```python
+ from retrieval.dense import MilvusStore
+ from retrieval.embedder import Embedder
+
+ store = MilvusStore(embedder=Embedder())
+ stats = store.get_stats()
+ print(f"Documents: {stats['collection_count']}")
+ ```
+
+ ### Query Evaluation
+
+ ```python
+ from evaluation.ragas_eval import RAGASEvaluator
+ from evaluation.store import EvalStore
+
+ evaluator = RAGASEvaluator(store=EvalStore(db_path="/data/storage/eval.db"))
+ metrics = evaluator.evaluate(query, context, answer)
+ print(f"Answer Relevance: {metrics['answer_relevance']:.2f}")
+ print(f"Faithfulness: {metrics['faithfulness']:.2f}")
+ print(f"Context Precision: {metrics['context_precision']:.2f}")
+ ```
+
+ ---
+
+ ## 🤝 Contributing
+
+ Contributions welcome! Areas for enhancement:
+
+ - [ ] Multi-language support
+ - [ ] Fine-tuned domain-specific embeddings
+ - [ ] Advanced reranking strategies
+ - [ ] GraphQL API
+ - [ ] Persistent trace logging
+ - [ ] A/B testing framework
+
+ ---
+
+ ## 📝 License
+
+ MIT License. See the LICENSE file for details.
+
+ ---
+
+ ## 🔗 Resources
+
+ - [Milvus Documentation](https://milvus.io/docs)
+ - [FastAPI Guide](https://fastapi.tiangolo.com/)
+ - [RAGAS Evaluation Framework](https://github.com/explodinggradients/ragas)
+ - [Groq API Reference](https://console.groq.com/docs/api-reference)
+ - [CRAG Paper](https://arxiv.org/abs/2401.15884)
+
+ ---
+
+ **Questions?** Open an issue on GitHub or check the documentation.
  source .venv/bin/activate
  pip install -r requirements.txt
  python -m nltk.downloader punkt