Shouvik599
---
title: Sacred Texts RAG
emoji: 🕊️
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
---
# 🕊️ Sacred Texts RAG: Multi-Religion Knowledge Base
A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib as the sole knowledge sources. It now supports **multi-turn conversation memory**, so you can ask follow-up questions naturally, as in a real dialogue.
---
## 📁 Project Structure
```
sacred-texts-rag/
├── README.md
├── requirements.txt
├── .env.example
├── ingest.py          # Step 1: Load PDFs → chunk → embed → store
├── rag_chain.py       # Core RAG chain logic (with session memory)
├── app.py             # FastAPI backend server
└── frontend/
    └── index.html     # Chat UI (served by FastAPI)
```
---
## ⚙️ Setup Instructions
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment
```bash
cp .env.example .env
# Edit .env and add your NVIDIA_API_KEY
```
### 3. Add Your PDF Books
Place your PDF files in a `books/` folder:
```
books/
├── bhagavad_gita.pdf
├── quran.pdf
├── bible.pdf
└── guru_granth_sahib.pdf
```
### 4. Ingest the Books (Run Once)
```bash
python ingest.py
```
This will:
- Load and parse all PDFs
- Split into semantic chunks
- Create embeddings using NVIDIA's `llama-nemotron-embed-vl-1b-v2` model
- Store in a local ChromaDB vector store (`./chroma_db/`)
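
The chunking step above can be pictured with a minimal sketch. It assumes a fixed-size word window with overlap, which is a simplification: the README describes the chunks as semantic, and the actual splitter used by `ingest.py` is not shown here. The function name `chunk_words` and the default sizes are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows.

    Hypothetical stand-in for the splitter in ingest.py, which may work differently.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a verse that straddles a window boundary present in both neighbouring chunks, at the cost of storing some text twice.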
### 5. Start the Backend
```bash
python app.py
```
Server runs at: `http://localhost:7860`
### 6. Open the Frontend
Navigate to `http://localhost:7860` in your browser. The FastAPI server serves the UI directly.
---
## 🔑 Environment Variables
| Variable | Description | Default |
|---|---|---|
| `NVIDIA_API_KEY` | Your NVIDIA API key | (none) |
| `CHROMA_DB_PATH` | Path to ChromaDB storage | `./chroma_db` |
| `COLLECTION_NAME` | ChromaDB collection name | `sacred_texts` |
| `CHUNKS_PER_BOOK` | Chunks retrieved per book per query | `3` |
| `MAX_HISTORY_TURNS` | Max conversation turns kept in memory per session | `6` |
| `HOST` | Server bind host | `0.0.0.0` |
| `PORT` | Server port | `7860` |
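
As an illustration of how the variables in the table might be read with their documented defaults, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, not taken from the project's code; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented in the table."""
    nvidia_api_key: str
    chroma_db_path: str
    collection_name: str
    chunks_per_book: int
    max_history_turns: int
    host: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        nvidia_api_key=env.get("NVIDIA_API_KEY", ""),  # no default; empty if unset
        chroma_db_path=env.get("CHROMA_DB_PATH", "./chroma_db"),
        collection_name=env.get("COLLECTION_NAME", "sacred_texts"),
        chunks_per_book=int(env.get("CHUNKS_PER_BOOK", "3")),
        max_history_turns=int(env.get("MAX_HISTORY_TURNS", "6")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation, for example `load_settings({"PORT": "8000"}).port`.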
---
## 🧠 How It Works
```
User Query
    │
    ▼
[Session Memory]  ←── Injects prior conversation turns into LLM context
    │
    ▼
[Query Augmentation]  ←── Short follow-ups are enriched with previous question
    │
    ▼
[Hybrid Retrieval: BM25 + Vector Search]  ←── Per-book guaranteed slots
    │
    ▼
[NVIDIA Reranker]  ←── llama-3.2-nv-rerankqa-1b-v2 re-scores pooled candidates
    │
    ▼
[Semantic Cache Check]  ←── Skip LLM if a similar question was answered before
    │
    ▼
[Prompt with Context + History]
    │
    ▼
[Llama-3.3-70b-instruct]  ←── Answer grounded ONLY in retrieved texts
    │
    ▼
Streamed response with source citations (book + chapter/verse)
```
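
The "per-book guaranteed slots" idea in the retrieval stage can be sketched as follows. This is an assumption about the behaviour, not the code in `rag_chain.py`: it takes already-scored candidates and keeps the top `chunks_per_book` from each book, so that no single text crowds out the others. The function name `select_per_book` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def select_per_book(candidates, chunks_per_book=3):
    """Keep the highest-scoring chunks from each book.

    candidates: iterable of (book, chunk_text, score) tuples, higher score is better.
    Returns a list of (book, chunk_text, score), grouped by book name.
    """
    by_book = defaultdict(list)
    for book, text, score in candidates:
        by_book[book].append((score, text))

    selected = []
    for book in sorted(by_book):  # deterministic book order
        top = sorted(by_book[book], key=lambda pair: pair[0], reverse=True)
        selected.extend((book, text, score) for score, text in top[:chunks_per_book])
    return selected
```

With `chunks_per_book=3` and four books, at most twelve candidates would be pooled before reranking.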
---
## 💬 Multi-Turn Conversation
The app maintains per-session conversation history so you can ask natural follow-up questions:
```
You: "What do the scriptures say about forgiveness?"
AI: [Answer citing Gita, Quran, Bible, Guru Granth Sahib]
You: "Elaborate on the second point"          ← follow-up, no context needed
AI: [Continues from previous answer]
You: "What does the Bible say specifically?"  ← drill-down
AI: [Focuses on Bible passages from the thread]
```
**How sessions work:**
- A session ID is created automatically on your first question and persisted in the browser's `localStorage`
- The server keeps the last `MAX_HISTORY_TURNS` (default: 6) human+AI pairs in memory
- Click **↺ New Conversation** in the header to clear history and start fresh
- Sessions are scoped to the server process, so they reset on server restart
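
The session behaviour described above can be sketched with an in-memory store. This is a minimal illustration under assumptions: the class name `SessionStore` and its methods are hypothetical, and the real implementation in `rag_chain.py` may be structured differently.

```python
from collections import deque

MAX_HISTORY_TURNS = 6  # mirrors the documented default


class SessionStore:
    """Keeps the last N (question, answer) pairs per session, in process memory only."""

    def __init__(self, max_turns=MAX_HISTORY_TURNS):
        self._max_turns = max_turns
        self._sessions = {}

    def add_turn(self, session_id, question, answer):
        # deque(maxlen=...) silently drops the oldest turn once the limit is reached
        history = self._sessions.setdefault(session_id, deque(maxlen=self._max_turns))
        history.append((question, answer))

    def history(self, session_id):
        return list(self._sessions.get(session_id, ()))

    def clear(self, session_id):
        self._sessions.pop(session_id, None)
```

Because the dictionary lives in the server process, restarting the server empties it, which matches the last bullet above.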
---
## 🌐 API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/ask` | Ask a question; streams NDJSON response |
| `POST` | `/clear` | Clear conversation history for a session |
| `GET` | `/history` | Inspect conversation history for a session |
| `GET` | `/books` | List all books indexed in the knowledge base |
| `GET` | `/health` | Health check |
| `GET` | `/` | Serves the frontend UI |
| `GET` | `/docs` | Swagger UI |
### `/ask` Request Body
```json
{
"question": "What do the scriptures say about compassion?",
"session_id": "optional-uuid-string"
}
```
### `/ask` Response (streamed NDJSON)
```json
{"type": "token", "data": "The Bhagavad Gita teaches..."}
{"type": "token", "data": " compassion as..."}
{"type": "sources", "data": [{"book": "Bhagavad Gita 2:47", "page": "2:47", "snippet": "..."}]}
```
Cache hits return a single `{"type": "cache", "data": {"answer": "...", "sources": [...]}}` line.
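
A client has to handle both shapes: the token/sources stream and the single cache line. The sketch below assembles an answer from an iterable of NDJSON lines, using only the event types shown above. The function name `collect_answer` is hypothetical.

```python
import json


def collect_answer(lines):
    """Assemble (answer_text, sources) from an iterable of NDJSON lines."""
    answer_parts, sources = [], []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        kind, data = event.get("type"), event.get("data")
        if kind == "token":
            answer_parts.append(data)
        elif kind == "sources":
            sources = data
        elif kind == "cache":
            # A cache hit carries the whole answer in one line
            return data["answer"], data["sources"]
    return "".join(answer_parts), sources
```

The lines could come from any streaming HTTP client, for example the `iter_lines()` method of a `requests` response opened with `stream=True`.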
---
## 📝 Notes
- The LLM is instructed **never** to answer from outside the provided texts
- Each response includes **source citations** (book + chapter/verse where available)
- Responses synthesize wisdom **across all books** when relevant
- The semantic cache skips the LLM for repeated or near-identical questions (cosine distance < 0.35)
- Follow-up retrieval automatically augments vague short queries with the previous question for better semantic matching
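
The semantic cache rule can be illustrated with a small sketch. It assumes the cache is a list of `(embedding, answer)` pairs compared by cosine distance against the 0.35 threshold stated above. The function names and the linear scan are hypothetical; the project may instead query its vector store.

```python
import math

CACHE_DISTANCE_THRESHOLD = 0.35  # from the note above


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def lookup_cache(query_vec, cache, threshold=CACHE_DISTANCE_THRESHOLD):
    """Return the cached answer nearest to query_vec if its distance is below threshold, else None."""
    best_answer, best_dist = None, threshold
    for embedding, answer in cache:
        dist = cosine_distance(query_vec, embedding)
        if dist < best_dist:
            best_answer, best_dist = answer, dist
    return best_answer
```

A lower threshold makes cache hits rarer but safer; a higher one saves more LLM calls at the risk of returning an answer to a question that only looks similar.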
---
## 🗺️ Planned Features
- Contextual chunk expansion (fetch ±1 surrounding chunks)
- HyDE (Hypothetical Document Embeddings) for abstract queries
- Answer faithfulness scoring (LLM-as-judge)
- Query rewriting for vague inputs
- Snippet preview on source hover
- Query suggestions after each answer
- Compare mode: side-by-side view across books
- Hallucination guardrail
- Out-of-scope detection
- Rate limiting & API key hardening
---
## 🎬 Demo
App Link: https://shouvik99-lifeguide.hf.space/