title: Sacred Texts RAG
emoji: 🕊️
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false

🕊️ Sacred Texts RAG: Multi-Religion Knowledge Base

A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib as its sole knowledge sources. It keeps multi-turn conversation memory, so you can ask follow-up questions naturally, as in a real dialogue.


📁 Project Structure

sacred-texts-rag/
├── README.md
├── requirements.txt
├── .env.example
├── ingest.py               # Step 1: Load PDFs → chunk → embed → store
├── rag_chain.py            # Core RAG chain logic (with session memory)
├── app.py                  # FastAPI backend server
└── frontend/
    └── index.html          # Chat UI (served by FastAPI)

⚙️ Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env and add your NVIDIA_API_KEY

3. Add Your PDF Books

Place your PDF files in a books/ folder:

books/
├── bhagavad_gita.pdf
├── quran.pdf
├── bible.pdf
└── guru_granth_sahib.pdf

4. Ingest the Books (Run Once)

python ingest.py

This will:

  • Load and parse all PDFs
  • Split into semantic chunks
  • Create embeddings using NVIDIA's llama-nemotron-embed-vl-1b-v2 model
  • Store in a local ChromaDB vector store (./chroma_db/)
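The chunking step can be pictured with a small standard-library sketch. This is not the code in ingest.py: the function name chunk_text, the max_chars and overlap parameters, and the paragraph-based splitting are all assumptions chosen for illustration, and the real pipeline may split text differently.

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 1) -> list:
    """Group paragraphs into chunks of roughly max_chars characters.

    The last `overlap` paragraphs of a finished chunk are repeated at the
    start of the next one so that context is not cut at a hard boundary.
    A single paragraph longer than max_chars is kept whole (hypothetical
    policy, chosen to keep the sketch short).
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned string would then be embedded and written to the vector store together with its book name and location metadata.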

5. Start the Backend

python app.py

Server runs at: http://localhost:7860

6. Open the Frontend

Navigate to http://localhost:7860 in your browser; the FastAPI server serves the UI directly.


🔑 Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| NVIDIA_API_KEY | Your NVIDIA API key | (none) |
| CHROMA_DB_PATH | Path to ChromaDB storage | ./chroma_db |
| COLLECTION_NAME | ChromaDB collection name | sacred_texts |
| CHUNKS_PER_BOOK | Chunks retrieved per book per query | 3 |
| MAX_HISTORY_TURNS | Max conversation turns kept in memory per session | 6 |
| HOST | Server bind host | 0.0.0.0 |
| PORT | Server port | 7860 |
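As a rough illustration of how these variables and their defaults might be read at startup, here is a minimal sketch. The Settings dataclass and load_settings function are hypothetical names, not taken from app.py; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    nvidia_api_key: str
    chroma_db_path: str
    collection_name: str
    chunks_per_book: int
    max_history_turns: int
    host: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return Settings(
        nvidia_api_key=env.get("NVIDIA_API_KEY", ""),  # no default documented
        chroma_db_path=env.get("CHROMA_DB_PATH", "./chroma_db"),
        collection_name=env.get("COLLECTION_NAME", "sacred_texts"),
        chunks_per_book=int(env.get("CHUNKS_PER_BOOK", "3")),
        max_history_turns=int(env.get("MAX_HISTORY_TURNS", "6")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in isolation, for example load_settings({"PORT": "8000"}).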

🧠 How It Works

User Query
    │
    ▼
[Session Memory]  ←── Injects prior conversation turns into LLM context
    │
    ▼
[Query Augmentation]  ←── Short follow-ups are enriched with previous question
    │
    ▼
[Hybrid Retrieval: BM25 + Vector Search]  ←── Per-book guaranteed slots
    │
    ▼
[NVIDIA Reranker]  ←── llama-3.2-nv-rerankqa-1b-v2 re-scores pooled candidates
    │
    ▼
[Semantic Cache Check]  ←── Skip LLM if a similar question was answered before
    │
    ▼
[Prompt with Context + History]
    │
    ▼
[Llama-3.3-70b-instruct]  ←── Answer grounded ONLY in retrieved texts
    │
    ▼
Streamed response with source citations (book + chapter/verse)
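The "per-book guaranteed slots" idea in the retrieval step can be sketched as follows. How the app actually merges BM25 and vector results is not described here, so this sketch assumes each candidate already carries a single relevance score (higher is better); the function name select_per_book and the tuple layout are illustrative only.

```python
from collections import defaultdict


def select_per_book(candidates, chunks_per_book=3):
    """Keep the best `chunks_per_book` candidates for every book.

    candidates: iterable of (book, chunk_text, score) tuples.
    Returns a list of the same tuples, grouped by book name, so that a
    book with generally lower scores still contributes its top chunks.
    """
    by_book = defaultdict(list)
    for book, text, score in candidates:
        by_book[book].append((score, text))

    selected = []
    for book in sorted(by_book):
        # Sort this book's chunks by score, best first, and cap the count.
        top = sorted(by_book[book], key=lambda pair: pair[0], reverse=True)
        for score, text in top[:chunks_per_book]:
            selected.append((book, text, score))
    return selected
```

Without such a cap, one book that happens to match the query wording closely could fill every slot and crowd out the others before reranking.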

💬 Multi-Turn Conversation

The app maintains per-session conversation history so you can ask natural follow-up questions:

You:  "What do the scriptures say about forgiveness?"
AI:   [Answer citing Gita, Quran, Bible, Guru Granth Sahib]

You:  "Elaborate on the second point"       ← follow-up, no context needed
AI:   [Continues from previous answer]

You:  "What does the Bible say specifically?"  ← drill-down
AI:   [Focuses on Bible passages from the thread]
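A follow-up such as "Elaborate on the second point" carries little retrieval signal on its own. One way to picture the query augmentation step is the sketch below; the function name augment_query and the min_words cutoff are assumptions for illustration, not values taken from rag_chain.py.

```python
from typing import Optional


def augment_query(question: str,
                  previous_question: Optional[str],
                  min_words: int = 6) -> str:
    """Prefix a short follow-up with the previous question for retrieval.

    Questions with at least `min_words` words are assumed to be
    self-contained and are returned unchanged.
    """
    is_short = len(question.split()) < min_words
    if is_short and previous_question:
        return f"{previous_question} {question}"
    return question
```

Only the retrieval query would be augmented this way; the LLM still sees the user's original wording alongside the conversation history.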

How sessions work:

  • A session ID is created automatically on your first question and persisted in the browser's localStorage
  • The server keeps the last MAX_HISTORY_TURNS (default: 6) human+AI pairs in memory
  • Click ↺ New Conversation in the header to clear history and start fresh
  • Sessions are scoped to the server process, so they reset on server restart
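The session behaviour listed above can be sketched with an in-memory store. The class name SessionStore and its methods are hypothetical; the sketch only illustrates keeping the last MAX_HISTORY_TURNS question and answer pairs per session ID inside the server process.

```python
from collections import deque

MAX_HISTORY_TURNS = 6  # documented default


class SessionStore:
    """In-process conversation memory keyed by session ID (illustrative)."""

    def __init__(self, max_turns: int = MAX_HISTORY_TURNS):
        self._max_turns = max_turns
        self._sessions = {}

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        # deque(maxlen=...) silently drops the oldest pair once full.
        history = self._sessions.setdefault(
            session_id, deque(maxlen=self._max_turns)
        )
        history.append((question, answer))

    def history(self, session_id: str) -> list:
        return list(self._sessions.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

Because the dictionary lives in process memory, restarting the server discards every session, which matches the last bullet above.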

🌐 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /ask | Ask a question; streams NDJSON response |
| POST | /clear | Clear conversation history for a session |
| GET | /history | Inspect conversation history for a session |
| GET | /books | List all books indexed in the knowledge base |
| GET | /health | Health check |
| GET | / | Serves the frontend UI |
| GET | /docs | Swagger UI |

/ask Request Body

{
  "question": "What do the scriptures say about compassion?",
  "session_id": "optional-uuid-string"
}

/ask Response (streamed NDJSON)

{"type": "token",   "data": "The Bhagavad Gita teaches..."}
{"type": "token",   "data": " compassion as..."}
{"type": "sources", "data": [{"book": "Bhagavad Gita 2:47", "page": "2:47", "snippet": "..."}]}

Cache hits return a single {"type": "cache", "data": {"answer": "...", "sources": [...]}} line.
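A client has to handle both shapes of the stream: token and sources lines, or a single cache line. The following sketch shows one way to fold the NDJSON lines into an answer and a source list; the function name collect_response is hypothetical, and the event fields are the ones shown in the examples above.

```python
import json


def collect_response(lines):
    """Fold NDJSON events from /ask into (answer_text, sources).

    `lines` is any iterable of JSON strings (or bytes), one event per line.
    """
    answer_parts = []
    sources = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        event = json.loads(line)
        kind = event.get("type")
        if kind == "token":
            answer_parts.append(event["data"])
        elif kind == "sources":
            sources = event["data"]
        elif kind == "cache":
            # A cache hit carries the whole answer in one event.
            return event["data"]["answer"], event["data"]["sources"]
    return "".join(answer_parts), sources
```

In a real client the lines would come from iterating over the streamed HTTP response body, and token events could be rendered as they arrive rather than joined at the end.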


📝 Notes

  • The LLM is instructed never to answer from outside the provided texts
  • Each response includes source citations (book + chapter/verse where available)
  • Responses synthesize wisdom across all books when relevant
  • The semantic cache skips the LLM for repeated or near-identical questions (cosine distance < 0.35)
  • Follow-up retrieval automatically augments vague short queries with the previous question for better semantic matching
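The semantic cache rule (reuse an answer when cosine distance is below 0.35) can be illustrated with a small sketch. The app's actual cache is not shown in this README, so the SemanticCache class, its linear scan, and the use of plain Python lists as embeddings are assumptions; only the 0.35 threshold comes from the note above.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


class SemanticCache:
    """Linear-scan cache keyed by question embedding (illustrative only)."""

    def __init__(self, threshold: float = 0.35):
        self.threshold = threshold
        self._entries = []  # list of (embedding, answer) pairs

    def store(self, embedding, answer):
        self._entries.append((list(embedding), answer))

    def lookup(self, embedding):
        # Return the closest cached answer strictly under the threshold.
        best_answer, best_distance = None, self.threshold
        for cached_embedding, answer in self._entries:
            distance = cosine_distance(embedding, cached_embedding)
            if distance < best_distance:
                best_answer, best_distance = answer, distance
        return best_answer
```

A lookup that returns None means the question goes through the full retrieval and generation path, after which the new answer can be stored.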

🗺️ Planned Features

  • Contextual chunk expansion (fetch ±1 surrounding chunks)
  • HyDE (Hypothetical Document Embeddings) for abstract queries
  • Answer faithfulness scoring (LLM-as-judge)
  • Query rewriting for vague inputs
  • Snippet preview on source hover
  • Query suggestions after each answer
  • Compare mode: side-by-side view across books
  • Hallucination guardrail
  • Out-of-scope detection
  • Rate limiting & API key hardening

🎬 Demo

App Link: https://shouvik99-lifeguide.hf.space/