metadata
title: Sacred Texts RAG
emoji: ποΈ
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
Sacred Texts RAG: Multi-Religion Knowledge Base
A Retrieval-Augmented Generation (RAG) application that answers spiritual questions using the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib as its sole knowledge sources. It also keeps multi-turn conversation memory, so you can ask follow-up questions naturally, as in a real dialogue.
Project Structure
```
sacred-texts-rag/
├── README.md
├── requirements.txt
├── .env.example
├── ingest.py        # Step 1: Load PDFs → chunk → embed → store
├── rag_chain.py     # Core RAG chain logic (with session memory)
├── app.py           # FastAPI backend server
└── frontend/
    └── index.html   # Chat UI (served by FastAPI)
```
Setup Instructions
1. Install Dependencies
```bash
pip install -r requirements.txt
```
2. Configure Environment
```bash
cp .env.example .env
# Edit .env and add your NVIDIA_API_KEY
```
3. Add Your PDF Books
Place your PDF files in a `books/` folder:
```
books/
├── bhagavad_gita.pdf
├── quran.pdf
├── bible.pdf
└── guru_granth_sahib.pdf
```
4. Ingest the Books (Run Once)
```bash
python ingest.py
```
This will:
- Load and parse all PDFs
- Split them into semantic chunks
- Create embeddings using NVIDIA's `llama-nemotron-embed-vl-1b-v2` model
- Store them in a local ChromaDB vector store (`./chroma_db/`)
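For orientation, here is a minimal sketch of the chunking step. It assumes simple overlapping word windows rather than whatever "semantic" splitter `ingest.py` actually uses; the names `chunk_words` and `to_records` and the window sizes are hypothetical. The record layout mirrors the parallel `ids` / `documents` / `metadatas` lists that ChromaDB's `collection.add()` accepts; the NVIDIA embedding call is deliberately left out.

```python
from typing import Dict, List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one.

    Hypothetical stand-in for the real splitter; sizes are illustrative only.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def to_records(book: str, chunks: List[str]) -> Dict[str, list]:
    """Shape chunks as parallel lists (ids, documents, metadatas) for a vector store."""
    return {
        "ids": [f"{book}-{i}" for i in range(len(chunks))],
        "documents": chunks,
        "metadatas": [{"book": book, "chunk_index": i} for i in range(len(chunks))],
    }
```

Overlap keeps a passage that straddles a window boundary retrievable from either neighbouring chunk. A real pipeline would most likely also store chapter/verse metadata per chunk so that answers can cite them.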
5. Start the Backend
```bash
python app.py
```
The server runs at http://localhost:7860.
6. Open the Frontend
Navigate to http://localhost:7860 in your browser; the FastAPI server serves the UI directly.
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `NVIDIA_API_KEY` | Your NVIDIA API key | none |
| `CHROMA_DB_PATH` | Path to ChromaDB storage | `./chroma_db` |
| `COLLECTION_NAME` | ChromaDB collection name | `sacred_texts` |
| `CHUNKS_PER_BOOK` | Chunks retrieved per book per query | `3` |
| `MAX_HISTORY_TURNS` | Max conversation turns kept in memory per session | `6` |
| `HOST` | Server bind host | `0.0.0.0` |
| `PORT` | Server port | `7860` |
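A minimal sketch of reading these variables with the documented defaults, using only the standard library. `Settings` and `load_settings` are hypothetical names, and failing fast on a missing API key is an assumed design choice; the real `app.py` may load `.env` differently (for example via python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    nvidia_api_key: str
    chroma_db_path: str
    collection_name: str
    chunks_per_book: int
    max_history_turns: int
    host: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("NVIDIA_API_KEY", "")
    if not api_key:
        # Assumed behaviour: refuse to start without a key instead of failing on the first request
        raise RuntimeError("NVIDIA_API_KEY is not set (copy .env.example to .env and fill it in)")
    return Settings(
        nvidia_api_key=api_key,
        chroma_db_path=env.get("CHROMA_DB_PATH", "./chroma_db"),
        collection_name=env.get("COLLECTION_NAME", "sacred_texts"),
        chunks_per_book=int(env.get("CHUNKS_PER_BOOK", "3")),
        max_history_turns=int(env.get("MAX_HISTORY_TURNS", "6")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )
```

Passing an explicit mapping instead of touching `os.environ` keeps the loader easy to test.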
How It Works
```
User Query
    │
    ▼
[Session Memory] ─── Injects prior conversation turns into LLM context
    │
    ▼
[Query Augmentation] ─── Short follow-ups are enriched with the previous question
    │
    ▼
[Hybrid Retrieval: BM25 + Vector Search] ─── Per-book guaranteed slots
    │
    ▼
[NVIDIA Reranker] ─── llama-3.2-nv-rerankqa-1b-v2 re-scores pooled candidates
    │
    ▼
[Semantic Cache Check] ─── Skip LLM if a similar question was answered before
    │
    ▼
[Prompt with Context + History]
    │
    ▼
[Llama-3.3-70b-instruct] ─── Answer grounded ONLY in retrieved texts
    │
    ▼
Streamed response with source citations (book + chapter/verse)
```
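The README does not say how BM25 and vector results are merged. One common option is reciprocal rank fusion (RRF), sketched here under that assumption: each book's two ranked lists are fused independently, and the top `CHUNKS_PER_BOOK` survivors per book form the candidate pool that would then go to the reranker. The function names, the `k=60` constant, and the input shape (chunk IDs already ranked per book) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each ranked list adds 1 / (k + rank) to a chunk's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return dict(scores)


def per_book_slots(
    bm25_by_book: Dict[str, List[str]],
    vector_by_book: Dict[str, List[str]],
    chunks_per_book: int = 3,
) -> List[Tuple[str, str]]:
    """Fuse BM25 and vector rankings separately for each book; keep the top N chunks per book."""
    pooled: List[Tuple[str, str]] = []
    for book in sorted(set(bm25_by_book) | set(vector_by_book)):
        fused = rrf_fuse([bm25_by_book.get(book, []), vector_by_book.get(book, [])])
        # Highest fused score first; ties broken by chunk ID for deterministic output
        top = sorted(fused, key=lambda cid: (-fused[cid], cid))[:chunks_per_book]
        pooled.extend((book, cid) for cid in top)
    return pooled
```

Fusing per book rather than over the whole corpus is what guarantees every scripture a slot: a long text such as the Bible cannot crowd shorter ones out of the pooled candidates.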
Multi-Turn Conversation
The app maintains per-session conversation history, so you can ask natural follow-up questions:
```text
You: "What do the scriptures say about forgiveness?"
AI:  [Answer citing Gita, Quran, Bible, Guru Granth Sahib]
You: "Elaborate on the second point"           (follow-up, no context needed)
AI:  [Continues from previous answer]
You: "What does the Bible say specifically?"   (drill-down)
AI:  [Focuses on Bible passages from the thread]
```
How sessions work:
- A session ID is created automatically on your first question and persisted in the browser's `localStorage`
- The server keeps the last `MAX_HISTORY_TURNS` (default: 6) human+AI pairs in memory
- Click New Conversation in the header to clear history and start fresh
- Sessions are scoped to the server process, so they reset when the server restarts
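The session behaviour above can be sketched with a bounded `deque` per session ID, plus the query augmentation step that enriches short follow-ups with the previous question. This is an illustration, not the code in `rag_chain.py`: `SessionStore`, `augment_query`, and the `min_words=6` cut-off for "short" are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Turn = Tuple[str, str]  # (human question, AI answer)


class SessionStore:
    """In-memory, per-process history: everything is lost when the server restarts."""

    def __init__(self, max_turns: int = 6) -> None:
        # deque(maxlen=...) silently drops the oldest turn once the limit is reached
        self._history: Dict[str, Deque[Turn]] = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        self._history[session_id].append((question, answer))

    def history(self, session_id: str) -> List[Turn]:
        # .get() avoids creating an empty entry for unknown session IDs
        return list(self._history.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        self._history.pop(session_id, None)


def augment_query(question: str, history: List[Turn], min_words: int = 6) -> str:
    """Prefix short follow-ups with the previous question so retrieval has enough context."""
    if history and len(question.split()) < min_words:
        previous_question = history[-1][0]
        return f"{previous_question} {question}"
    return question
```

With the example conversation, "Elaborate on the second point" (five words) would be retrieved as "What do the scriptures say about forgiveness? Elaborate on the second point", while the full history still goes to the LLM prompt.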
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ask` | Ask a question; streams an NDJSON response |
| POST | `/clear` | Clear conversation history for a session |
| GET | `/history` | Inspect conversation history for a session |
| GET | `/books` | List all books indexed in the knowledge base |
| GET | `/health` | Health check |
| GET | `/` | Serves the frontend UI |
| GET | `/docs` | Swagger UI |
/ask Request Body
```json
{
  "question": "What do the scriptures say about compassion?",
  "session_id": "optional-uuid-string"
}
```
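A minimal client-side sketch for building this request with only the standard library. `build_ask_request` is a hypothetical helper, and the default base URL assumes the local setup above; `session_id` is sent only when you already have one, matching its optional status in the body.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_ask_request(
    question: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:7860",
) -> urllib.request.Request:
    """Build a POST /ask request with a JSON body; session_id is included only if given."""
    body: Dict[str, str] = {"question": question}
    if session_id is not None:
        body["session_id"] = session_id
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, `urllib.request.urlopen(req)` returns a response object that can be iterated line by line, which suits a streamed NDJSON response.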
/ask Response (streamed NDJSON)
```json
{"type": "token", "data": "The Bhagavad Gita teaches..."}
{"type": "token", "data": " compassion as..."}
{"type": "sources", "data": [{"book": "Bhagavad Gita 2:47", "page": "2:47", "snippet": "..."}]}
```
Cache hits return a single `{"type": "cache", "data": {"answer": "...", "sources": [...]}}` line.
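A sketch of consuming this stream on the client: concatenate `token` events, keep the `sources` list, and short-circuit on a `cache` event. `collect_answer` is a hypothetical name; it accepts either `str` or `bytes` lines, so it works on a list of strings as well as on an HTTP response iterated line by line.

```python
import json
from typing import Iterable, List, Tuple, Union


def collect_answer(lines: Iterable[Union[str, bytes]]) -> Tuple[str, List[dict]]:
    """Rebuild the full answer and its sources from NDJSON events."""
    tokens: List[str] = []
    sources: List[dict] = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # ignore blank keep-alive lines
        event = json.loads(raw)
        kind, data = event.get("type"), event.get("data")
        if kind == "token":
            tokens.append(data)
        elif kind == "sources":
            sources = data
        elif kind == "cache":
            # A cache hit arrives as one complete payload instead of a token stream
            return data["answer"], data.get("sources", [])
    return "".join(tokens), sources
```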
Notes
- The LLM is instructed never to answer from outside the provided texts
- Each response includes source citations (book + chapter/verse where available)
- Responses synthesize wisdom across all books when relevant
- The semantic cache skips the LLM for repeated or near-identical questions (cosine distance < 0.35)
- Follow-up retrieval automatically augments vague short queries with the previous question for better semantic matching
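The semantic cache rule (reuse an answer when the new question's embedding lies within cosine distance 0.35 of a cached one) can be sketched as a linear scan. `SemanticCache` and `cosine_distance` are hypothetical names; the app may well keep its cache in ChromaDB or another store, and real embeddings would come from the embedding model rather than the toy 2-D vectors used for testing.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / (norm_a * norm_b)


class SemanticCache:
    def __init__(self, threshold: float = 0.35) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], Any]] = []

    def lookup(self, query_embedding: Sequence[float]) -> Optional[Any]:
        """Return the payload of the closest cached question under the threshold, else None."""
        best, best_distance = None, self.threshold
        for cached_embedding, payload in self._entries:
            distance = cosine_distance(query_embedding, cached_embedding)
            if distance < best_distance:
                best, best_distance = payload, distance
        return best

    def store(self, query_embedding: Sequence[float], payload: Any) -> None:
        self._entries.append((list(query_embedding), payload))
```

A linear scan is fine for a small cache; a larger one would normally use a vector index. The threshold trades LLM calls saved against the risk of returning an answer to a question that only looks similar.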
Planned Features
- Contextual chunk expansion (fetch ±1 surrounding chunks)
- HyDE (Hypothetical Document Embedding) for abstract queries
- Answer faithfulness scoring (LLM-as-judge)
- Query rewriting for vague inputs
- Snippet preview on source hover
- Query suggestions after each answer
- Compare mode β side-by-side view across books
- Hallucination guardrail
- Out-of-scope detection
- Rate limiting & API key hardening
Demo
App Link: https://shouvik99-lifeguide.hf.space/