---
title: Philosopher Chat
emoji: 🏛️
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
pinned: false
license: mit
---
# Philosopher Chat
A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
Ask questions about nihilism, existentialism, epistemology, ethics, and more; answers are
cited directly from 12 primary texts (~5,700 chunks).
**Live demo:** [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)
---
## Features
| Feature | Detail |
|---|---|
| **Query rewriting** | Multi-query expansion (LLM paraphrases) fused with RRF; intended to improve recall, though it did not on this corpus (see Evaluation) |
| **Two-stage retrieval** | Hybrid (dense + BM25) → RRF → cross-encoder reranking |
| **Corrective RAG** | Abstains when retrieval confidence is low instead of hallucinating |
| **RAGAS evaluation** | 4 metrics, 3-stage ablation; each component quantified, not assumed |
| **Streaming** | Token-by-token via Google / Groq / OpenRouter |
| **15 LLMs** | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron (all free tier) |
| **Think blocks** | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
| **UMAP viz** | 2D projection of all 5,700+ embeddings coloured by philosopher |
| **Model comparison** | Side-by-side latency + quality comparison across any two models |
| **Extendable KB** | Upload your own PDF/TXT to add new philosophers |
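
As an illustration of the think-block feature, here is a minimal sketch of separating reasoning from the final answer. It assumes the model wraps its reasoning in `<think>...</think>` tags; the function names and the `<details>` markup are hypothetical and may not match what `app.py` does.

```python
import html
import re

# Assumption: reasoning models wrap their chain-of-thought in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model response."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def render_collapsible(text: str) -> str:
    """Render the reasoning as a collapsed HTML block ahead of the answer."""
    reasoning, answer = split_think(text)
    if not reasoning:
        return answer
    return (
        "<details><summary>Reasoning</summary>\n\n"
        f"{html.escape(reasoning)}\n\n</details>\n\n{answer}"
    )
```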
---
## Knowledge Base
| Philosopher | Works |
|---|---|
| Nietzsche | *Thus Spoke Zarathustra*, *Beyond Good and Evil*, *On the Genealogy of Morality*, *The Birth of Tragedy* |
| Schopenhauer | *Essays of Arthur Schopenhauer* |
| Hume | *An Enquiry Concerning Human Understanding* |
| Russell | *The Problems of Philosophy* |
| Marcus Aurelius | *Meditations* |
| Plato | *The Republic* |
| Mill | *Utilitarianism* |
| Epictetus | *The Enchiridion* |
| Kant | *Fundamental Principles of the Metaphysic of Morals* |
All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).
---
## Tech Stack
| Layer | Tool |
|---|---|
| LLM routing | 15 models via Google AI Studio, Groq, OpenRouter (all free tier) |
| Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
| Query transform | Multi-query rewriting (LLM paraphrases → RRF) |
| Retrieval | Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank |
| Reranker | `BAAI/bge-reranker-v2-m3` (multilingual cross-encoder) |
| Guardrail | Corrective RAG: cosine-gated abstention on out-of-corpus queries |
| Evaluation | RAGAS metrics (faithfulness, relevancy, context precision/recall) |
| RAG Framework | LangChain LCEL (no chains, direct composition) |
| UI | Gradio 6 |
| Deployment | HuggingFace Spaces |
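
To make the reranker row concrete, the sketch below shows the general shape of cross-encoder reranking: score every (query, passage) pair jointly, then keep the best few. The `rerank` function and the `overlap_scorer` stand-in are hypothetical. With sentence-transformers, `CrossEncoder("BAAI/bge-reranker-v2-m3").predict` should fit the `score_pairs` slot, but how `rag_chain.py` actually loads and calls the model is not shown in this README.

```python
from typing import Callable, Sequence

# A scorer maps (query, passage) pairs to relevance scores; higher is better.
Scorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: Sequence[str], score_pairs: Scorer, top_k: int = 6) -> list[str]:
    """Score each (query, passage) pair and return the top_k passages."""
    if not passages:
        return []
    scores = score_pairs([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]


def overlap_scorer(pairs):
    """Stand-in scorer for illustration only: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]
```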
---
## Retrieval Architecture
```
Question
   │
   ├─ Query rewriting (LLM → original + paraphrases)         ─┐
   │     each variant ↓                                       │
   ├─ Dense retrieval (EmbeddingGemma-300M → ChromaDB cosine) ├─ RRF fusion → top-20 pool
   ├─ Sparse retrieval (BM25 / rank-bm25)                    ─┘
   │
   ├─ Cross-encoder rerank (BGE-reranker-v2-m3) → top-6
   │
   ├─ Corrective gate (cosine < threshold → abstain)
   │
   └─ LLM answer (grounded + cited from top-6 chunks)
```
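
The RRF fusion step in the diagram can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not the project's code: the function name is hypothetical, `k=60` is the commonly used constant rather than a value taken from `config.py`, and `pool_size=20` mirrors the top-20 pool in the diagram.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def rrf_fuse(rankings: Iterable[Sequence[Hashable]], k: int = 60, pool_size: int = 20) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (stable sort).
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:pool_size]


# One ranked list per retriever (and per query variant, if rewriting is on).
dense = ["c1", "c2", "c3"]
sparse = ["c3", "c1", "c4"]
print(rrf_fuse([dense, sparse]))
```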
The pattern follows modern production RAG: cheap recall first (multi-query hybrid),
a precise cross-encoder rerank of the small pool, and an abstention gate so
out-of-corpus questions get an honest "I don't know" instead of a hallucination.
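
A minimal sketch of the abstention gate, assuming it compares the best query-to-chunk cosine similarity against a fixed threshold. The function names, the use of the maximum rather than another aggregate, and the `0.3` default are all assumptions; the real threshold and logic live in the project's code.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_abstain(query_vec, chunk_vecs, threshold: float = 0.3) -> bool:
    """Abstain when even the best-matching chunk falls below the threshold."""
    best = max((cosine(query_vec, v) for v in chunk_vecs), default=0.0)
    return best < threshold
```

When `should_abstain` returns `True`, the pipeline would skip generation and return a fixed "I don't know" style reply instead of calling the LLM.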
## Evaluation
The pipeline is **measured, not assumed**. [`evaluate.py`](evaluate.py) generates
answers for a curated question set with reference answers across a **3-stage ablation**
(baseline → + reranker → + query rewrite), then an LLM judge scores four
[RAGAS](https://docs.ragas.io) metrics. Results render live in the app's
**📊 Evaluation** tab; the full analysis is in the
[evaluation notebook](notebooks/rag_evaluation.ipynb).
### Measuring each component (12 questions, LLM-as-judge)
| Metric | Baseline (Hybrid) | + Reranker | + Query Rewrite | Δ (full vs. baseline) |
|---|:---:|:---:|:---:|:---:|
| **Faithfulness** | 0.40 | 0.44 | 0.43 | +0.03 |
| **Answer Relevancy** | 0.94 | 0.90 | 0.91 | −0.03 |
| **Context Precision** | 1.00 | 1.00 | 0.99 | −0.01 |
| **Context Recall** | 0.38 | 0.51 | 0.43 | +0.06 |
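
The per-stage effect of each component can be read off the table with a few subtractions. The sketch below uses the rounded table values only; because those are already rounded to two decimals, a result can land 0.01 away from a delta computed on unrounded scores, which is presumably why some reported Context Recall deltas do not match exactly.

```python
# Scores copied from the table: (baseline, + reranker, + query rewrite).
scores = {
    "Faithfulness": (0.40, 0.44, 0.43),
    "Answer Relevancy": (0.94, 0.90, 0.91),
    "Context Precision": (1.00, 1.00, 0.99),
    "Context Recall": (0.38, 0.51, 0.43),
}


def stage_deltas(row):
    """Return (reranker effect, query-rewrite effect, full-pipeline change)."""
    base, rerank, rewrite = row
    return (round(rerank - base, 2), round(rewrite - rerank, 2), round(rewrite - base, 2))


for metric, row in scores.items():
    print(f"{metric:<18} {stage_deltas(row)}")
```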
**The reranker is the clear win:** Context Recall jumps +0.14 (0.38 → 0.51) and
Faithfulness rises (0.40 → 0.44), at a small cost in Answer Relevancy (0.94 → 0.90).
**Query rewriting did *not* help this corpus:** on top of the reranker it *reduced*
recall (0.51 → 0.43) and left Faithfulness and Answer Relevancy essentially flat
(0.44 → 0.43 and 0.90 → 0.91). That is the point of measuring: the data says ship the
reranker and treat query rewriting as corpus-dependent rather than assuming more
components are always better. The evaluation runs in two phases (generation, then
judging), which keeps it reproducible:
```bash
pip install -r requirements.txt
pip install --no-deps ragas && pip install -r requirements-eval.txt
python evaluate.py --generate # phase A: real retrieval + generation → eval_samples.json
# phase B: an LLM judge scores eval_samples.json → eval_results.json
```
---
## Local Setup
### 1. Clone and install
```bash
git clone https://github.com/Fikri645/philosopher-chat
cd philosopher-chat
pip install -r requirements.txt
```
### 2. Set up API keys
```bash
# Create .env with your keys:
GOOGLE_API_KEY=... # https://ai.google.dev (free)
GROQ_API_KEY=... # https://console.groq.com (free)
OPENROUTER_API_KEY=... # https://openrouter.ai (free)
HF_TOKEN=... # https://huggingface.co/settings/tokens (for gated EmbeddingGemma)
```
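
Before launching the app, a quick check that the keys are visible to Python can save a confusing startup failure. This is a hypothetical helper, not part of the repository: it assumes the variables have already been exported or loaded from `.env` into the process environment, and treating all four keys as required is an assumption.

```python
import os

# Key names taken from the .env example; treating all four as required is an assumption.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env=os.environ) -> list[str]:
    """Return the names of keys that are unset or blank."""
    return [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    print("All keys set." if not missing else f"Missing: {', '.join(missing)}")
```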
### 3. Build the vectorstore (run once)
```bash
python ingest.py
```
Downloads the 12 texts from Project Gutenberg, chunks them, embeds them with EmbeddingGemma-300M,
and persists the result to `vectorstore/`. The first run takes ~5–10 min (model download + embedding).
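
For readers unfamiliar with the chunking step, here is a minimal sketch of overlapping fixed-size chunking. It is illustrative only: the function name, the character-based windows, and the `size`/`overlap` defaults are assumptions, and `ingest.py` may use a different splitter and different parameters.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in a neighbouring chunk, at the cost of storing some text twice.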
### 4. Run the app
```bash
python app.py
```
Open http://localhost:7860 in your browser.
---
## Deploying to HuggingFace Spaces
1. Fork or push to a new Space (SDK: **Gradio**)
2. In **Space Settings → Variables and Secrets**, add:
   - `GOOGLE_API_KEY`
   - `GROQ_API_KEY`
   - `OPENROUTER_API_KEY`
   - `HF_TOKEN` (your HF token, needed to download the gated EmbeddingGemma model)
3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
---
## Project Structure
```
philosopher-chat/
├── app.py            ← Gradio UI + event handlers
├── rag_chain.py      ← LangChain RAG pipeline (retrieval + LLM routing)
├── ingest.py         ← Data ingestion from Project Gutenberg
├── config.py         ← LLM options, embedding model, RAG parameters
├── requirements.txt
├── .gitignore
└── README.md
```