---
title: Philosopher Chat
emoji: 🏛️
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
pinned: false
license: mit
---

# Philosopher Chat

A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
Ask questions about nihilism, existentialism, epistemology, ethics, and more — answers are
cited directly from 12 primary texts (~5,700 chunks).

**Live demo:** [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)

---

## Features

| Feature | Detail |
|---|---|
| **Query rewriting** | Multi-query expansion (LLM paraphrases) fused with RRF for better recall |
| **Two-stage retrieval** | Hybrid (dense + BM25) → RRF → cross-encoder reranking |
| **Corrective RAG** | Abstains when retrieval confidence is low instead of hallucinating |
| **RAGAS evaluation** | 4 metrics, 3-stage ablation — each component quantified, not assumed |
| **Streaming** | Token-by-token via Google / Groq / OpenRouter |
| **15 LLMs** | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron — all free tier |
| **Think blocks** | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
| **UMAP viz** | 2D projection of all 5,700+ embeddings coloured by philosopher |
| **Model comparison** | Side-by-side latency + quality comparison across any two models |
| **Extendable KB** | Upload your own PDF/TXT to add new philosophers |
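
As an illustration of the **Think blocks** row, the sketch below wraps `<think>…</think>` spans (the tags Qwen3 and DeepSeek reasoning models emit) in an HTML `<details>` element. It is a minimal sketch, not the code in `app.py`: the function name `render_think_blocks`, the "Reasoning" summary label, and the use of `<details>` for the collapsible rendering are assumptions.

```python
import html
import re

# Non-greedy match so several think blocks in one reply are handled separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def render_think_blocks(text):
    """Replace each <think>...</think> span with a collapsible <details> block."""

    def to_details(match):
        # Escape the reasoning so stray markup in model output is shown literally.
        reasoning = html.escape(match.group(1).strip())
        return f"<details><summary>Reasoning</summary>\n\n{reasoning}\n\n</details>"

    return THINK_RE.sub(to_details, text)


print(render_think_blocks("<think>Recall Hume on induction.</think>Hume argues that..."))
```

While a reply is still streaming, the closing tag may not have arrived yet; this sketch leaves such unclosed blocks untouched.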

---

## Knowledge Base

| Philosopher | Works |
|---|---|
| Nietzsche | *Thus Spoke Zarathustra*, *Beyond Good and Evil*, *On the Genealogy of Morality*, *The Birth of Tragedy* |
| Schopenhauer | *Essays of Arthur Schopenhauer* |
| Hume | *An Enquiry Concerning Human Understanding* |
| Russell | *The Problems of Philosophy* |
| Marcus Aurelius | *Meditations* |
| Plato | *The Republic* |
| Mill | *Utilitarianism* |
| Epictetus | *The Enchiridion* |
| Kant | *Fundamental Principles of the Metaphysic of Morals* |

All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).

---

## Tech Stack

| Layer | Tool |
|---|---|
| LLM routing | 15 models via Google AI Studio, Groq, OpenRouter (all free tier) |
| Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
| Query transform | Multi-query rewriting (LLM paraphrases → RRF) |
| Retrieval | Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank |
| Reranker | `BAAI/bge-reranker-v2-m3` (multilingual cross-encoder) |
| Guardrail | Corrective RAG — cosine-gated abstention on out-of-corpus queries |
| Evaluation | RAGAS metrics (faithfulness, relevancy, context precision/recall) |
| RAG Framework | LangChain LCEL (direct runnable composition, no legacy `Chain` classes) |
| UI | Gradio 6 |
| Deployment | HuggingFace Spaces |

---

## Retrieval Architecture

```
Question
   │
   ├─ Query rewriting   (LLM → original + paraphrases)            ─┐
   │     each variant ↓                                            │
   ├─ Dense retrieval   (EmbeddingGemma-300M → ChromaDB cosine)    ├─ RRF fusion → top-20 pool
   ├─ Sparse retrieval  (BM25 / rank-bm25)                        ─┘
   │
   ├─ Cross-encoder rerank  (BGE-reranker-v2-m3) → top-6
   │
   ├─ Corrective gate  (cosine < threshold → abstain)
   │
   └─ LLM answer  (grounded + cited from top-6 chunks)
```
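
The RRF fusion step in the diagram can be sketched in a few lines of Python. This is an illustration rather than the code in `rag_chain.py`: the function name `rrf_fuse`, the example chunk IDs, and the constant `k=60` (a commonly used RRF default) are assumptions; only the top-20 pool size comes from the diagram.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=20):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains, so IDs
    ranked highly by several retrievers or query variants rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return fused[:top_n]


dense = ["c3", "c1", "c7"]   # hypothetical dense-retrieval ranking
sparse = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
print(rrf_fuse([dense, sparse]))  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF only uses ranks, it needs no score normalisation between cosine similarities and BM25 scores, and the same function can fuse the lists produced by each rewritten query.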

The pattern follows modern production RAG: cheap recall first (multi-query hybrid),
a precise cross-encoder rerank of the small pool, and an abstention gate so
out-of-corpus questions get an honest "I don't know" instead of a hallucination.
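
The abstention gate can be illustrated as follows. This is a minimal sketch that assumes the gate compares the best query-to-chunk cosine similarity against a fixed threshold. The diagram only states "cosine < threshold → abstain", so the use of the maximum, the value `threshold=0.35`, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_abstain(query_vec, chunk_vecs, threshold=0.35):
    """Abstain when even the most similar retrieved chunk falls below the threshold."""
    best = max((cosine(query_vec, vec) for vec in chunk_vecs), default=0.0)
    return best < threshold


# Toy 2-D vectors: the first query has one well-aligned chunk, the second has none.
print(should_abstain([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))  # False
print(should_abstain([1.0, 0.0], [[0.0, 1.0]]))              # True
```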

---

## Evaluation

The pipeline is **measured, not assumed**. [`evaluate.py`](evaluate.py) generates
answers for a curated question set with reference answers across a **3-stage ablation**
(baseline → + reranker → + query rewrite), then an LLM judge scores four
[RAGAS](https://docs.ragas.io) metrics. Results render live in the app's
**📊 Evaluation** tab; full analysis in the
[evaluation notebook](notebooks/rag_evaluation.ipynb).

### Measuring each component (12 questions, LLM-as-judge)

| Metric | Baseline (Hybrid) | + Reranker | + Query Rewrite | Δ (full) |
|---|:---:|:---:|:---:|:---:|
| **Faithfulness** | 0.40 | 0.44 | 0.43 | +0.03 |
| **Answer Relevancy** | 0.94 | 0.90 | 0.91 | −0.03 |
| **Context Precision** | 1.00 | 1.00 | 0.99 | −0.01 |
| **Context Recall** | 0.38 | 0.51 | 0.43 | +0.06 |

**The reranker is the clear win:** Context Recall rises from 0.38 to 0.51 (roughly
+0.13 on the rounded scores) and Faithfulness rises from 0.40 to 0.44, at the cost of a
small drop in Answer Relevancy (0.94 → 0.90). **Query rewriting did *not* help
this corpus:** added on top of the reranker, it reduced recall (0.51 → 0.43) while the
other three metrics moved by only 0.01. That is the point of measuring: the data says
ship the reranker and treat query rewriting as corpus-dependent rather than assuming
more components are always better. Two-phase eval (generation, then judging) keeps
the run reproducible:

```bash
pip install -r requirements.txt
pip install --no-deps ragas && pip install -r requirements-eval.txt
python evaluate.py --generate   # phase A: real retrieval + generation → eval_samples.json
# phase B: an LLM judge scores eval_samples.json → eval_results.json
```
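
The per-stage deltas can be recomputed from the results table with a few lines of Python. This sketch uses only the rounded scores printed in the table; the names `scores` and `ablation_deltas` are illustrative and do not reflect the schema of `eval_results.json`. Differences recomputed from rounded values can be off by 0.01 from the table's Δ column (for example +0.05 rather than +0.06 for Context Recall), presumably because the published deltas were taken before rounding.

```python
# (baseline, + reranker, + query rewrite) per metric, copied from the table above.
scores = {
    "Faithfulness": (0.40, 0.44, 0.43),
    "Answer Relevancy": (0.94, 0.90, 0.91),
    "Context Precision": (1.00, 1.00, 0.99),
    "Context Recall": (0.38, 0.51, 0.43),
}


def ablation_deltas(scores):
    """Return {metric: (reranker - baseline, full - baseline)}, rounded to 2 places."""
    deltas = {}
    for metric, (baseline, reranked, full) in scores.items():
        deltas[metric] = (round(reranked - baseline, 2), round(full - baseline, 2))
    return deltas


for metric, (d_rerank, d_full) in ablation_deltas(scores).items():
    print(f"{metric:<18} reranker {d_rerank:+.2f}   full {d_full:+.2f}")
```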

---

## Local Setup

### 1. Clone and install

```bash
git clone https://github.com/Fikri645/philosopher-chat
cd philosopher-chat
pip install -r requirements.txt
```

### 2. Set up API keys

```bash
# Create .env with your keys:
GOOGLE_API_KEY=...       # https://ai.google.dev  (free)
GROQ_API_KEY=...         # https://console.groq.com  (free)
OPENROUTER_API_KEY=...   # https://openrouter.ai  (free)
HF_TOKEN=...             # https://huggingface.co/settings/tokens  (for gated EmbeddingGemma)
```
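
A quick way to confirm the keys are visible before launching the app is a small check like the one below. It is a sketch under two assumptions: the variables are already loaded into the process environment (how the app reads `.env` is not shown here), and all four keys are treated as required, which may be stricter than the app itself. The names `REQUIRED_KEYS` and `missing_keys` are hypothetical.

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env=None):
    """Return the keys from REQUIRED_KEYS that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All keys are set.")
```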

### 3. Build the vectorstore (run once)

```bash
python ingest.py
```

Downloads 12 texts from Project Gutenberg, chunks them, embeds with EmbeddingGemma-300M,
and persists to `vectorstore/`. Takes ~5–10 min on first run (model download + embedding).
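
To illustrate the chunking step, the sketch below splits a text into overlapping character windows. It is not the splitter used by `ingest.py`, which may well rely on a library text splitter; the function name `chunk_text` and the `chunk_size` and `overlap` defaults are hypothetical values chosen for illustration.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split `text` into character windows of `chunk_size` that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.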

### 4. Run the app

```bash
python app.py
```

Open http://localhost:7860 in your browser.

---

## Deploying to HuggingFace Spaces

1. Fork or push to a new Space (SDK: **Gradio**)
2. In **Space Settings → Variables and Secrets**, add:
   - `GOOGLE_API_KEY`
   - `GROQ_API_KEY`
   - `OPENROUTER_API_KEY`
   - `HF_TOKEN` (your HF token — needed to download the gated EmbeddingGemma model)
3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.

---

## Project Structure

```
philosopher-chat/
├── app.py              ← Gradio UI + event handlers
├── rag_chain.py        ← LangChain RAG pipeline (retrieval + LLM routing)
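├── ingest.py           ← Data ingestion from Project Gutenberg
├── evaluate.py         ← RAGAS evaluation (3-stage ablation)
├── config.py           ← LLM options, embedding model, RAG parameters
├── notebooks/
│   └── rag_evaluation.ipynb
├── requirements.txt
├── requirements-eval.txt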
├── .gitignore
└── README.md
```