	---
	title: Philosopher Chat
	emoji: 🏛️
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: 6.15.1
	app_file: app.py
	pinned: false
	license: mit
	---

	# Philosopher Chat

	A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
	Ask questions about nihilism, existentialism, epistemology, ethics, and more — answers are
	cited directly from 12 primary texts (~5,700 chunks).

	Live demo: [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)

	---

	## Features

	| Feature | Detail |
	|---|---|
	| Query rewriting | Multi-query expansion (LLM paraphrases) fused with RRF for better recall |
	| Two-stage retrieval | Hybrid (dense + BM25) → RRF → cross-encoder reranking |
	| Corrective RAG | Abstains when retrieval confidence is low instead of hallucinating |
	| RAGAS evaluation | 4 metrics, 3-stage ablation; each component quantified, not assumed |
	| Streaming | Token-by-token via Google / Groq / OpenRouter |
	| 15 LLMs | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron (all free tier) |
	| Think blocks | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
	| UMAP viz | 2D projection of all 5,700+ embeddings coloured by philosopher |
	| Model comparison | Side-by-side latency + quality comparison across any two models |
	| Extendable KB | Upload your own PDF/TXT to add new philosophers |

	---

	## Knowledge Base

	| Philosopher | Works |
	|---|---|
	| Nietzsche | Thus Spoke Zarathustra, Beyond Good and Evil, On the Genealogy of Morality, The Birth of Tragedy |
	| Schopenhauer | Essays of Arthur Schopenhauer |
	| Hume | An Enquiry Concerning Human Understanding |
	| Russell | The Problems of Philosophy |
	| Marcus Aurelius | Meditations |
	| Plato | The Republic |
	| Mill | Utilitarianism |
	| Epictetus | The Enchiridion |
	| Kant | Fundamental Principles of the Metaphysic of Morals |
	All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).

	---

	## Tech Stack

	| Layer | Tool |
	|---|---|
	| LLM routing | 15 models via Google AI Studio, Groq, OpenRouter (all free tier) |
	| Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
	| Query transform | Multi-query rewriting (LLM paraphrases → RRF) |
	| Retrieval | Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank |
	| Reranker | `BAAI/bge-reranker-v2-m3` (multilingual cross-encoder) |
	| Guardrail | Corrective RAG: cosine-gated abstention on out-of-corpus queries |
	| Evaluation | RAGAS metrics (faithfulness, relevancy, context precision/recall) |
	| RAG Framework | LangChain LCEL (no chains, direct composition) |
	| UI | Gradio 6 |
	| Deployment | HuggingFace Spaces |

	---

	## Retrieval Architecture

	```
	Question
	│
	├─ Query rewriting (LLM → original + paraphrases) ──────────┐
	│      each variant ↓                                       │
	├─ Dense retrieval  (EmbeddingGemma-300M → ChromaDB cosine) ├─ RRF fusion → top-20 pool
	├─ Sparse retrieval (BM25 / rank-bm25) ─────────────────────┘
	│
	├─ Cross-encoder rerank (BGE-reranker-v2-m3) → top-6
	│
	├─ Corrective gate (cosine < threshold → abstain)
	│
	└─ LLM answer (grounded + cited from top-6 chunks)
	```
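
The RRF fusion step in the diagram can be sketched in a few lines of plain Python. This is an illustration, not the code in `rag_chain.py`: the function name `rrf_fuse`, the chunk ids, and the constant `k=60` (a commonly used default for Reciprocal Rank Fusion) are assumptions; only the top-20 pool size comes from the diagram.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=20):
    """Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    k=60 is an assumed default; the README does not state the value used.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id so output is stable.
    fused = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return fused[:top_n]


# Hypothetical rankings from the dense and sparse retrievers.
dense = ["c3", "c1", "c7"]
sparse = ["c1", "c9", "c3"]
pool = rrf_fuse([dense, sparse])
print(pool)
```

Chunks that appear in both lists accumulate two contributions, so agreement between the dense and sparse retrievers is rewarded without having to put their raw scores on a common scale.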

	The pattern follows modern production RAG: cheap recall first (multi-query hybrid),
	a precise cross-encoder rerank of the small pool, and an abstention gate so
	out-of-corpus questions get an honest "I don't know" instead of a hallucination.
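
A minimal sketch of such an abstention gate, under stated assumptions: the README only says the gate abstains when cosine similarity falls below a threshold, so gating on the best-matching chunk, the threshold value `0.35`, and the names `cosine` and `should_abstain` are all hypothetical.

```python
import math

# Hypothetical threshold; the project's actual value lives in its own config.
ABSTAIN_THRESHOLD = 0.35


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_abstain(query_vec, chunk_vecs, threshold=ABSTAIN_THRESHOLD):
    """Return True when even the best-matching chunk is below the threshold.

    Gating on the maximum similarity is an assumption made for this sketch.
    """
    if not chunk_vecs:
        return True
    best = max(cosine(query_vec, vec) for vec in chunk_vecs)
    return best < threshold


# Toy 2-d vectors: the first query has a close match, the second does not.
print(should_abstain([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))
print(should_abstain([1.0, 0.0], [[0.0, 1.0]]))
```

When the gate fires, the caller would skip generation and return a fixed "not covered by the corpus" reply instead of prompting the LLM.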

	## Evaluation

	The pipeline is measured, not assumed. [`evaluate.py`](evaluate.py) generates
	answers for a curated question set with reference answers across a 3-stage ablation
	(baseline → + reranker → + query rewrite), then an LLM judge scores four
	[RAGAS](https://docs.ragas.io) metrics. Results render live in the app's
	📊 Evaluation tab; full analysis in the
	[evaluation notebook](notebooks/rag_evaluation.ipynb).

	### Measuring each component (12 questions, LLM-as-judge)

	| Metric | Baseline (Hybrid) | + Reranker | + Query Rewrite | Δ (full) |
	|---|:---:|:---:|:---:|:---:|
	| Faithfulness | 0.40 | 0.44 | 0.43 | +0.03 |
	| Answer Relevancy | 0.94 | 0.90 | 0.91 | −0.03 |
	| Context Precision | 1.00 | 1.00 | 0.99 | −0.01 |
	| Context Recall | 0.38 | 0.51 | 0.43 | +0.06 |

	The reranker is the clear win: Context Recall rises from 0.38 to 0.51 and
	Faithfulness improves (0.40 → 0.44), with only a small dip in Answer Relevancy
	(0.94 → 0.90). **Query rewriting did *not* help this corpus**: added on top of the
	reranker, it lowered recall (0.51 → 0.43) and left the other metrics essentially
	unchanged. That is the point of measuring: the data says ship the reranker and treat
	query rewriting as corpus-dependent rather than assuming more components are always
	better. Two-phase eval (generation, then judging) keeps it reproducible:

	```bash
	pip install -r requirements.txt
	pip install --no-deps ragas && pip install -r requirements-eval.txt
	python evaluate.py --generate # phase A: real retrieval + generation → eval_samples.json
	# phase B: an LLM judge scores eval_samples.json → eval_results.json
	```
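
The per-stage comparison can be recomputed from the ablation table with a few lines of Python. This sketch hard-codes the rounded scores from the table; it does not read `eval_results.json`, whose layout is not documented here, and the names `ABLATION` and `stage_deltas` are illustrative.

```python
# Scores copied from the ablation table: (baseline, + reranker, + query rewrite).
ABLATION = {
    "Faithfulness": (0.40, 0.44, 0.43),
    "Answer Relevancy": (0.94, 0.90, 0.91),
    "Context Precision": (1.00, 1.00, 0.99),
    "Context Recall": (0.38, 0.51, 0.43),
}


def stage_deltas(scores):
    """Return the change contributed by each ablation stage, rounded to 2 dp."""
    baseline, reranker, rewrite = scores
    return {
        "reranker_vs_baseline": round(reranker - baseline, 2),
        "rewrite_vs_reranker": round(rewrite - reranker, 2),
        "full_vs_baseline": round(rewrite - baseline, 2),
    }


for metric, scores in ABLATION.items():
    print(metric, stage_deltas(scores))
```

Deltas recomputed from the rounded table values can differ by 0.01 from the Δ column (Context Recall gives 0.05 here). That is what one would expect if the column was computed from unrounded scores, though the README does not say so.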

	---

	## Local Setup

	### 1. Clone and install

	```bash
	git clone https://github.com/Fikri645/philosopher-chat
	cd philosopher-chat
	pip install -r requirements.txt
	```

	### 2. Set up API keys

	```bash
	# Create .env with your keys:
	GOOGLE_API_KEY=... # https://ai.google.dev (free)
	GROQ_API_KEY=... # https://console.groq.com (free)
	OPENROUTER_API_KEY=... # https://openrouter.ai (free)
	HF_TOKEN=... # https://huggingface.co/settings/tokens (for gated EmbeddingGemma)
	```
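
A quick way to confirm the keys are visible before launching is a small check like the one below. It is a sketch using only the standard library: the helper name `missing_keys` is hypothetical, and treating all four variables as required is an assumption (the app may run with a subset of providers).

```python
import os

# Variable names taken from the .env template above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank in `env`."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping; call missing_keys() to check the real environment.
example_env = {"GOOGLE_API_KEY": "abc", "HF_TOKEN": "  "}
print(missing_keys(example_env))
```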

	### 3. Build the vectorstore (run once)

	```bash
	python ingest.py
	```

	This downloads the 12 texts from Project Gutenberg, chunks them, embeds them with
	EmbeddingGemma-300M, and persists the result to `vectorstore/`. The first run takes
	roughly 5–10 minutes (model download plus embedding).
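
For readers unfamiliar with the chunking step, here is a generic sliding-window sketch. It is not the splitter used by `ingest.py`: the character-based strategy, the function name `chunk_text`, and the `chunk_size` and `overlap` defaults are all assumptions made for illustration.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not the project's settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.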

	### 4. Run the app

	```bash
	python app.py
	```

	Open http://localhost:7860 in your browser.

	---

	## Deploying to HuggingFace Spaces

	1. Fork or push to a new Space (SDK: Gradio)
	2. In Space Settings → Variables and Secrets, add:
	   - `GOOGLE_API_KEY`
	   - `GROQ_API_KEY`
	   - `OPENROUTER_API_KEY`
	   - `HF_TOKEN` (your HF token, needed to download the gated EmbeddingGemma model)
	3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
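
The ingest-once behaviour in step 3 amounts to a cache check along these lines. This is a sketch only: the helper name `ensure_vectorstore` and the "directory exists and is non-empty" test are assumptions, not the logic in `app.py`.

```python
from pathlib import Path


def ensure_vectorstore(path, ingest):
    """Run `ingest` only if the vectorstore directory is missing or empty.

    Returns True when ingestion ran, False when the cached store was reused.
    """
    store = Path(path)
    if store.is_dir() and any(store.iterdir()):
        return False
    ingest()
    return True


# Usage (hypothetical): ensure_vectorstore("vectorstore", run_ingestion)
```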

	---

	## Project Structure

	```
	philosopher-chat/
	├── app.py            ← Gradio UI + event handlers
	├── rag_chain.py      ← LangChain RAG pipeline (retrieval + LLM routing)
	├── ingest.py         ← Data ingestion from Project Gutenberg
	├── config.py         ← LLM options, embedding model, RAG parameters
	├── requirements.txt
	├── .gitignore
	└── README.md
	```